Important Papers on LLMs and GPTs

Large language models, or LLMs, vary widely in their training data and parameter counts. They are trained on corpora of hundreds of millions or billions of documents, capturing most of the words that have been said over time. Few new business and social ideas are ever discovered; for decades, the words to describe almost any task have been uttered and captured. Mature LLMs (none exist in 2023) will provide trusted information, much as Encyclopedia Britannica was a trusted source in the 1960s and 1970s. And just as a number of competing encyclopedias were sold then, a number of key LLMs will emerge.

It’s kind of like the old debates over oversize piston rings in an engine or memory speed in a computer. Ford vs. Chevy. Bank of America vs. Chase. The differences were rarely meaningful in practice.

These are titles and links to seminal papers on underlying AI research.

LLaMA: Open and Efficient Foundation Language Models
Semantic reconstruction of continuous language from non-invasive brain recordings
Is Your Code Generated by ChatGPT Really Correct? Rigorous Evaluation of Large Language Models for Code Generation
Unlimiformer: Long-Range Transformers with Unlimited Length Input
Distill or Annotate? Cost-Efficient Fine-Tuning of Compact Models
Language Models: GPT and GPT-2
Transformer Puzzles
LlamaIndex 0.6.0: A New Query Interface Over your Data
The Ultimate Battle of Language Models: Lit-LLaMA vs GPT3.5 vs Bloom vs …
Harnessing LLMs
How to train your own Large Language Models
Scaling Forward Gradient With Local Losses
Introducing Lamini, the LLM Engine for Rapidly Customizing Models
Categorification of Group Equivariant Neural Networks
Harnessing the Power of LLMs in Practice: A Survey on ChatGPT and Beyond
The Practical Guides for Large Language Models
Introduction to LangChain: A Framework for LLM Powered Applications
Multi-Party Chat: Conversational Agents in Group Settings with Humans and Models
A large-scale comparison of human-written versus ChatGPT-generated essays
Evaluation of GPT-3.5 and GPT-4 for supporting real-world information needs in healthcare delivery
A Cookbook of Self-Supervised Learning
NeMo Guardrails
AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head
State Spaces Aren’t Enough: Machine Translation Needs Attention
Answering Questions by Meta-Reasoning over Multiple Chains of Thought
Getting Started with LangChain: A Beginner’s Guide to Building LLM-Powered Applications
Generative AI at Work
LLM+P: Empowering Large Language Models with Optimal Planning Proficiency
Language Modeling using LMUs: 10x Better Data Efficiency or Improved Scaling Compared to Transformers
Improving Document Retrieval with Contextual Compression
The Embedding Archives: Millions of Wikipedia Article Embeddings in Many Languages
Hugging Face Hub
Effective Instruction Tuning
Reinforcement Learning with Human Feedback (RLHF)
Language Models Enable Simple Systems for Generating Structured Views of Heterogeneous Data Lakes
Transformer Math 101
Open-source research on large language models (LLMs)
A visual guide to transformers
Enhancing Vision-language Understanding with Advanced Large Language Models
Transformer: Attention Is All You Need
LLMs on personal devices
LLM Source Context Evaluation
Generative Agents: Interactive Simulacra of Human Behavior
Shall We Pretrain Autoregressive Language Models with Retrieval? A Comprehensive Study
Auto-evaluate LLM Q+A chains
Understanding Diffusion Models: A Unified Perspective
Building LLM applications for production
Boosted Prompt Ensembles for Large Language Models
Teaching Large Language Models to Self-Debug
The Power of Scale for Parameter-Efficient Prompt Tuning
Multimodal Procedural Planning via Dual Text-Image Prompting
Are Emergent Abilities of Large Language Models a Mirage?
An evolutionary tree of modern Large Language Models (LLMs) like ChatGPT:

BERT-style Language Models: Encoder-Decoder or Encoder-only:
BERT “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding”
RoBERTa “RoBERTa: A Robustly Optimized BERT Pretraining Approach”
DistilBERT “DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter”
ALBERT ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
T5 “Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer”
GLM “GLM-130B: An Open Bilingual Pre-trained Model”
AlexaTM “AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model”

GPT-style Language Models: Decoder-only:
GPT-3 “Language Models are Few-Shot Learners”
OPT “OPT: Open Pre-trained Transformer Language Models”
PaLM “PaLM: Scaling Language Modeling with Pathways”
BLOOM “BLOOM: A 176B-Parameter Open-Access Multilingual Language Model”
MT-NLG “Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model”
GLaM “GLaM: Efficient Scaling of Language Models with Mixture-of-Experts”
Gopher “Scaling Language Models: Methods, Analysis & Insights from Training Gopher”
Chinchilla “Training Compute-Optimal Large Language Models”
LaMDA “LaMDA: Language Models for Dialog Applications”
LLaMA “LLaMA: Open and Efficient Foundation Language Models”
GPT-4 “GPT-4 Technical Report”
BloombergGPT “BloombergGPT: A Large Language Model for Finance”
GPT-NeoX-20B “GPT-NeoX-20B: An Open-Source Autoregressive Language Model”
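The encoder-only vs. decoder-only split in the tree above largely comes down to the attention mask each family uses: BERT-style encoders let every token attend to every other token, while GPT-style decoders restrict each token to earlier positions. A minimal sketch of that distinction in plain NumPy (illustrative function name only, not any model’s actual implementation):

```python
import numpy as np

def attention_mask(seq_len: int, causal: bool) -> np.ndarray:
    """Build an attention mask where 1 means "may attend to".

    Encoder-only (BERT-style): bidirectional, every token sees all tokens.
    Decoder-only (GPT-style): causal, token i sees only tokens 0..i.
    """
    if causal:
        # Lower-triangular matrix: position i attends to positions <= i.
        return np.tril(np.ones((seq_len, seq_len), dtype=int))
    # Full matrix: every position attends to every position.
    return np.ones((seq_len, seq_len), dtype=int)

bert_mask = attention_mask(4, causal=False)  # bidirectional (encoder-only)
gpt_mask = attention_mask(4, causal=True)    # causal (decoder-only)
```

In a real transformer, the causal mask is applied to the attention scores before the softmax, which is what lets decoder-only models be trained on next-token prediction without leaking future tokens.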
Search for papers by name. Thanks to @AlexAIDaily