
Important Papers on LLMs and GPTs

Large language models, or LLMs, vary in their training data and parameter counts. They are trained on corpora of hundreds of millions or billions of documents, essentially the accumulated record of what has been written over time. Few new business and social ideas are ever discovered; for decades, the words to describe almost any task have been uttered and captured. Mature LLMs (none exist in 2023) will provide trusted information, much as Encyclopedia Britannica was a trusted source in the 1960s and 1970s. And just as a number of competing encyclopedias were sold, a number of key LLMs will emerge.

It’s a bit like the old debates over oversize piston rings in an engine or memory speed in a computer, Ford vs. Chevy, Bank of America vs. Chase: the difference was rarely meaningful in practice.

These are titles and links to seminal papers on underlying AI research.

LLaMA: Open and Efficient Foundation Language Models
https://research.facebook.com/publications/llama-open-and-efficient-foundation-language-models/
Semantic reconstruction of continuous language from non-invasive brain recordings
https://www.nature.com/articles/s41593-023-01304-9.epdf
Is Your Code Generated by ChatGPT Really Correct? Rigorous Evaluation of Large Language Models for Code Generation
https://arxiv.org/abs/2305.01210
Unlimiformer: Long-Range Transformers with Unlimited Length Input
https://arxiv.org/abs/2305.01625
Distill or Annotate? Cost-Efficient Fine-Tuning of Compact Models
https://arxiv.org/abs/2305.01645
Language Models: GPT and GPT-2
https://cameronrwolfe.substack.com/p/language-models-gpt-and-gpt-2
Transformer Puzzles
https://github.com/srush/Transformer-Puzzles
LlamaIndex 0.6.0: A New Query Interface Over your Data
https://betterprogramming.pub/llamaindex-0-6-0-a-new-query-interface-over-your-data-331996d47e89
The Ultimate Battle of Language Models: Lit-LLaMA vs GPT3.5 vs Bloom vs …
https://lightning.ai/pages/community/community-discussions/the-ultimate-battle-of-language-models-lit-llama-vs-gpt3.5-vs-bloom-vs/
Harnessing LLMs
https://www.linkedin.com/pulse/harnessing-llms-part-i-peter-bull/
How to train your own Large Language Models
https://blog.replit.com/llm-training
Scaling Forward Gradient With Local Losses
https://arxiv.org/abs/2210.03310
Introducing Lamini, the LLM Engine for Rapidly Customizing Models
https://lamini.ai/blog/introducing-lamini
Categorification of Group Equivariant Neural Networks
https://arxiv.org/pdf/2304.14144v1.pdf
Harnessing the Power of LLMs in Practice: A Survey on ChatGPT and Beyond
https://arxiv.org/abs/2304.13712
The Practical Guides for Large Language Models
https://github.com/Mooler0410/LLMsPracticalGuide
Introduction to LangChain: A Framework for LLM Powered Applications
https://www.davidgentile.net/introduction-to-langchain/
Multi-Party Chat: Conversational Agents in Group Settings with Humans and Models
https://arxiv.org/abs/2304.13835
A large-scale comparison of human-written versus ChatGPT-generated essays
https://t.co/qLO7JV2Gbl
Evaluation of GPT-3.5 and GPT-4 for supporting real-world information needs in healthcare delivery
https://arxiv.org/abs/2304.13714
A Cookbook of Self-Supervised Learning
https://arxiv.org/abs/2304.12210
NeMo Guardrails
https://developer.nvidia.com/blog/nvidia-enables-trustworthy-safe-and-secure-large-language-model-conversational-systems/
AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head
https://arxiv.org/abs/2304.12995
State Spaces Aren’t Enough: Machine Translation Needs Attention
https://arxiv.org/abs/2304.12776
Answering Questions by Meta-Reasoning over Multiple Chains of Thought
https://arxiv.org/abs/2304.13007
Getting Started with LangChain: A Beginner’s Guide to Building LLM-Powered Applications
https://towardsdatascience.com/getting-started-with-langchain-a-beginners-guide-to-building-llm-powered-applications-95fc8898732c
Generative AI at Work
https://www.nber.org/papers/w31161
LLM+P: Empowering Large Language Models with Optimal Planning Proficiency
https://arxiv.org/abs/2304.11477
Language Modeling using LMUs: 10x Better Data Efficiency or Improved Scaling Compared to Transformers
https://arxiv.org/abs/2110.02402
Improving Document Retrieval with Contextual Compression
https://blog.langchain.dev/improving-document-retrieval-with-contextual-compression/
The Embedding Archives: Millions of Wikipedia Article Embeddings in Many Languages
https://txt.cohere.com/embedding-archives-wikipedia/
Hugging Face Hub
https://python.langchain.com/en/latest/modules/models/llms/integrations/huggingface_hub.html
Effective Instruction Tuning
https://twitter.com/vagabondjack/status/1649127428659265537
Reinforcement Learning with Human Feedback (RLHF)
https://github.com/opendilab/awesome-RLHF
Language Models Enable Simple Systems for Generating Structured Views of Heterogeneous Data Lakes
https://arxiv.org/abs/2304.09433
Transformer Math 101
https://blog.eleuther.ai/transformer-math/
Open-source research on large language models (LLMs)
https://twitter.com/cwolferesearch/status/1647990311547797504
A visual guide to transformers
https://twitter.com/akshay_pachaar/status/1647940492712345601
Enhancing Vision-language Understanding with Advanced Large Language Models
https://github.com/Vision-CAIR/MiniGPT-4/blob/main/MiniGPT_4.pdf
Transformer: Attention Is All You Need
https://arxiv.org/abs/1706.03762
LLMs on personal devices
https://simonwillison.net/series/llms-on-personal-devices/
LLM Source Context Evaluation
https://twitter.com/jerryjliu0/status/1647626532519841793
Generative Agents: Interactive Simulacra of Human Behavior
https://arxiv.org/abs/2304.03442
Shall We Pretrain Autoregressive Language Models with Retrieval? A Comprehensive Study
https://arxiv.org/abs/2304.06762
Auto-evaluate LLM Q+A chains
https://twitter.com/RLanceMartin/status/1647645549875859456
Understanding Diffusion Models: A Unified Perspective
https://arxiv.org/abs/2208.11970
Building LLM applications for production
https://huyenchip.com/2023/04/11/llm-engineering.html
Boosted Prompt Ensembles for Large Language Models
https://arxiv.org/abs/2304.05970
Teaching Large Language Models to Self-Debug
https://arxiv.org/abs/2304.05128
The Power of Scale for Parameter-Efficient Prompt Tuning
https://arxiv.org/abs/2104.08691
Multimodal Procedural Planning via Dual Text-Image Prompting
https://arxiv.org/abs/2305.01795
Are Emergent Abilities of Large Language Models a Mirage?
https://arxiv.org/abs/2304.15004
An evolutionary tree of modern Large Language Models (LLMs) like ChatGPT.

BERT-style Language Models: Encoder-Decoder or Encoder-only:
BERT “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding”
RoBERTa “RoBERTa: A Robustly Optimized BERT Pretraining Approach”
DistilBERT “DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter”
ALBERT “ALBERT: A Lite BERT for Self-supervised Learning of Language Representations”
ELECTRA “ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators”
T5 “Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer”
GLM “GLM-130B: An Open Bilingual Pre-trained Model”
AlexaTM “AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model”

GPT-style Language Models: Decoder-only:
GPT-3 “Language Models are Few-Shot Learners” (NeurIPS)
OPT “OPT: Open Pre-trained Transformer Language Models”
PaLM “PaLM: Scaling Language Modeling with Pathways”
BLOOM “BLOOM: A 176B-Parameter Open-Access Multilingual Language Model”
MT-NLG “Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model”
GLaM “GLaM: Efficient Scaling of Language Models with Mixture-of-Experts”
Gopher “Scaling Language Models: Methods, Analysis & Insights from Training Gopher”
Chinchilla “Training Compute-Optimal Large Language Models”
LaMDA “LaMDA: Language Models for Dialog Applications”
LLaMA “LLaMA: Open and Efficient Foundation Language Models”
GPT-4 “GPT-4 Technical Report”
BloombergGPT “BloombergGPT: A Large Language Model for Finance”
GPT-NeoX-20B “GPT-NeoX-20B: An Open-Source Autoregressive Language Model”
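The branch labels above come down to how attention is masked: BERT-style encoders let every token attend to the full sequence (bidirectional), while GPT-style decoders restrict each token to its own and earlier positions (causal), which is what makes them generative. A minimal sketch of that distinction, using a hypothetical `attention_mask` helper (an illustration, not code from any of the papers listed):

```python
import numpy as np

def attention_mask(seq_len: int, causal: bool) -> np.ndarray:
    """Boolean matrix: entry [i, j] is True if token i may attend to token j."""
    if causal:
        # Decoder-style (GPT): lower-triangular mask, no looking ahead.
        return np.tril(np.ones((seq_len, seq_len), dtype=bool))
    # Encoder-style (BERT): every token attends to every position.
    return np.ones((seq_len, seq_len), dtype=bool)

# For a 4-token sequence, the encoder mask allows all 16 pairs,
# while the causal mask allows only the 10 pairs with j <= i.
```

The bidirectional mask suits understanding tasks (classification, fill-in-the-blank), while the causal mask is what allows decoder models to be sampled left to right at generation time.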
Search for papers by name. Thanks to @AlexAIDaily