Important Papers on LLMs and GPTs

Large language models, or LLMs, vary widely in their training data and parameter counts. They are trained on corpora of hundreds of millions or billions of documents, capturing most of the words that have been said over time. Few new business and social ideas are ever discovered; for decades, the words to describe almost any task have been uttered and captured. Mature LLMs (none exist in 2023) will provide trusted information, much as Encyclopedia Britannica was a trusted source in the 1960s and 1970s. And just as a number of competing encyclopedias were sold then, a number of key LLMs will emerge.

It’s kind of like the old debates over oversize piston rings in an engine or memory speed in a computer. Ford vs. Chevy. Bank of America vs. Chase. The differences were rarely meaningful in practice.

These are titles and links to seminal papers on underlying AI research.

LLaMA: Open and Efficient Foundation Language Models
Semantic reconstruction of continuous language from non-invasive brain recordings
Is Your Code Generated by ChatGPT Really Correct? Rigorous Evaluation of Large Language Models for Code Generation
Unlimiformer: Long-Range Transformers with Unlimited Length Input
Distill or Annotate? Cost-Efficient Fine-Tuning of Compact Models
Language Models: GPT and GPT-2
Transformer Puzzles
LlamaIndex 0.6.0: A New Query Interface Over your Data
The Ultimate Battle of Language Models: Lit-LLaMA vs GPT3.5 vs Bloom vs …
Harnessing LLMs
How to train your own Large Language Models
Scaling Forward Gradient With Local Losses
Introducing Lamini, the LLM Engine for Rapidly Customizing Models
Categorification of Group Equivariant Neural Networks
Harnessing the Power of LLMs in Practice: A Survey on ChatGPT and Beyond
The Practical Guides for Large Language Models
Introduction to LangChain: A Framework for LLM Powered Applications
Multi-Party Chat: Conversational Agents in Group Settings with Humans and Models
A large-scale comparison of human-written versus ChatGPT-generated essays
Evaluation of GPT-3.5 and GPT-4 for supporting real-world information needs in healthcare delivery
A Cookbook of Self-Supervised Learning
NeMo Guardrails
AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head
State Spaces Aren’t Enough: Machine Translation Needs Attention
Answering Questions by Meta-Reasoning over Multiple Chains of Thought
Getting Started with LangChain: A Beginner’s Guide to Building LLM-Powered Applications
Generative AI at Work
LLM+P: Empowering Large Language Models with Optimal Planning Proficiency
Language Modeling using LMUs: 10x Better Data Efficiency or Improved Scaling Compared to Transformers
Improving Document Retrieval with Contextual Compression
The Embedding Archives: Millions of Wikipedia Article Embeddings in Many Languages
Hugging Face Hub
Effective Instruction Tuning
Reinforcement Learning with Human Feedback (RLHF)
Language Models Enable Simple Systems for Generating Structured Views of Heterogeneous Data Lakes
Transformer Math 101
Open-source research on large language models (LLMs)
A visual guide to transformers
Enhancing Vision-language Understanding with Advanced Large Language Models
Transformer: Attention Is All You Need
LLMs on personal devices
LLM Source Context Evaluation
Generative Agents: Interactive Simulacra of Human Behavior
Shall We Pretrain Autoregressive Language Models with Retrieval? A Comprehensive Study
Auto-evaluate LLM Q+A chains
Understanding Diffusion Models: A Unified Perspective
Building LLM applications for production
Boosted Prompt Ensembles for Large Language Models
Teaching Large Language Models to Self-Debug
The Power of Scale for Parameter-Efficient Prompt Tuning
Multimodal Procedural Planning via Dual Text-Image Prompting
Are Emergent Abilities of Large Language Models a Mirage?
An evolutionary tree of modern Large Language Models (LLMs) like ChatGPT:

BERT-style Language Models: Encoder-Decoder or Encoder-only:
BERT “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding”
RoBERTa “RoBERTa: A Robustly Optimized BERT Pretraining Approach”
DistilBERT “DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter”
ALBERT ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
T5 “Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer”
GLM “GLM-130B: An Open Bilingual Pre-trained Model”
AlexaTM “AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model”

GPT-style Language Models: Decoder-only:
GPT-3 “Language Models are Few-Shot Learners”
OPT “OPT: Open Pre-trained Transformer Language Models”
PaLM “PaLM: Scaling Language Modeling with Pathways”
BLOOM “BLOOM: A 176B-Parameter Open-Access Multilingual Language Model”
MT-NLG “Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model”
GLaM “GLaM: Efficient Scaling of Language Models with Mixture-of-Experts”
Gopher “Scaling Language Models: Methods, Analysis & Insights from Training Gopher”
Chinchilla “Training Compute-Optimal Large Language Models”
LaMDA “LaMDA: Language Models for Dialog Applications”
LLaMA “LLaMA: Open and Efficient Foundation Language Models”
GPT-4 “GPT-4 Technical Report”
BloombergGPT “BloombergGPT: A Large Language Model for Finance”
GPT-NeoX-20B “GPT-NeoX-20B: An Open-Source Autoregressive Language Model”
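The encoder-only vs. decoder-only split in the tree above largely comes down to the attention mask each family uses: BERT-style encoders let every token attend to every other token, while GPT-style decoders restrict each token to earlier positions. A minimal sketch of that distinction in plain NumPy (illustrative function name only, not any model’s actual implementation):

```python
import numpy as np

def attention_mask(seq_len: int, causal: bool) -> np.ndarray:
    """Build an attention mask where 1 means "may attend to".

    Encoder-only (BERT-style): bidirectional, every token sees all tokens.
    Decoder-only (GPT-style): causal, token i sees only tokens 0..i.
    """
    if causal:
        # Lower-triangular matrix: position i attends to positions <= i.
        return np.tril(np.ones((seq_len, seq_len), dtype=int))
    # Full matrix: every position attends to every position.
    return np.ones((seq_len, seq_len), dtype=int)

bert_mask = attention_mask(4, causal=False)  # bidirectional (encoder-only)
gpt_mask = attention_mask(4, causal=True)    # causal (decoder-only)
```

In a real transformer, the causal mask is applied to the attention scores before the softmax, which is what lets decoder-only models be trained on next-token prediction without leaking future tokens.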
Search for papers by name. Thanks to @AlexAIDaily