Interconnects


Author: Nathan Lambert


Description

Audio essays about the latest developments in AI and interviews with leading scientists in the field. Breaking the hype, understanding what's under the hood, and telling stories.

www.interconnects.ai
73 Episodes
Original post: https://www.interconnects.ai/p/openais-o3-the-2024-finale-of-ai

Chapters
00:00 Introduction
02:51 o3 overview
05:57 Solving the Abstraction and Reasoning Corpus (ARC)
10:41 o3’s architecture, cost, and training (hint: still no tree search)
16:36 2024: RL returns

Figures
Fig 1, Frontier Math results
Fig 2, Coding results
Fig 3, ARC AGI results
Fig 4, ARC AGI result details
Fig 5, ARC AGI example 1
Fig 6, ARC AGI example in text
Fig 7, ARC AGI example “easy”

Get full access to Interconnects at www.interconnects.ai/subscribe
Original post: https://www.interconnects.ai/p/the-ai-agent-spectrum

Chapters
00:00 Introduction
03:24 Agent cartography
08:02 Questions for the near future

Figures
Fig 1. multiple feedbacks diagram

Get full access to Interconnects at www.interconnects.ai/subscribe
Original post: https://www.interconnects.ai/p/openais-reinforcement-finetuning

Chapters
00:00 Introduction
04:19 The impact of reinforcement finetuning’s existence
07:29 Hypotheses on reinforcement finetuning’s implementation

Figures
Fig. 1, Yann’s Cake
Fig. 2, Grader config
Fig. 3, RLVR learning curves

Get full access to Interconnects at www.interconnects.ai/subscribe
Finbarr Timbers is an AI researcher who writes Artificial Fintelligence — one of the technical AI blogs I’ve been recommending for a long time — and has a variety of experiences at top AI labs, including DeepMind and Midjourney. The goal of this interview was to do a few things:
* Revisit what reinforcement learning (RL) actually is, its origins, and its motivations.
* Contextualize the major breakthroughs of deep RL in the last decade, from DQN for Atari to AlphaZero to ChatGPT. How could we have seen the resurgence coming? (see the timeline below for the major events we cover)
* Modern uses for RL, o1, RLHF, and the future of finetuning all ML models.
* Address some of the critiques like “RL doesn’t work yet.”
It was a fun one. Listen on Apple Podcasts, Spotify, YouTube, and wherever you get your podcasts. For other Interconnects interviews, go here.

Timeline of RL and what was happening at the time
In the last decade of deep RL, there have been a few phases.
* Era 1: Deep RL fundamentals — when modern algorithms were designed and proven.
* Era 2: Major projects — AlphaZero, OpenAI Five, and all the projects that put RL on the map.
* Era 3: Slowdown — when DeepMind and OpenAI no longer had the major RL projects and cultural relevance declined.
* Era 4: RLHF & widening success — RL’s new life post ChatGPT.
Covering these are the following events. This is incomplete, but enough to inspire a conversation.
Early era: TD-Gammon, REINFORCE, etc.
2013: Deep Q-Learning (Atari)
2014: Google acquires DeepMind
2016: AlphaGo defeats Lee Sedol
2017: PPO paper, AlphaZero (no human data)
2018: OpenAI Five, GPT-2
2019: AlphaStar, early papers on robotic sim2real with RL (see blog post)
2020: MuZero
2021: Decision Transformer
2022: ChatGPT, sim2real continues
2023: Scaling laws for RL (blog post), doubt of RL
2024: o1, post-training, RL’s bloom

Interconnects is a reader-supported publication. Consider becoming a subscriber.

Chapters
* [00:00:00] Introduction
* [00:02:14] Reinforcement Learning Fundamentals
* [00:09:03] The Bitter Lesson
* [00:12:07] Reward Modeling and Its Challenges in RL
* [00:16:03] Historical Milestones in Deep RL
* [00:21:18] OpenAI Five and Challenges in Complex RL Environments
* [00:25:24] Recent-ish Developments in RL: MuZero, Decision Transformer, and RLHF
* [00:30:29] OpenAI's o1 and Exploration in Language Models
* [00:40:00] Tülu 3 and Challenges in RL Training for Language Models
* [00:46:48] Comparing Different AI Assistants
* [00:49:44] Management in AI Research
* [00:55:30] Building Effective AI Teams
* [01:01:55] The Need for Personal Branding

We mention
* o1 (OpenAI model)
* Rich Sutton
* University of Alberta
* London School of Economics
* IBM’s Deep Blue
* Alberta Machine Intelligence Institute (Amii)
* John Schulman
* Claude (Anthropic's AI assistant)
* Logan Kilpatrick
* Bard (Google's AI assistant)
* DeepSeek R1 Lite
* Scale AI
* OLMo (AI2's language model)
* Golden Gate Claude

Get full access to Interconnects at www.interconnects.ai/subscribe
Original post: https://www.interconnects.ai/p/openais-o1-using-search-was-a-psyop

Figures
Figure 0: OpenAI’s seminal test-time compute plot
Figure 1: Setup for bucketed evals
Figure 2: Evals with correctness labels
Figure 3: Grouped evals
Figure 4: Hypothetical inference scaling law

Get full access to Interconnects at www.interconnects.ai/subscribe
Full post: https://www.interconnects.ai/p/olmo-2-and-building-language-model-training
OLMo 2 demo: https://playground.allenai.org/
OLMo 2 artifacts: https://huggingface.co/collections/allenai/olmo-2-674117b93ab84e98afc72edc

Chapters
00:00 Building AI Teams
06:35 OLMo 2

Figures
Fig 1, pretrain plot: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/olmo2/pretrain.webp
Fig 2, pretrain table: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/olmo2/pretrain-table.webp
Fig 3, post-train table: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/olmo2/postrain-table.webp

Get full access to Interconnects at www.interconnects.ai/subscribe
Original post: https://www.interconnects.ai/p/tulu-3

Chapters
00:00 History
05:44 Technical details sneak peek

Figures
Fig 1, results: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/tulu3-img/results.webp
Fig 2, overview: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/tulu3-img/overview.webp
Fig 3, preferences: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/tulu3-img/preferences.webp
Fig 4, RLVR: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/tulu3-img/rlvr.webp

Get full access to Interconnects at www.interconnects.ai/subscribe
Original post: https://www.interconnects.ai/p/scaling-realities

Get full access to Interconnects at www.interconnects.ai/subscribe
Original post: https://www.interconnects.ai/p/saving-the-nairr

Chapters
05:26 Do we need an AI research resource or an LM research resource?
08:59 Policy roundups

Get full access to Interconnects at www.interconnects.ai/subscribe
Tim Dettmers does not need an introduction for most people building open-source AI. If you are part of that minority, you’re in for a treat. Tim is the lead developer behind most of the open-source tools for quantization: QLoRA, bitsandbytes, 4- and 8-bit inference, and plenty more. He recently finished his Ph.D. at the University of Washington, is now a researcher at the Allen Institute for AI, and is starting as a professor at Carnegie Mellon University in fall of 2025.
Tim is a joy to talk to. He thinks independently on all the AI issues of today, bringing new perspectives that challenge the status quo. At the same time, he’s sincere and very helpful to work with, working hard to uplift those around him and the academic community. There’s a reason he’s so loved in the open-source AI community.
Find more about Tim on his Twitter or Google Scholar. He also has a great blog where he talks about things like which GPUs to buy and which grad school to choose.
Listen on Apple Podcasts, Spotify, YouTube, and wherever you get your podcasts. For other Interconnects interviews, go here.

Show Notes
Companies, people, projects, research papers, and other key named entities mentioned in the transcript:
* QLoRA
* bitsandbytes
* Llama 3
* Apple Intelligence
* SWE-Bench
* RewardBench
* Claude (AI assistant by Anthropic)
* Transformers (Hugging Face library)
* Gemma (Google's open weight language model)
* NotebookLM
* LangChain
* LangGraph
* Weights & Biases
* Blackwell (NVIDIA GPU architecture)
* Perplexity
* Branch-Train-Merge (research paper)
* "ResNets do iterative refinement on features" (research paper)
* CIFAR-10 and CIFAR-100 (computer vision datasets)
* Lottery Ticket Hypothesis (research paper)
* OpenAI o1
* TRL (Transformer Reinforcement Learning) by Hugging Face
* Tim's work on quantization (this is just one example)

Timestamps
* [00:00:00] Introduction and background on Tim Dettmers
* [00:01:53] Future of open source AI models
* [00:09:44] SWE-Bench and evaluating AI systems
* [00:13:33] Using AI for coding, writing, and thinking
* [00:16:09] Academic research with limited compute
* [00:32:13] Economic impact of AI
* [00:36:49] User experience with different AI models
* [00:39:42] o1 models and reasoning in AI
* [00:46:27] Instruction tuning vs. RLHF and synthetic data
* [00:51:16] Model merging and optimization landscapes
* [00:55:08] Knowledge distillation and optimization dynamics
* [01:01:55] State-space models and transformer dominance
* [01:06:00] Definition and future of AI agents
* [01:09:20] The limit of quantization

Transcript and full details: https://www.interconnects.ai/p/tim-dettmers

Get Interconnects (https://www.interconnects.ai/)...
... on YouTube: https://www.youtube.com/@interconnects
... on Twitter: https://x.com/interconnectsai
... on Linkedin: https://www.linkedin.com/company/interconnects-ai
... on Spotify: https://open.spotify.com/show/2UE6s7wZC4kiXYOnWRuxGv
... on Apple Podcasts: https://podcasts.apple.com/us/podcast/interconnects/id1719552353

Get full access to Interconnects at www.interconnects.ai/subscribe
Andrew Carr is co-founder and chief scientist at Cartwheel, where he is building text-to-motion AI models and products for gaming, film, and other creative endeavors. We discuss how to keep generative AI fun and expansive — niche powerful use-cases, AI poetry, AI devices like Meta Ray-Bans, generalization to new domains like robotics, and building successful AI research cultures.
Andrew is one of my well-read friends on the directions AI is going, so it is great to bring him in for an official conversation. He spent time at OpenAI working on Codex, worked at Gretel AI, and is an editor for the TLDR AI Newsletter.
Listen on Apple Podcasts, Spotify, YouTube, and wherever you get your podcasts. For other Interconnects interviews, go here.

Show Notes
Named entities and papers mentioned in the podcast transcript:
* Codex and GitHub Copilot
* Gretel AI
* TLDR AI Newsletter
* Claude Computer Use
* Blender 3D simulator
* Common Sense Machines
* HuggingFace Simulate, Unity, Godot
* Runway ML
* Mark Chen, OpenAI Frontiers Team Lead
* Meta’s Lingua, Spirit LM, torchtitan and torchchat
* Self-Rewarding Language Models paper
* Meta Movie Gen paper

Timestamps
* [00:00] Introduction to Andrew and Cartwheel
* [07:00] Differences between Cartwheel and robotic foundation models
* [13:33] Claude computer use
* [18:45] Supervision and creativity in AI-generated content
* [23:26] Adept AI and challenges in building AI agents
* [30:56] Successful AI research culture at OpenAI and elsewhere
* [38:00] Keeping up with AI research
* [44:36] Meta Ray-Ban smart glasses and AI assistants
* [51:17] Meta's strategy with Llama and open source AI

Transcript & Full Show Notes: https://www.interconnects.ai/p/interviewing-andrew-carr

Get full access to Interconnects at www.interconnects.ai/subscribe
Full post: https://www.interconnects.ai/p/why-i-build-open-language-models

Get full access to Interconnects at www.interconnects.ai/subscribe
How Claude's computer use works. Where OpenAI, Anthropic, and Google each have a lead on the others.

Original post: https://www.interconnects.ai/p/claudes-agency

Chapters
00:00 Claude's agentic future and the current state of the frontier models
04:43 The state of the frontier models
04:49 1. Anthropic has the best model we are accustomed to using
05:27 Google has the best small & cheap model for building automation and basic AI engineering
08:07 OpenAI has the best model for reasoning, but we don’t know how to use it
09:12 All of the laboratories have much larger models they’re figuring out how to release (and use)
10:42 Who wins?

Figures
Fig 1, Sonnet New Benchmarks: https://substackcdn.com/image/fetch/w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d2e63ff-ac9f-4f8e-9749-9ef2b9b25b6c_1290x1290.png
Fig 2, Sonnet Old Benchmarks: https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4bccbd4d-f1c8-4a38-a474-69a3df8a4448_2048x1763.png

Get Interconnects (https://www.interconnects.ai/)...
... on YouTube: https://www.youtube.com/@interconnects
... on Twitter: https://x.com/interconnectsai
... on Linkedin: https://www.linkedin.com/company/interconnects-ai
... on Spotify: https://open.spotify.com/show/2UE6s7wZC4kiXYOnWRuxGv
... on Apple Podcasts: https://podcasts.apple.com/us/podcast/interconnects/id1719552353

Get full access to Interconnects at www.interconnects.ai/subscribe
Arvind Narayanan is a leading voice disambiguating what AI does and does not do. His work, with Sayash Kapoor at AI Snake Oil, is one of the few beacons of reason in an AI media ecosystem with quite a few bad apples. Arvind is a professor of computer science at Princeton University and the director of the Center for Information Technology Policy. You can learn more about Arvind and his work on his website, X, or Google Scholar.
This episode is all in on figuring out what current LLMs do and don’t do. We cover AGI, agents, scaling laws, autonomous scientists, and past failings of AI (i.e. those that came before generative AI took off). We also briefly touch on how all of this informs AI policy, and what academics can do to decide what to work on to generate better outcomes for technology.

Transcript and full show notes: https://www.interconnects.ai/p/interviewing-arvind-narayanan

Chapters
* [00:00:00] Introduction
* [00:01:54] Balancing being an AI critic while recognizing AI's potential
* [00:04:57] Challenges in AI policy discussions
* [00:08:47] Open source foundation models and their risks
* [00:15:35] Personal use cases for generative AI
* [00:22:19] CORE-Bench and evaluating AI scientists
* [00:25:35] Agents and artificial general intelligence (AGI)
* [00:33:12] Scaling laws and AI progress
* [00:37:41] Applications of AI outside of tech
* [00:39:10] Career lessons in technology and AI research
* [00:41:33] Privacy concerns and AI
* [00:47:06] Legal threats and responsible research communication
* [00:50:01] Balancing scientific research and public distribution

Get Interconnects (https://www.interconnects.ai/podcast)...
... on YouTube: https://www.youtube.com/@interconnects
... on Twitter: https://x.com/interconnectsai
... on Linkedin: https://www.linkedin.com/company/interconnects-ai
... on Spotify: https://open.spotify.com/show/2UE6s7wZC4kiXYOnWRuxGv

Get full access to Interconnects at www.interconnects.ai/subscribe
Read the full post here: https://www.interconnects.ai/p/building-on-evaluation-quicksand

Chapters
00:00 Building on evaluation quicksand
01:26 The causes of closed evaluation silos
06:35 The challenge facing open evaluation tools
10:47 Frontiers in evaluation
11:32 New types of synthetic data contamination
13:57 Building harder evaluations

Figures
Fig 1: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/manual/openai-predictions.webp

Get full access to Interconnects at www.interconnects.ai/subscribe
Andrew Trask is one of the bright spots in engaging with AI policy for me in the last year. He is a passionate idealist, trying to create a future for AI that enables privacy, academic research, and government involvement in a rapidly transforming ecosystem. Trask is a leader of the OpenMined organization facilitating researcher access to non-public data and AIs, a senior research scientist at Google DeepMind, a PhD student at the University of Oxford, and an author and educator on deep learning.
You can find more about Trask on Twitter or Google Scholar. You may want to watch his recent talk at Cohere on the future of AI (and why data breakthroughs dominate), his lecture at MIT on privacy-preserving ML, or his book on deep learning that has a substantial GitHub component. Here’s a slide I liked from his recent Cohere talk:
The organization he helps run, OpenMined, has a few principles that say a lot about his ambitions and approaches to modern AI:
We believe we can inspire all data owners to open their data for research by building open-source privacy software that empowers them to receive more benefits (co-authorships, citations, grants, etc.) while mitigating risks related to privacy, security, and IP.
We cover privacy of LLMs, retrieval LLMs, secure enclaves, o1, Apple's new models, and many more topics.

More on Andrew: https://x.com/iamtrask
Transcript and more information: https://www.interconnects.ai/p/interviewing-andrew-trask

Interconnects (https://www.interconnects.ai/)...
... on YouTube: https://www.youtube.com/@interconnects
... on Twitter: https://x.com/interconnectsai
... on Linkedin: https://www.linkedin.com/company/interconnects-ai
... on Spotify: https://open.spotify.com/show/2UE6s7wZC4kiXYOnWRuxGv

We Mention
* Claude 3.5 launch and “pre-release testing with UK AISI” (and the US AI Safety Institute)
* OpenMined and PySyft
* CSET (Center for Security and Emerging Technology)
* NAIRR
* The “open data wall”
* Apple’s Secure Enclaves, Nvidia Secure Enclave
* Data-store language models literature
* RETRO: Retrieval-Enhanced Transformer from DeepMind (2021)
* SILO Language Models: Isolating Legal Risk In a Nonparametric Datastore (2023)
* Scaling Retrieval-Based Language Models with a Trillion-Token Datastore (2024)

Chapters
[00:00:00] Introduction
[00:03:12] Secure enclaves and pre-release testing with Anthropic and UK Safety Institute
[00:16:31] Discussion on public AI and government involvement
[00:20:55] Data store language models and better approaches to “open training data”
[00:42:18] History and development of OpenMined
[00:48:57] Use of language models on air-gapped networks
[00:52:10] Near future of secure enclave technology and industry adoption
[00:58:01] Conclusions and future trajectory of AI development

Get full access to Interconnects at www.interconnects.ai/subscribe
How scaling changes model behavior
Some trends are reasonable to extrapolate, some are not. Even for the trends we are succeeding at extrapolating, it is not clear how that signal translates into different AI behaviors.

Read it here: https://www.interconnects.ai/p/how-scaling-changes-model-behavior

[00:00] How scaling changes model behavior
[05:03] Metaphors for what scaling may solve
[08:45] Short-term scaling is already de-risked

Fig. 1: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/manual/openai-predictions.webp
Fig. 2: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/manual/scaling-laws.webp
Fig. 3: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/manual/situational-awareness.webp

Get full access to Interconnects at www.interconnects.ai/subscribe
SB1047's veto, OpenAI's turnover, and a constant treadmill pushing AI startups to be all too similar to big technology name brands.
This is AI generated audio with Python and 11Labs.
Source code: https://github.com/natolambert/interconnects-tools
Original post: https://www.interconnects.ai/p/ai-safety-culture-vs-capitalism

00:00 AI Safety's Crux: Culture v Capitalism
06:03 SB1047 as a regulatory litmus test for AI safety
08:36 Capitalism at the helm

Get full access to Interconnects at www.interconnects.ai/subscribe
Riley Goodside is a staff prompt engineer at Scale AI. Previously working in data science, he is often seen as the default example of the new role of a “prompt engineer.” He regularly posts incisive prompts that elicit notable behavior from the most popular AI models.
I really resonated with this saying from Anthropic’s recent podcast on prompt engineering — “now we write essays and treat them as code.” In order to be good at prompting, you need to understand that natural language operates as our code used to.
This episode is a masterclass on why you should care about prompting and how it impacts results. Of course, there’s a bunch of great discussion on recent models that reflect the need for different and/or better prompting. Enjoy it!
Listen on Apple Podcasts, Spotify, and wherever you get your podcasts. For other Interconnects interviews, go here.

We mention:
* Prompting to push the frontier of AI models,
* Post-training and prompting interaction,
* Prompting base models,
* o1, Reflection 70B, reasoning,
* Scale’s leaderboard, evaluation tricks, evaluation needs,
* PlanSearch paper
* Julius AI
* “The hottest programming language is English”
* “Think silently” instructions
* Scale Leaderboard and Humanity’s Last Exam
* ChatML formatting

Chapters
* [00:00:09] Introduction
* [00:02:40] Riley's path to LLMs
* [00:07:54] Impact of ChatGPT on prompt engineering
* [00:12:03] OpenAI's o1
* [00:18:21] Autoregressive inference and prompting sensitivities
* [00:24:48] Reflection 70B model and its implications
* [00:28:00] Impact of prompting on evaluation
* [00:32:43] Prompting vs. Google search
* [00:46:55] Prompting and RLHF/post-training
* [00:56:57] Prompting of AI agents
* [01:01:20] Importance of hands-on experience with language models
* [01:05:00] Importance and challenges of AI model evaluation

Transcript
Built with smol-podcaster.

Nathan L. [00:01:08]: Hey, Riley, welcome to the show.

Riley G.: Hey, Nathan, great to be here.

Nathan L. [00:01:14]: Yeah, so for the audience here, I mostly wanted to try to, as I work on post-training a lot and I see my own difficulty in taking prompting seriously and the things that I don't think that we are doing enough, and I don't see any reason why it can't be scientific in how we do prompting. So that's my biggest goal with this. I think there's a lot of podcasts where we could kind of say, like, what is the history of prompting? Where is it going? And that's easy to kind of redo. And I still find it interesting, but I just don't think there's enough people talking about the role of prompting in evaluation, how prompting changes with how you post-train models, because we're trying to take that seriously in how we have a post-training setup, but we just, like, regularly run into these things like system prompts aren't handled well, how to release a model with a system prompt. So that's the tone that I'm trying to get to when I ask these questions. And also OpenAI's o1 model just came out, so I'm definitely going to get onto that pretty quickly because that's what everyone's excited about. I like to start with background just to kind of get to know people, because a lot of this is just, I want to talk to interesting people in AI, is like, how did you become interested in prompting? I think I've seen your background in data science and then you joined Scale around when ChatGPT came out, which is fun timing, but like, how did you become maybe obsessed with this, but like the focal point of your work?
Riley G. [00:02:40]: Yeah, I have sort of an unusual introduction to large language models. For most of my career, I've been a data scientist, mostly in the online dating industry. I was at OkCupid and Grindr. And after I left Grindr, I took sort of a sabbatical to educate myself, I guess, about the progress in large language models. It was around the time that GPT-3 Codex had just come out. And that was where I think I started to become really interested, because I was following along with maybe, certainly when GPT-2 came out, the examples there wowed me as much as they wowed the rest of the world, I think, with the example of the news article about the unicorn and all that. And not long after that, we had AI Dungeon, and I played around with AI Dungeon a bit. But at that point, language models seemed to be mostly about language, that they were sort of very heavily focused on stylistic mimicry and creative writing and so on. And when Codex came out, it really started this thought that text is a more universal interface than we were giving it credit for, that language models might be more broadly useful. And I just became very excited in a practical sense about what these models could do for what I kind of intuited was very boilerplate-like data science code, that I thought of, like, most of the Python and Julia and R and things that I've written over my career, this seemed like stuff that an LLM could handle. And that was sort of one of its early strong points. So I was playing around with, I think one of my first projects was a VS Code extension that had some kind of integration with Codex. But I never really shipped anything out of it. And mostly what it transitioned into pretty quickly was playing around with posting prompting examples on Twitter, because when I looked out online to find what people were saying about how to prompt these models, there really wasn't much out there. And so I had to kind of resort to just, like, the few examples that had been circulating in viral screenshots of humorous completions and so on, of, like, the results that people got out of it. And I started posting those examples. I started following academics and low-level engineers at the research labs and anyone that was working on shipping language models I thought was interesting. And elbowed my way in.

Nathan L. [00:05:18]: I have more questions on this, because I find it like, some people find, there's this whole, like, Twitter dynamic of, like, you find so much signal there, but the question is, like, how much does it generalize? Because there's so many of the lessons you can learn from these models, from these examples. I think the, like, number of R's in strawberry thing is the current one. And it's like, do you get a sense that these are transient or are these kind of repeated themes? And, like, how should you read these examples to try to extract themes from them? I've followed you for a while, and a lot of people do, and you're more insightful in how you post them, if you post these threads with, like, multiple tries and stuff like this. Like, should people be doing that when they see something pop up?

Riley G. [00:06:03]: I think so. I also would say that Twitter is a very different river to step into now than it was back then. At the point that I started doing this, like, nobody was really talking about these things that much, or to the extent they were, it was sort of fleeting. It was like, wow, look at this, and then on to the next thing.
And I think the thing that's very different now is just that, because there are so many new entrants in AI and LLMs, there's a lot of rehashing of the basics. And I think a lot of people in the industry would tell you that the popular examples that you see around, like how many R's are in strawberry, or some of the ones that I'm partially responsible for popularizing, at least — I think these things are really just, like, rookie mistakes in some sense, right? That these are things that we've long known language models can't do. And it just keeps popping up as a surprising quirk of language models, that I think the public is just confused that something could be so good at so many other things and so bad at this, right? At this seemingly trivial task. And that is hard to explain to people. And the answer to that hasn't really changed much in the past few years. They're generally bad at spelling for kind of the same reasons they were bad at spelling two or three years ago.

Nathan L. [00:07:27]: Yeah. I mean, like, how did these things change with ChatGPT? Because ChatGPT is like the introduction of RLHF into these models. And I think, I didn't write this down as a question, but there's like the difference in prompting base models and instruction models and RLHF models, which I think that for most of this discussion, it's like the end model, the, like, chat RLHF model is the one that people think about. But was that a big transition point in your work or is it just kind of plugging along? Right.

Riley G. [00:07:54]: I mean, I would say, I don't think it's any understatement to say that, or sorry, any overstatement to say that, that the release of ChatGPT was probably the single biggest event in the history of prompt engineering, in that prompt engineering became drastically easier after ChatGPT came out. And most other models learned from the ChatGPT way of doing things, right? I think people forget just how fiddly prompt engineering used to be, right? Like, people today don't think about things like frequency and presence penalties, right? It used to be that by default, you would get very repetitious output and you had to work to avoid that. People forgot about, like, don't end your prompt in a space, right? You had to understand how tokenization worked at all times, because, like, if you put an extra space in there, you were going to go out of distribution. Another one that I think is particularly vivid for me is "yo be real": in June of 2022, Douglas Hofstadter had a piece in The Economist showing what he called the hollowness of GPT-3's understanding of the world, that it failed on various simple questions, like "When was the Golden Gate Bridge transported for the second time across Egypt?" and so on. And someone, I believe it was Nick Cammarata of OpenAI, showed that you could fix almost all of these just by telling the model that if you gave it a silly question, to say "yo be real" instead of answering it, right? That model
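To make Riley's trailing-space point concrete, here is a minimal sketch, assuming the open-source tiktoken library (which tokenizer any particular model actually uses is an assumption here); it only illustrates that adding a space at the end of a prompt changes the token sequence a base model is conditioned on:

import tiktoken

# GPT-2-style BPE: spaces typically attach to the start of the following word,
# so a trailing space at the end of a prompt usually becomes its own token --
# a context the model saw far less often during pretraining.
enc = tiktoken.get_encoding("gpt2")

prompt = "The capital of France is"
print(enc.encode(prompt))        # ends with the token for " is"
print(enc.encode(prompt + " "))  # one extra token for the lone trailing space

With chat-tuned models and templated inputs this matters much less, which is part of why, as Riley says, prompt engineering got drastically easier after ChatGPT.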
Sorry this one was late! Thanks for bearing with me, and keep sending feedback my way. Still a year or two away from when I have time to record these, but I would love to.
Open-source tools, examples, limits, and the state of training multimodal models.
This is AI generated audio with Python and 11Labs.
Source code: https://github.com/natolambert/interconnects-tools
Original post: https://www.interconnects.ai/p/molmo-and-llama-3-vision

00:00 Llama 3.2 Vision and Molmo: Foundations for the multimodal open-source ecosystem
02:47 Llama vision: Multimodality for the masses of developers
03:27 Molmo: a (mostly) open-source equivalent to Llama vision
08:45 How adding vision changes capabilities and reasoning
11:47 Multimodal language models: Earlier on the exponential

Fig 1: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/llama-and-molmo/img_013.png
Fig 2: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/llama-and-molmo/img_015.png
Fig 3: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/llama-and-molmo/img_021.png
Fig 4: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/llama-and-molmo/img_023.png
Fig 5: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/llama-and-molmo/img_027.png
Fig 6: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/llama-and-molmo/img_030.png
Fig 7: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/llama-and-molmo/img_037.png
Fig 8: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/llama-and-molmo/img_046.png
Fig 9: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/llama-and-molmo/img_048.png
Fig 10: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/llama-and-molmo/img_050.png
Fig 11: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/llama-and-molmo/img_052.png
Fig 12: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/llama-and-molmo/img_054.png
Fig 13: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/llama-and-molmo/img_058.png
Fig 14: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/llama-and-molmo/img_065.png

Get full access to Interconnects at www.interconnects.ai/subscribe