Interconnects

Author: Nathan Lambert

Description

Audio essays about the latest developments in AI and interviews with leading scientists in the field. Breaking the hype, understanding what's under the hood, and telling stories.

www.interconnects.ai
61 Episodes
How Claude's computer use works. Where OpenAI, Anthropic, and Google all have a lead on each other.

Original post: https://www.interconnects.ai/p/claudes-agency

Chapters
00:00 Claude's agentic future and the current state of the frontier models
04:43 The state of the frontier models
04:49 1. Anthropic has the best model we are accustomed to using
05:27 Google has the best small & cheap model for building automation and basic AI engineering
08:07 OpenAI has the best model for reasoning, but we don't know how to use it
09:12 All of the laboratories have much larger models they're figuring out how to release (and use)
10:42 Who wins?

Figures
Fig 1, Sonnet New Benchmarks: https://substackcdn.com/image/fetch/w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d2e63ff-ac9f-4f8e-9749-9ef2b9b25b6c_1290x1290.png
Fig 2, Sonnet Old Benchmarks: https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4bccbd4d-f1c8-4a38-a474-69a3df8a4448_2048x1763.png

Get Interconnects (https://www.interconnects.ai/)...
... on YouTube: https://www.youtube.com/@interconnects
... on Twitter: https://x.com/interconnectsai
... on LinkedIn: https://www.linkedin.com/company/interconnects-ai
... on Spotify: https://open.spotify.com/show/2UE6s7wZC4kiXYOnWRuxGv
... on Apple Podcasts: https://podcasts.apple.com/us/podcast/interconnects/id1719552353
Arvind Narayanan is a leading voice disambiguating what AI does and does not do. His work with Sayash Kapoor at AI Snake Oil is one of the few beacons of reason in an AI media ecosystem with quite a few bad apples. Arvind is a professor of computer science at Princeton University and the director of the Center for Information Technology Policy. You can learn more about Arvind and his work on his website, X, or Google Scholar.

This episode is all in on figuring out what current LLMs do and don't do. We cover AGI, agents, scaling laws, autonomous scientists, and past failings of AI (i.e. those that came before generative AI took off). We also briefly touch on how all of this informs AI policy, and what academics can do to decide what to work on to generate better outcomes for technology.

Transcript and full show notes: https://www.interconnects.ai/p/interviewing-arvind-narayanan

Chapters
* [00:00:00] Introduction
* [00:01:54] Balancing being an AI critic while recognizing AI's potential
* [00:04:57] Challenges in AI policy discussions
* [00:08:47] Open source foundation models and their risks
* [00:15:35] Personal use cases for generative AI
* [00:22:19] CORE-Bench and evaluating AI scientists
* [00:25:35] Agents and artificial general intelligence (AGI)
* [00:33:12] Scaling laws and AI progress
* [00:37:41] Applications of AI outside of tech
* [00:39:10] Career lessons in technology and AI research
* [00:41:33] Privacy concerns and AI
* [00:47:06] Legal threats and responsible research communication
* [00:50:01] Balancing scientific research and public distribution
Read the full post here: https://www.interconnects.ai/p/building-on-evaluation-quicksand

Chapters
00:00 Building on evaluation quicksand
01:26 The causes of closed evaluation silos
06:35 The challenge facing open evaluation tools
10:47 Frontiers in evaluation
11:32 New types of synthetic data contamination
13:57 Building harder evaluations

Figures
Fig 1: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/manual/openai-predictions.webp
Andrew Trask is one of the bright spots in engaging with AI policy for me in the last year. He is a passionate idealist, trying to create a future for AI that enables privacy, academic research, and government involvement in a rapidly transforming ecosystem. Trask is a leader of the OpenMined organization facilitating researcher access to non-public data and AIs, a senior research scientist at Google DeepMind, a PhD student at the University of Oxford, and an author and educator on deep learning.

You can find more about Trask on Twitter or Google Scholar. You may want to watch his recent talk at Cohere on the future of AI (and why data breakthroughs dominate), his lecture at MIT on privacy-preserving ML, or his book on deep learning that has a substantial GitHub component. Here's a slide I liked from his recent Cohere talk.

The organization he helps run, OpenMined, has a few principles that say a lot about his ambitions and approaches to modern AI:

We believe we can inspire all data owners to open their data for research by building open-source privacy software that empowers them to receive more benefits (co-authorships, citations, grants, etc.) while mitigating risks related to privacy, security, and IP.

We cover privacy of LLMs, retrieval LLMs, secure enclaves, o1, Apple's new models, and many more topics.

More on Andrew: https://x.com/iamtrask
Transcript and more information: https://www.interconnects.ai/p/interviewing-andrew-trask

We mention
* Claude 3.5 launch and "pre-release testing with UK AISI" (and the US AI Safety Institute)
* OpenMined and PySyft
* CSET (Center for Security and Emerging Technology)
* NAIRR
* The "open data wall"
* Apple's Secure Enclaves, Nvidia Secure Enclave
* Data-store language models literature
* RETRO: Retrieval-Enhanced Transformer from DeepMind (2021)
* SILO Language Models: Isolating Legal Risk In a Nonparametric Datastore (2023)
* Scaling Retrieval-Based Language Models with a Trillion-Token Datastore (2024)

Chapters
[00:00:00] Introduction
[00:03:12] Secure enclaves and pre-release testing with Anthropic and UK Safety Institute
[00:16:31] Discussion on public AI and government involvement
[00:20:55] Data store language models and better approaches to "open training data"
[00:42:18] History and development of OpenMined
[00:48:57] Use of language models on air-gapped networks
[00:52:10] Near future of secure enclave technology and industry adoption
[00:58:01] Conclusions and future trajectory of AI development
How scaling changes model behavior

Some trends are reasonable to extrapolate, some are not. Even for the trends we are succeeding at extrapolating, it is not clear how that signal translates into different AI behaviors.

Read it here: https://www.interconnects.ai/p/how-scaling-changes-model-behavior

[00:00] How scaling changes model behavior
[05:03] Metaphors for what scaling may solve
[08:45] Short-term scaling is already de-risked

Fig. 1: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/manual/openai-predictions.webp
Fig. 2: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/manual/scaling-laws.webp
Fig. 3: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/manual/situational-awareness.webp
SB1047's veto, OpenAI's turnover, and a constant treadmill pushing AI startups to be all too similar to big technology name brands.

This is AI generated audio with Python and 11Labs.
Source code: https://github.com/natolambert/interconnects-tools
Original post: https://www.interconnects.ai/p/ai-safety-culture-vs-capitalism

00:00 AI Safety's Crux: Culture v Capitalism
06:03 SB1047 as a regulatory litmus test for AI safety
08:36 Capitalism at the helm
Riley Goodside is a staff prompt engineer at Scale AI. Previously working in data science, he is often seen as the default example of the new role of "prompt engineer." He regularly posts incisive prompts that elicit notable behavior from the most popular AI models.

I really resonated with this saying from Anthropic's recent podcast on prompt engineering — "now we write essays and treat them as code." In order to be good at prompting, you need to understand that natural language operates as our code used to.

This episode is a masterclass on why you should care about prompting and how it impacts results. Of course, there's a bunch of great discussion on recent models that reflects the need for different and/or better prompting. Enjoy it!

Listen on Apple Podcasts, Spotify, and wherever you get your podcasts. For other Interconnects interviews, go here.

We mention:
* Prompting to push the frontier of AI models
* Post-training and prompting interaction
* Prompting base models
* o1, Reflection 70B, reasoning
* Scale's leaderboard, evaluation tricks, evaluation needs
* PlanSearch paper
* Julius AI
* "The hottest programming language is English"
* "Think silently" instructions
* Scale Leaderboard and Humanity's Last Exam
* ChatML formatting

Chapters
* [00:00:09] Introduction
* [00:02:40] Riley's path to LLMs
* [00:07:54] Impact of ChatGPT on prompt engineering
* [00:12:03] OpenAI's o1
* [00:18:21] Autoregressive inference and prompting sensitivities
* [00:24:48] Reflection 70B model and its implications
* [00:28:00] Impact of prompting on evaluation
* [00:32:43] Prompting vs. Google search
* [00:46:55] Prompting and RLHF/post-training
* [00:56:57] Prompting of AI agents
* [01:01:20] Importance of hands-on experience with language models
* [01:05:00] Importance and challenges of AI model evaluation

Transcript
Built with smol-podcaster.

Nathan L. [00:01:08]: Hey, Riley, welcome to the show.

Riley G.: Hey, Nathan, great to be here.

Nathan L. [00:01:14]: Yeah, so for the audience here: I work on post-training a lot, and I see my own difficulty in taking prompting seriously and the things that I don't think we are doing enough of, and I don't see any reason why prompting can't be scientific in how we do it. So that's my biggest goal with this. I think there's a lot of podcasts where we could kind of say, what is the history of prompting? Where is it going? And that's easy to kind of redo, and I still find it interesting, but I just don't think there are enough people talking about the role of prompting in evaluation, or how prompting changes with how you post-train models. We're trying to take that seriously in how we set up post-training, but we regularly run into things like system prompts not being handled well, or how to release a model with a system prompt. So that's the tone I'm trying to get to when I ask these questions. And also OpenAI's o1 model just came out, so I'm definitely going to get onto that pretty quickly because that's what everyone's excited about. I like to start with background just to kind of get to know people, because a lot of this is just that I want to talk to interesting people in AI: how did you become interested in prompting? I've seen your background in data science, and then you joined Scale around when ChatGPT came out, which is fun timing. But how did you become maybe obsessed with this, or at least make it the focal point of your work?

Riley G. [00:02:40]: Yeah, I have sort of an unusual introduction to large language models. For most of my career, I've been a data scientist, mostly in the online dating industry. I was at OkCupid and Grindr. And after I left Grindr, I took sort of a sabbatical to educate myself, I guess, about the progress in large language models. It was around the time that GPT-3 Codex had just come out. And that was where I think I started to become really interested. Certainly when GPT-2 came out, the examples there wowed me as much as they wowed the rest of the world, I think, with the example of the news article about the unicorn and all that. And not long after that, we had AI Dungeon, and I played around with AI Dungeon a bit. But at that point, language models seemed to be mostly about language — they were very heavily focused on stylistic mimicry and creative writing and so on. And when Codex came out, it really started this thought that text is a more universal interface than we were giving it credit for, that language models might be more broadly useful. And I just became very excited in a practical sense about what these models could do for what I kind of intuited was very boilerplate-like data science code. Most of the Python and Julia and R that I've written over my career seemed like stuff that an LLM could handle, and that was one of its early strong points. So I was playing around — I think one of my first projects was a VS Code extension that had some kind of integration with Codex — but I never really shipped anything out of it. What it transitioned into pretty quickly was posting prompting examples on Twitter, because when I looked online to find what people were saying about how to prompt these models, there really wasn't much out there. I had to resort to the few examples that had been circulating in viral screenshots of humorous completions and so on — the results that people got out of them. And I started posting those examples. I started following academics and low-level engineers at the research labs and anyone working on shipping language models that I thought was interesting. And elbowed my way in.

Nathan L. [00:05:18]: I have more questions on this, because there's this whole Twitter dynamic where you find so much signal there, but the question is: how much does it generalize? There are so many lessons you can learn from these examples — I think the number of R's in strawberry thing is the current one. Do you get a sense that these are transient, or are these kind of repeated themes? And how should you read these examples to try to extract themes from them? I've followed you for a while, and a lot of people do, and you're more insightful in how you post them — you post these threads with multiple tries and stuff like this. Should people be doing that when they see something pop up?

Riley G. [00:06:03]: I think so. I also would say that Twitter is a very different river to step into now than it was back then. At the point that I started doing this, nobody was really talking about these things that much, or to the extent they were, it was sort of fleeting. It was like, wow, look at this, and then on to the next thing.

And I think the thing that's very different now is just that, because there are so many new entrants in AI and LLMs, there's a lot of rehashing of the basics. And I think a lot of people in the industry would tell you that the popular examples you see around — like how many R's are in strawberry, or some of the ones that I'm partially responsible for popularizing, at least — these things are really just rookie mistakes in some sense, right? These are things that we've long known language models can't do. And it just keeps popping up as a surprising quirk of language models; I think the public is just confused that something could be so good at so many other things and so bad at this, right? At a seemingly trivial task — and that is hard to explain to people. And the answer to that hasn't really changed much in the past few years. They're generally bad at spelling for kind of the same reasons they were bad at spelling two or three years ago.

Nathan L. [00:07:27]: Yeah. I mean, how did these things change with ChatGPT? Because ChatGPT is the introduction of RLHF into these models. I didn't write this down as a question, but there's the difference in prompting base models and instruction models and RLHF models, and I think that for most of this discussion, it's the end model — the chat RLHF model — that people think about. But was that a big transition point in your work, or is it just kind of plugging along? Right.

Riley G. [00:07:54]: I mean, I would say — I don't think it's any understatement to say that, or sorry, any overstatement to say that — the release of ChatGPT was probably the single biggest event in the history of prompt engineering, in that prompt engineering became drastically easier after ChatGPT came out. And most other models learned from the ChatGPT way of doing things, right? I think people forget just how fiddly prompt engineering used to be. People today don't think about things like frequency and presence penalties, right? It used to be that by default you would get very repetitious output and you had to work to avoid that. People forget about things like: don't end your prompt in a space. You had to understand how tokenization worked at all times, because if you put an extra space in there, you were going to go out of distribution. Another one that I think is particularly vivid for me is "yo be real": in June of 2022, Douglas Hofstadter had a piece in The Economist showing what he called the hollowness of GPT-3's understanding of the world, that it failed on various simple questions — like, when was the Golden Gate Bridge transported for the second time across Egypt, and so on. And someone, I believe it was Nick Cammarata of OpenAI, showed that you could fix almost all of these just by telling the model that if you gave it a silly question, it should say "yo be real" instead of answering it, right? That model
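To make the tokenization quirks discussed above concrete, here is a small illustrative sketch (not from the episode) using the open-source tiktoken library. The encoding name is an assumption and the exact token splits differ by model; the point is simply that models see tokens rather than characters, which is why letter-counting questions and trailing spaces trip them up.

```python
# Hedged sketch of the tokenization issues discussed above.
# Assumes the open-source `tiktoken` library (pip install tiktoken);
# "cl100k_base" is one example encoding, not necessarily what any
# particular model in the episode uses.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

# Models operate on token IDs, not characters, so "strawberry" is split
# into multi-character chunks -- the individual letters are never
# directly visible to the model, which makes letter counting hard.
tokens = enc.encode("strawberry")
print([enc.decode([t]) for t in tokens])

# A trailing space changes the token sequence, which is why older
# completion-style prompting advice said not to end a prompt in a space.
prompt = "The capital of France is"
print(enc.encode(prompt))
print(enc.encode(prompt + " "))  # different final tokens
```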
Sorry this one was late! Thanks for bearing with me, and keep sending feedback my way. Still a year or two away from when I have time to record these, but I would love to.

Open-source tools, examples, limits, and the state of training multimodal models.

This is AI generated audio with Python and 11Labs.
Source code: https://github.com/natolambert/interconnects-tools
Original post: https://www.interconnects.ai/p/molmo-and-llama-3-vision

00:00 Llama 3.2 Vision and Molmo: Foundations for the multimodal open-source ecosystem
02:47 Llama vision: Multimodality for the masses of developers
03:27 Molmo: a (mostly) open-source equivalent to Llama vision
08:45 How adding vision changes capabilities and reasoning
11:47 Multimodal language models: Earlier on the exponential

Fig 1: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/llama-and-molmo/img_013.png
Fig 2: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/llama-and-molmo/img_015.png
Fig 3: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/llama-and-molmo/img_021.png
Fig 4: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/llama-and-molmo/img_023.png
Fig 5: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/llama-and-molmo/img_027.png
Fig 6: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/llama-and-molmo/img_030.png
Fig 7: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/llama-and-molmo/img_037.png
Fig 8: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/llama-and-molmo/img_046.png
Fig 9: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/llama-and-molmo/img_048.png
Fig 10: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/llama-and-molmo/img_050.png
Fig 11: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/llama-and-molmo/img_052.png
Fig 12: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/llama-and-molmo/img_054.png
Fig 13: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/llama-and-molmo/img_058.png
Fig 14: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/llama-and-molmo/img_065.png
What productionizing test-time compute shows us about the future of AI. Exploration has landed in language model training.

This is AI generated audio with Python and 11Labs.
Source code: https://github.com/natolambert/interconnects-tools
Original post: https://www.interconnects.ai/p/reverse-engineering-openai-o1

00:00 Reverse engineering OpenAI's o1
01:52 From Q-star to Strawberry to o1
05:13 Training o1 with reinforcement learning
09:24 What is o1 doing when given a prompt?
11:49 Questions to consider to understand o1's structure
11:56 1. How does an RL-trained language model act?
12:38 2. Is it an online / test-time search?
14:20 3. Is it one model at inference?
15:29 Open-source o1, the future of o1, and the future of AI

Fig 1: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/o1/img_014.png
Fig 2: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/o1/img_016.png
Fig 3: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/o1/img_018.png
Fig 4: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/o1/img_020.png
Fig 5: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/o1/img_024.png
Fig 6: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/o1/img_026.png
Fig 7: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/o1/img_034.png
Fig 8: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/o1/img_048.png
Scale AI's future versus further scaling of language model performance. How Nvidia may take all the margins from the data market, too.

This is AI generated audio with Python and 11Labs.
Source code: https://github.com/natolambert/interconnects-tools
Original post: https://www.interconnects.ai/p/ai-data-foundry

00:00 Futures of the data foundry business model
02:57 What it is like to work with data vendors
06:06 Data foundries: Risks
08:18 Data foundries: Growth vectors
09:50 Realistic expectations

Fig 1: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/data-foundry/img_008.png
Fig 2: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/data-foundry/img_012.png
Fig 3: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/data-foundry/img_023.png
And why the concept of mandating "model specs" could be a good start.

(Oops, forgot to upload this yesterday!)

This is AI generated audio with Python and 11Labs.
Source code: https://github.com/natolambert/interconnects-tools
Original post: https://www.interconnects.ai/p/a-post-training-approach-to-ai-regulation

0:00 A post-training approach to AI regulation with Model Specs
1:45 Expanded roles of Model Specifications
3:40 Near future of Model Specifications
Whether or not scaling works, we should spend more on inference.

This is AI generated audio with Python and 11Labs.
Source code: https://github.com/natolambert/interconnects-tools
Original post: https://www.interconnects.ai/p/openai-strawberry-and-inference-scaling-laws

00:00 OpenAI's Strawberry, LM self-talk, inference scaling laws, and spending more on inference
01:51 OpenAI's Strawberry
04:16 Self-talk in language models
07:45 Inference scaling laws

Fig 1: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/strawberry/img_006.png
Fig 2: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/strawberry/img_021.png
Fig 3: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/strawberry/img_023.png
Fig 4: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/strawberry/img_037.png
Ai2 released OLMoE, which is probably our "best" model yet relative to its peers, but not much has changed in the process.

This is AI generated audio with Python and 11Labs.
Source code: https://github.com/natolambert/interconnects-tools
Original post: https://www.interconnects.ai/p/olmoe-and-building-better-llms

00:00 OLMoE and the hidden simplicity in training better foundation models
02:04 Frontier model team compute allocations
04:19 De-risking training complexity
06:40 On organizational complexity
09:05 Compounding improvements -- the key to building better language models

Fig 1: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/olmoe/img_005.png
Fig 2: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/olmoe/img_007.png
Fig 3: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/olmoe/img_009.png
Fig 4: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/olmoe/img_011.png
Fig 5: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/olmoe/img_028.png
Fig 6: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/olmoe/img_030.png
Fig 7: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/olmoe/img_032.png
The Open Source Initiative is working towards a definition.

This is AI generated audio with Python and 11Labs.
Source code: https://github.com/natolambert/interconnects-tools
Original post: https://www.interconnects.ai/p/defining-open-source-ai

0:00 On the current definitions of open-source AI and the state of the data commons
3:17 Reasons to not mandate fully released data
4:24 Sufficient but not exhaustive data docs
5:22 Frustration with the data commons
7:04 We need more examples to define the definition

Fig 1: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/defining-open-source/img_005.png
The latest model from one of the most popular fine-tuning labs makes us question how a model should be identified as a "frontier model."

This is AI generated audio with Python and 11Labs.
Source code: https://github.com/natolambert/interconnects-tools
Original post: https://www.interconnects.ai/p/nous-hermes-3

0:00 Nous Hermes 3 and exploiting underspecified evaluations
5:29 Parsing training lessons from Hermes 3

Fig 1: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/nous-hermes-3/img_005.png
Fig 2: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/nous-hermes-3/img_010.png
Fig 3: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/nous-hermes-3/img_012.png
Fig 4: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/nous-hermes-3/img_020.png
Fig 5: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/nous-hermes-3/img_027.png
Fig 6: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/nous-hermes-3/img_030.png
Fig 7: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/nous-hermes-3/img_032.png
Fig 8: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/nous-hermes-3/img_036.png
I had the pleasure of talking with Ross Taylor, who has a great spectrum of unique experiences in the language modeling space — evaluation experience, Galactica lead author, Llama post-training, etc. This is a really great conversation on the frontier of language model (LM) reasoning, LM deployments and demos, LMs for science, RLHF, and other topics. I've been trying to get Ross to come on for a bit. He's one of those people in the LM space who doesn't speak too much, but when he does, you listen.

Ross Taylor was previously an LLM lead at Meta AI, heading up the reasoning team. Previously he led the early work on LLM agents and was the research lead on the Galactica project. Before that, he was a co-founder of Papers with Code, which was acquired by Meta in 2019. Before that, he worked as a quant in sports betting and finance, and before that as a policy advisor for the UK Government. He is currently working on a new startup.

Listen on Apple Podcasts, Spotify, and wherever you get your podcasts. For other Interconnects interviews, go here.

YouTube

Chapters
* [00:00:00] Introduction of Ross Taylor and his background
* [00:02:12] Papers with Code
* [00:09:58] Galactica, goals, controversy, legacy
* [00:18:12] Technical details of the Galactica model
* [00:23:18] Potential for language models to make scientific discoveries
* [00:25:21] Defining and improving reasoning in language models
* [00:32:38] Process-based reward models and their potential applications
* [00:35:00] Generating synthetic data for SFT
* [00:40:23] Evaluating the effectiveness of language models as judges for human preference data
* [00:42:43] Considerations for creating base models that are easy to fine-tune
* [00:46:45] Balancing SFT and RLHF
* [00:54:13] Characteristics of successful post-training teams
* [00:58:26] Future directions for language model development

We mention
* Galactica
* Papers with Code
* Rob Stojnic (co-founder of Papers with Code)
* DPO, PPO
* Armen Aghajanyan (Chameleon)
* Tom Scialom on Latent Space
* Soumith Chintala (PyTorch)
* Alex Graves
* Llama 3 paper
* Process Reward Models / Let's Verify Step by Step

Transcript
Built with smol-podcaster and with love of Latent Space.

Nathan Lambert [00:01:07]: Today, we're here with Ross. This is a really exciting one. I've been trying to get Ross on the show for a while. Ross has done a lot of interesting work. And also the path to where you ended up, working on state-of-the-art Llama work at Meta, is very interesting to me. So we're going to start with some of that, but then there are a few people that want to know more about reasoning and some of the RLHF stuff. We won't cover the secretive new start-up — I don't know what it is, but that's how it goes these days. I'm sure it'll be great. So welcome to the show!

Ross Taylor [00:01:41]: Thanks for having me.

Nathan Lambert [00:01:44]: So I wanted to start with Papers with Code. For people that don't know, Papers with Code is one of these platforms — I never was a heavy user of it — but it collates papers, people can upvote them, popular papers, attaching code and datasets and evaluations to papers, which is great — it was sort of ahead of its time. It fits into a lot of these open ecosystem things. So I'm kind of curious how you ended up there and why you all started this startup that ended up building this thing that got acquired by Meta?

Ross Taylor [00:02:12]: Yeah, that was a weird one. This was back in 2018. So I was at an incubator, I had just quit my previous job and I was like, okay, I want to do a startup.

And I met Rob, my co-founder, who came along with me for the journey. We both came from different backgrounds. I was from a sports betting / quant finance kind of background, which is a whole other episode I guess. And Rob was in various startups, applying ML to things like hate speech detection, that kind of stuff. And the cool thing was, we both resonated on similar kinds of problems within the ML space, even though we came from different domains. So we spent a lot of time doing various experiments, trying to make new kinds of ML tooling, thinking about these stupid questions like "what is the Git equivalent for ML?" — that kind of stuff. One of those experiments was hacking around on this little website to solve a really basic problem: I'm trying to reproduce this paper, but I can't find the code. That was the thing that really blew up beyond our expectations. It was weird because we thought it was fairly trivial at first.

Nathan Lambert [00:03:16]: What year was this? 2018?

Ross Taylor [00:03:18]: Yeah.

Nathan Lambert [00:03:19]: This makes sense. I was starting deep RL then, and deep RL was so hot, which was probably the worst evaluation has ever been for ML. People complain about it today, but deep RL evaluation was like every single person was just lying to make themselves look better.

Ross Taylor [00:03:38]: The interesting thing now is that the open ecosystem has shifted to focus more on weights as a central artifact rather than code. I think there's an interesting debate there. Would it be more useful to have the LLaMA-3 8B model weights or all the code for training LLaMA-3? I think there's still interesting debates to be had about what's actually useful.

Nathan Lambert [00:03:56]: I think the code would be more useful. Like, OpenAI released their rules-based reward models, but it's like code washing, because it's just a bunch of people releasing eval code now. And that's a whole other tier — actual training code versus eval code. But yeah, I guess I'll just skip ahead.

Ross Taylor [00:04:12]: So essentially Papers with Code was the thing that didn't die for us. We always thought we were going to make something else and Papers with Code was more of a marketing thing. But eventually we were like: okay, our users are telling us this is what we should be working on. And we expanded from that very simple use case of finding code towards indexing various artifacts in ML.

Another big problem was trying to find the state of the art in something like ImageNet and all these different benchmarks. There just wasn't a central place to find this information… So we had this quite good Christmas — me and Robert — where we hacked for the whole month, indexing every leaderboard we could and all the related papers. I didn't want to do any annotation again after that! But that took things to the next tier, and that's when things really started to blow up.

Nathan Lambert [00:05:03]: Because this is like the first round of leaderboards, because now it's really popular with Hugging Face again. And I was like, yeah, is that just because it became a Meta thing and it's just kind of a thing that existed? You're like the first leaderboard company in a way, which I don't think many people think about. Yeah, which is weird.

Ross Taylor [00:05:19]: Yeah. And the interesting thing about us was that we never had to do any marketing because everything was from organic traffic.

So you would type in "state of the art ImageNet" and we would come to the top as the most useful site. That was really the source of our growth, and we grew to a million MAU fairly quickly. And as for Meta, we were in touch with the PyTorch folks at the time, who we really liked. You know — Soumith, Joe — those folks, and they had a shared interest in promoting the open source ecosystem back in 2018/19. And while it was a tough decision, we were just like "we really like working with these people, we want to work more closely with them", and that got us into Meta.

And then within Meta, we originally continued to develop the platform. But the big shift for us was that, even then, we saw we were moving to a world where compute was the currency. And we saw that, if we wanted to be well positioned in five years' time, we needed to be building these large-scale systems. Even for our own platform, we had lots of ML in the backend and we saw we were using fewer and fewer models to do more and more tasks. So that kind of shifted us into research, into Galactica, and then eventually Llama and that kind of stuff.

It was a weird shift because we were product people who ended up doing hardcore research! But I guess it was natural to us that we were within a research org with these amazing people, lots of resources. It was just the best use of our time to conduct this shift.

Nathan Lambert [00:06:43]: Do you think there should have been more integration between Hugging Face and Papers with Code? It would have been wonderful if it had happened.

Ross Taylor [00:06:54]: The backstory is that we saw them as competitors, to be honest, because we had the same vision originally. We were going to do model hosting, that kind of stuff. But we never got into it because we hit friction with leadership — who were not on board with that as a goal. Because from their point of view, it's like, okay, if we host these things, this might expose Facebook to some kind of legal risk. It wasn't in the perceived interest of the company.

Nathan Lambert [00:07:17]: This is a classic story of tech, really. They can't take the risk. They can't expose themselves.

Ross Taylor [00:07:23]: If you're a startup and it's your number one priority, then yeah, your attitude on risk is different. But I think it was a blessing in disguise for us because clearly the bigger wave was going to be large language models — we saw that incredibly early. And our mission was fundamentally not infrastructure, but something closer to: how do you organize information? It was a Google-y type of mission. And while we were focused on ML, we were more broadly thinking about science: how do we reduce friction for finding out about new advances and, I guess, lots of small tasks that when added up lead to a lot of progress in science.

Nathan Lambert [00:07:59]: I should have probably looked this up. Did you have another scientific background? Did you have a hard science background or wha
Apple, Meta, and Nvidia all agree -- synthetic data, iterative training, human preference labels, and lots of filtering.

This is AI generated audio with Python and 11Labs.
Source code: https://github.com/natolambert/interconnects-tools
Original post: https://www.interconnects.ai/p/frontier-model-post-training

00:00 Llama 3.1 post-training and the new normal for RLHF
01:18 A new standard pipeline
01:45 Human preference data
02:59 Scaling RLHF
05:03 Synthetic data
06:10 The new normal
06:51 Data quality is king
07:18 Apple confirms the new normal

Fig 1: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/frontier-rlhf/img_018.png
Fig 2: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/frontier-rlhf/img_020.png
Fig 3: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/frontier-rlhf/img_031.png
Fig 4: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/frontier-rlhf/img_033.png
Fig 5: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/frontier-rlhf/img_035.png
This week, I had the pleasure of chatting with Sebastian Raschka. Sebastian is doing a ton of work on the open language model ecosystem and AI research broadly. He's been writing the great Ahead of AI newsletter (which has the biggest audience overlap with Interconnects, at 26%, so a lot of you know him) and multiple educational books, all on top of being a full-time machine learning engineer at Lightning.ai, where he maintains LitGPT, which he described as being like Karpathy's NanoGPT, with slightly more abstractions.

This conversation mostly surrounds keeping up with AI research, the state of the open LLM ecosystem post Llama 3.1, and many narrow topics in between. I learned that Sebastian used to be an arXiv moderator, which gives some simple color on how arXiv and sifting through thousands of papers works. We cover a lot of ground here, so I hope you enjoy it.

Listen on Apple Podcasts, Spotify, and wherever you get your podcasts. For other interviews, go here.

YouTube

Chapters
* [00:00:00] Introduction & Sebastian's background
* [00:04:28] The state of deep learning and language models in 2018
* [00:08:02] Sebastian's work at Lightning AI and LitGPT
* [00:12:23] Distillation and its potential in language model training
* [00:14:14] Implementing language models and common pitfalls
* [00:18:45] Modern architectures: Mixture of experts models, early v. late fusion multimodal
* [00:24:23] Sebastian's book on building language models from scratch
* [00:27:13] Comparing ChatGPT, Claude, and Google's Gemini for various tasks
* [00:38:21] Vibing and checking new language models during implementation
* [00:40:42] Selecting papers to read and moderating arXiv
* [00:45:36] Motivation for working on AI education
* [00:52:46] Llama 3 fine-tuning
* [00:57:26] The potential impact of AI on jobs in writing and education
* [01:00:57] The future directions of AI

Transcript
Built with smol-podcaster and with love of Latent Space.

Nathan Lambert [00:00:00]: Hey, Sebastian, welcome to this kind of Interconnects episode — normally it's researcher interviews. You were a professor, so that definitely counts. You do a lot of different things these days. Let's get talking about language models. Welcome. Yeah.

Sebastian Raschka [00:01:35]: Thanks so much for the invitation, Nathan. I'm a big fan actually of the Interconnects newsletter, so I'm hoping we can have some fun chat about research, LLMs, and what's hot these days, basically. Yeah.

Nathan Lambert [00:01:48]: I have a little section at the end, which is keeping up with AI research, writing about AI, and process, because you do so many things. But I kind of want to jump into how you got to AI, because you have an interesting career path. So you were a professor at Wisconsin-Madison for years — I saw in statistics, which ... I also went all the way back to find your PhD thesis, which was uncovering hidden patterns of molecular recognition. So this was a while ago, and is this kind of ... Can you explain your background and how you got into AI? I'm guessing it's through computational statistics or something like this.

Sebastian Raschka [00:02:24]: Yeah. Close. So yeah, you did some research there. Interesting. So yeah, it's been a long time since my PhD thesis. This is maybe seven years now. And back then, it started even earlier when I got into AI — that was, I would say, 2012-ish. I was in grad school and I was taking a statistical pattern classification class.

And in that class, yeah, the star of the show was basically naive Bayes classifiers, or in general, Bayesian methods for pattern recognition. And from there, I kind of really got into machine learning. So it was, I would say, more statistically based, but it was all about classifying things. And then I think it was also right about the time when Coursera was launched, and I saw Andrew Ng's Coursera class. That was, I think, the first class in 2011-12 back then. And yeah, that's basically how I started, from statistical pattern classification into machine learning. And I applied that to computational biology problems like molecule and drug discovery — pharmaceutical drug discovery. And yeah, from there, I joined at some point after my graduation the University of Wisconsin-Madison, where I was in the statistics department, but I did mostly deep learning research, essentially. I was the only one basically doing Python, deep learning, machine learning stuff. So yeah.

Nathan Lambert [00:03:48]: What year was this, and what did it look like at the time?

Sebastian Raschka [00:03:52]: That was around 2018, I think August 2018, when I joined the department. And yeah, I mean, it's the statistics department, but my work was technically all machine learning and deep learning. I mean, a lot of students were really excited about learning machine learning. I think it was just around the time when it got really popular. And yeah, I was teaching machine learning and deep learning classes as well. They were always, you know, full and crowded — a lot of students were excited about that, and also, in general, at the time, about learning Python, machine learning, data science, all these topics.

Nathan Lambert [00:04:28]: It's, I mean, it's very interesting because I was a grad student at that time, in like 2018. That's when deep RL was really taking off. And it probably felt kind of like the language model thing does now — as a student at the time, there were just so many people in all these classes. And now language models have more of a real-world application, but I think as a student, it probably feels so, so similar. Yeah.

Sebastian Raschka [00:04:50]: So also back then, if I may say that, large language models already existed. I think the GPT paper, was it 2018? Something like that?

Nathan Lambert [00:04:59]: Yeah, 2018 or 2019. Yeah. For GPT-2, I think.

Sebastian Raschka [00:05:04]: I remember covering — I had a whole hour or two hours on large language models back then, but it was all focused on BERT models and basically also using them for more like classification tasks. Now, I would say maybe a lot of business problems still revolve around classification, but everything else is basically generative — generating text, generating images and stuff. So it has changed a lot.

Nathan Lambert [00:05:28]: Yeah, for sure. It's like a sequence of — ELMo, BERT, and the transformers are probably the things that you were talking about all the time? Just very interesting. I think Yi Tay had this — did you read Yi Tay's recent blog post on language model architectures, which kind of walked through why encoder-decoder is no longer in vogue? Did you see this?

Sebastian Raschka [00:05:51]: Yeah, I think I haven't seen the article, but I remember having discussions with people about that recently. I mean, I think there was actually — it's interesting.

So I think T5, if you were to train it and fine-tune it, would still be a really good model for sequence-to-sequence tasks, like language translation and stuff like that.

Nathan Lambert [00:06:10]: Yeah. Cohere for AI did this with Aya. They used T5 for their first Aya version, which most people were like, oh, Cohere branded it so well, but no one realized they were using T5.

Sebastian Raschka [00:06:21]: See, I didn't even know about that. And also on that note, there was something else I wanted to say. So there's also still the classification thing and using LLMs for classification. And it was usually either a BERT-like encoder, or you could also use an encoder-decoder, but mostly an encoder. But I've seen recent papers using just decoder models for that — basically removing the causal mask. I saw two papers on that actually: removing the causal mask, so basically reverting it back to an encoder, using Llama and then removing the mask. So in that sense.

Nathan Lambert [00:06:59]: And it works well as a classifier. You can just kind of use it. That's awesome.

Sebastian Raschka [00:07:04]: I mean, you could even do that without removing the causal mask. So you could just tune the last token, basically. But yeah, if you remove it, they found that you could probably even use the first token, because if you keep the last token, you always have to have padding — you have to pad to the longest sequence, otherwise the last token would be a different one in each training example. And so in this way you could use an earlier token basically, and then keep it fixed.

Nathan Lambert [00:07:30]: Yeah. Yeah. Now with your work at Lightning AI, do you do a lot of these things like hacking around with language models? Because I think it's kind of an underexplored space where people just remove layers and plug things together. I think when merging was just getting going, there was Franken-Llama 2, where somebody made a Llama 2 30B by just chopping layers and stuff together. There's so much unexplored signal there. Have you ever looked at these things, or do you not do that much?

Sebastian Raschka [00:08:02]: I must say I'm not a big fan of merging. Maybe I'm just not good at it. I'd rather do fine-tuning — start changing things, or training and fine-tuning things. So yeah, I do a lot of this type of hacking. Sometimes voluntarily, sometimes involuntarily, because I make a mistake or something. Because at Lightning I developed this library, LitGPT, which is an open-source library for pre-training, fine-tuning, serving, and deploying LLMs. But it's basically a from-scratch implementation. You can think of it as NanoGPT from Andrej Karpathy, but for all types of LLMs, like Llama, Gemma, Phi, all of them. But the focus, like NanoGPT, is on readable code — keeping it relatively simple. Of course it gets a bit more complex when you add multi-GPU training, tensor parallel, fully sharded data parallelism and stuf
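As a rough illustration of the last-token classification idea Sebastian describes (not code from the episode), here is a hedged sketch using Hugging Face Transformers. The model name is a placeholder and the linear head is untrained; the point is only that the hidden state at the last non-padding position of a decoder-only model can feed a classifier, and why padding makes that position move around.

```python
# Hedged sketch: pool the hidden state at the last non-padding token of a
# decoder-only LLM and feed it to a linear classification head.
# Assumptions: the model name is a placeholder (any decoder-only causal LM
# works), and the classification head here is untrained.
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "meta-llama/Llama-3.2-1B"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

# Many decoder-only tokenizers lack a pad token; reuse EOS and pad on the right
# so the attention mask tells us where each sequence actually ends.
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

texts = ["the movie was great", "the movie was absolutely terrible"]
batch = tokenizer(texts, padding=True, return_tensors="pt")

with torch.no_grad():
    hidden = model(**batch).last_hidden_state  # (batch, seq_len, hidden_dim)

# Index of the last non-padding token per example. This is the padding issue
# discussed above: the "last token" sits at a different position per sequence.
last_idx = batch["attention_mask"].sum(dim=1) - 1
pooled = hidden[torch.arange(hidden.size(0)), last_idx]  # (batch, hidden_dim)

# Untrained linear head for, e.g., binary sentiment classification.
head = torch.nn.Linear(hidden.size(-1), 2)
logits = head(pooled)
print(logits.shape)  # torch.Size([2, 2])
```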
And how to understand Llama 3.1's results.

This is AI generated audio with Python and 11Labs.
Source code: https://github.com/natolambert/interconnects-tools
Original post: https://www.interconnects.ai/p/gpt-4o-mini-changed-chatbotarena

0:00 GPT-4o-mini changed ChatBotArena
3:23 Llama 3 in the arena
5:13 Partial solutions and next steps

Fig 1: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/new-chatbotarena/img_013.png
Fig 2: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/new-chatbotarena/img_015.png
Fig 3: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/new-chatbotarena/img_019.png
Fig 4: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/new-chatbotarena/img_021.png
Fig 5: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/new-chatbotarena/img_025.png
Fig 6: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/new-chatbotarena/img_039.png
Fig 7: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/new-chatbotarena/img_043.png
Defining the future of the AI economy and regulation. Is Meta's AI play equivalent to the Unix stack for open-source software?

This is AI generated audio with Python and 11Labs.
Source code: https://github.com/natolambert/interconnects-tools
Original post: https://www.interconnects.ai/p/llama-405b-open-frontier-model

00:00 Llama 3.1 405b, Meta's AI strategy, and the new open frontier model ecosystem
01:37 Meta's open frontier model
03:51 Zuckerberg's vision for open-source AI (vs. reality)
08:35 Does the Llama 3.1 license support open-source AI?
12:55 Different futures for regulating frontier models

Fig 1: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/llama-405/img_008.png
Fig 2: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/llama-405/img_010.png
Fig 3: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/llama-405/img_015.png
Fig 4: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/llama-405/img_018.png
Fig 5: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/llama-405/img_050.png