DiscoverLatent Space: The AI Engineer Podcast
Claim Ownership
Latent Space: The AI Engineer Podcast
Author: swyx + Alessio
Subscribed: 403Played: 9,533Subscribe
Share
© Latent.Space
Description
The podcast by and for AI Engineers! In 2023, over 1 million visitors came to Latent Space to hear about news, papers and interviews in Software 3.0.
We cover Foundation Models changing every domain in Code Generation, Multimodality, AI Agents, GPU Infra and more, directly from the founders, builders, and thinkers involved in pushing the cutting edge. Striving to give you both the definitive take on the Current Thing down to the first introduction to the tech you'll be using in the next 3 months! We break news and exclusive interviews from OpenAI, tiny (George Hotz), Databricks/MosaicML (Jon Frankle), Modular (Chris Lattner), Answer.ai (Jeremy Howard), et al.
Full show notes always on https://latent.space
www.latent.space
We cover Foundation Models changing every domain in Code Generation, Multimodality, AI Agents, GPU Infra and more, directly from the founders, builders, and thinkers involved in pushing the cutting edge. Striving to give you both the definitive take on the Current Thing down to the first introduction to the tech you'll be using in the next 3 months! We break news and exclusive interviews from OpenAI, tiny (George Hotz), Databricks/MosaicML (Jon Frankle), Modular (Chris Lattner), Answer.ai (Jeremy Howard), et al.
Full show notes always on https://latent.space
www.latent.space
112 Episodes
Reverse
One last Gold sponsor slot is available for the AI Engineer Summit in NYC. Our last round of invites is going out soon - apply here - If you are building AI agents or AI eng teams, this will be the single highest-signal conference of the year for you!While the world melts down over DeepSeek, few are talking about the OTHER notable group of former hedge fund traders who pivoted into AI and built a remarkably profitable consumer AI business with a tiny team with incredibly cracked engineering team — Chai Research. In short order they have:* Started a Chat AI company well before Noam Shazeer started Character AI, and outlasted his departure.* Crossed 1m DAU in 2.5 years - William updates us on the pod that they’ve hit 1.4m DAU now, another +40% from a few months ago. Revenue crossed >$22m. * Launched the Chaiverse model crowdsourcing platform - taking 3-4 week A/B testing cycles down to 3-4 hours, and deploying >100 models a week.While they’re not paying million dollar salaries, you can tell they’re doing pretty well for an 11 person startup:The Chai Recipe: Building infra for rapid evalsRemember how the central thesis of LMarena (formerly LMsys) is that the only comprehensive way to evaluate LLMs is to let users try them out and pick winners?At the core of Chai is a mobile app that looks like Character AI, but is actually the largest LLM A/B testing arena in the world, specialized on retaining chat users for Chai’s usecases (therapy, assistant, roleplay, etc). It’s basically what LMArena would be if taken very, very seriously at one company (with $1m in prizes to boot):Chai publishes occasional research on how they think about this, including talks at their Palo Alto office:William expands upon this in today’s podcast (34 mins in):Fundamentally, the way I would describe it is when you're building anything in life, you need to be able to evaluate it. And through evaluation, you can iterate, we can look at benchmarks, and we can say the issues with benchmarks and why they may not generalize as well as one would hope in the challenges of working with them. But something that works incredibly well is getting feedback from humans. And so we built this thing where anyone can submit a model to our developer backend, and it gets put in front of 5000 users, and the users can rate it. And we can then have a really accurate ranking of like which model, or users finding more engaging or more entertaining. And it gets, you know, it's at this point now, where every day we're able to, I mean, we evaluate between 20 and 50 models, LLMs, every single day, right. So even though we've got only got a team of, say, five AI researchers, they're able to iterate a huge quantity of LLMs, right. So our team ships, let's just say minimum 100 LLMs a week is what we're able to iterate through. Now, before that moment in time, we might iterate through three a week, we might, you know, there was a time when even doing like five a month was a challenge, right? By being able to change the feedback loops to the point where it's not, let's launch these three models, let's do an A-B test, let's assign, let's do different cohorts, let's wait 30 days to see what the day 30 retention is, which is the kind of the, if you're doing an app, that's like A-B testing 101 would be, do a 30-day retention test, assign different treatments to different cohorts and come back in 30 days. So that's insanely slow. That's just, it's too slow. And so we were able to get that 30-day feedback loop all the way down to something like three hours.In Crowdsourcing the leap to Ten Trillion-Parameter AGI, William describes Chai’s routing as a recommender system, which makes a lot more sense to us than previous pitches for model routing startups:William is notably counter-consensus in a lot of his AI product principles:* No streaming: Chats appear all at once to allow rejection sampling* No voice: Chai actually beat Character AI to introducing voice - but removed it after finding that it was far from a killer feature.* Blending: “Something that we love to do at Chai is blending, which is, you know, it's the simplest way to think about it is you're going to end up, and you're going to pretty quickly see you've got one model that's really smart, one model that's really funny. How do you get the user an experience that is both smart and funny? Well, just 50% of the requests, you can serve them the smart model, 50% of the requests, you serve them the funny model.” (that’s it!)But chief above all is the recommender system.We also referenced Exa CEO Will Bryk’s concept of SuperKnowlege:Full Video versionOn YouTube. please like and subscribe!Timestamps* 00:00:04 Introductions and background of William Beauchamp* 00:01:19 Origin story of Chai AI* 00:04:40 Transition from finance to AI* 00:11:36 Initial product development and idea maze for Chai* 00:16:29 User psychology and engagement with AI companions* 00:20:00 Origin of the Chai name* 00:22:01 Comparison with Character AI and funding challenges* 00:25:59 Chai's growth and user numbers* 00:34:53 Key inflection points in Chai's growth* 00:42:10 Multi-modality in AI companions and focus on user-generated content* 00:46:49 Chaiverse developer platform and model evaluation* 00:51:58 Views on AGI and the nature of AI intelligence* 00:57:14 Evaluation methods and human feedback in AI development* 01:02:01 Content creation and user experience in Chai* 01:04:49 Chai Grant program and company culture* 01:07:20 Inference optimization and compute costs* 01:09:37 Rejection sampling and reward models in AI generation* 01:11:48 Closing thoughts and recruitmentTranscriptAlessio [00:00:04]: Hey everyone, welcome to the Latent Space podcast. This is Alessio, partner and CTO at Decibel, and today we're in the Chai AI office with my usual co-host, Swyx.swyx [00:00:14]: Hey, thanks for having us. It's rare that we get to get out of the office, so thanks for inviting us to your home. We're in the office of Chai with William Beauchamp. Yeah, that's right. You're founder of Chai AI, but previously, I think you're concurrently also running your fund?William [00:00:29]: Yep, so I was simultaneously running an algorithmic trading company, but I fortunately was able to kind of exit from that, I think just in Q3 last year. Yeah, congrats. Yeah, thanks.swyx [00:00:43]: So Chai has always been on my radar because, well, first of all, you do a lot of advertising, I guess, in the Bay Area, so it's working. Yep. And second of all, the reason I reached out to a mutual friend, Joyce, was because I'm just generally interested in the... ...consumer AI space, chat platforms in general. I think there's a lot of inference insights that we can get from that, as well as human psychology insights, kind of a weird blend of the two. And we also share a bit of a history as former finance people crossing over. I guess we can just kind of start it off with the origin story of Chai.William [00:01:19]: Why decide working on a consumer AI platform rather than B2B SaaS? So just quickly touching on the background in finance. Sure. Originally, I'm from... I'm from the UK, born in London. And I was fortunate enough to go study economics at Cambridge. And I graduated in 2012. And at that time, everyone in the UK and everyone on my course, HFT, quant trading was really the big thing. It was like the big wave that was happening. So there was a lot of opportunity in that space. And throughout college, I'd sort of played poker. So I'd, you know, I dabbled as a professional poker player. And I was able to accumulate this sort of, you know, say $100,000 through playing poker. And at the time, as my friends would go work at companies like ChangeStreet or Citadel, I kind of did the maths. And I just thought, well, maybe if I traded my own capital, I'd probably come out ahead. I'd make more money than just going to work at ChangeStreet.swyx [00:02:20]: With 100k base as capital?William [00:02:22]: Yes, yes. That's not a lot. Well, it depends what strategies you're doing. And, you know, there is an advantage. There's an advantage to being small, right? Because there are, if you have a 10... Strategies that don't work in size. Exactly, exactly. So if you have a fund of $10 million, if you find a little anomaly in the market that you might be able to make 100k a year from, that's a 1% return on your 10 million fund. If your fund is 100k, that's 100% return, right? So being small, in some sense, was an advantage. So started off, and the, taught myself Python, and machine learning was like the big thing as well. Machine learning had really, it was the first, you know, big time machine learning was being used for image recognition, neural networks come out, you get dropout. And, you know, so this, this was the big thing that's going on at the time. So I probably spent my first three years out of Cambridge, just building neural networks, building random forests to try and predict asset prices, right, and then trade that using my own money. And that went well. And, you know, if you if you start something, and it goes well, you You try and hire more people. And the first people that came to mind was the talented people I went to college with. And so I hired some friends. And that went well and hired some more. And eventually, I kind of ran out of friends to hire. And so that was when I formed the company. And from that point on, we had our ups and we had our downs. And that was a whole long story and journey in itself. But after doing that for about eight or nine years, on my 30th birthday, which was four years ago now, I kind of took a step back to just evaluate my life, right? This is what one does when one turns 30. You know, I just heard it. I hear you. And, you know, I looked at my 20s and I loved it. It was a really special time. I was really lucky and fortunate to have worked with this amazing team, been successful, had a lot of hard times. And through the hard times, learned wisdom and then a l
Sponsorships and applications for the AI Engineer Summit in NYC are live! (Speaker CFPs have closed) If you are building AI agents or leading teams of AI Engineers, this will be the single highest-signal conference of the year for you.Right after Christmas, the Chinese Whale Bros ended 2024 by dropping the last big model launch of the year: DeepSeek v3. Right now on LM Arena, DeepSeek v3 has a score of 1319, right under the full o1 model, Gemini 2, and 4o latest. This makes it the best open weights model in the world in January 2025.There has been a big recent trend in Chinese labs releasing very large open weights models, with TenCent releasing Hunyuan-Large in November and Hailuo releasing MiniMax-Text this week, both over 400B in size. However these extra-large language models are very difficult to serve.Baseten was the first of the Inference neocloud startups to get DeepSeek V3 online, because of their H200 clusters, their close collaboration with the DeepSeek team and early support of SGLang, a relatively new VLLM alternative that is also used at frontier labs like X.ai. Each H200 has 141 GB of VRAM with 4.8 TB per second of bandwidth, meaning that you can use 8 H200's in a node to inference DeepSeek v3 in FP8, taking into account KV Cache needs. We have been close to Baseten since Sarah Guo introduced Amir Haghighat to swyx, and they supported the very first Latent Space Demo Day in San Francisco, which was effectively the trial run for swyx and Alessio to work together! Since then, Philip Kiely also led a well attended workshop on TensorRT LLM at the 2024 World's Fair. We worked with him to get two of their best representatives, Amir and Lead Model Performance Engineer Yineng Zhang, to discuss DeepSeek, SGLang, and everything they have learned running Mission Critical Inference workloads at scale for some of the largest AI products in the world.The Three Pillars of Mission Critical InferenceWe initially planned to focus the conversation on SGLang, but Amir and Yineng were quick to correct us that the choice of inference framework is only the simplest, first choice of 3 things you need for production inference at scale:“I think it takes three things, and each of them individually is necessary but not sufficient: * Performance at the model level: how fast are you running this one model running on a single GPU, let's say. The framework that you use there can, can matter. The techniques that you use there can matter. The MLA technique, for example, that Yineng mentioned, or the CUDA kernels that are being used. But there's also techniques being used at a higher level, things like speculative decoding with draft models or with Medusa heads. And these are implemented in the different frameworks, or you can even implement it yourself, but they're not necessarily tied to a single framework. But using speculative decoding gets you massive upside when it comes to being able to handle high throughput. But that's not enough. Invariably, that one model running on a single GPU, let's say, is going to get too much traffic that it cannot handle.* Horizontal scaling at the cluster/region level: And at that point, you need to horizontally scale it. That's not an ML problem. That's not a PyTorch problem. That's an infrastructure problem. How quickly do you go from, a single replica of that model to 5, to 10, to 100. And so that's the second, that's the second pillar that is necessary for running these machine critical inference workloads.And what does it take to do that? It takes, some people are like, Oh, You just need Kubernetes and Kubernetes has an autoscaler and that just works. That doesn't work for, for these kinds of mission critical inference workloads. And you end up catching yourself wanting to bit by bit to rebuild those infrastructure pieces from scratch. This has been our experience. * And then going even a layer beyond that, Kubernetes runs in a single. cluster. It's a single cluster. It's a single region tied to a single region. And when it comes to inference workloads and needing GPUs more and more, you know, we're seeing this that you cannot meet the demand inside of a single region. A single cloud's a single region. In other words, a single model might want to horizontally scale up to 200 replicas, each of which is, let's say, 2H100s or 4H100s or even a full node, you run into limits of the capacity inside of that one region. And what we had to build to get around that was the ability to have a single model have replicas across different regions. So, you know, there are models on Baseten today that have 50 replicas in GCP East and, 80 replicas in AWS West and Oracle in London, etc.* Developer experience for Compound AI Systems: The final one is wrapping the power of the first two pillars in a very good developer experience to be able to afford certain workflows like the ones that I mentioned, around multi step, multi model inference workloads, because more and more we're seeing that the market is moving towards those that the needs are generally in these sort of more complex workflows. We think they said it very well.Show Notes* Amir Haghighat, Co-Founder, Baseten* Yineng Zhang, Lead Software Engineer, Model Performance, BasetenFull YouTube EpisodePlease like and subscribe!Timestamps* 00:00 Introduction and Latest AI Model Launch* 00:11 DeepSeek v3: Specifications and Achievements* 03:10 Latent Space Podcast: Special Guests Introduction* 04:12 DeepSeek v3: Technical Insights* 11:14 Quantization and Model Performance* 16:19 MOE Models: Trends and Challenges* 18:53 Baseten's Inference Service and Pricing* 31:13 Optimization for DeepSeek* 31:45 Three Pillars of Mission Critical Inference Workloads* 32:39 Scaling Beyond Single GPU* 33:09 Challenges with Kubernetes and Infrastructure* 33:40 Multi-Region Scaling Solutions* 35:34 SG Lang: A New Framework* 38:52 Key Techniques Behind SG Lang* 48:27 Speculative Decoding and Performance* 49:54 Future of Fine-Tuning and RLHF* 01:00:28 Baseten's V3 and Industry TrendsBaseten’s previous TensorRT LLM workshop: Get full access to Latent Space at www.latent.space/subscribe
Due to overwhelming demand (>15x applications:slots), we are closing CFPs for AI Engineer Summit NYC today. Last call! Thanks, we’ll be reaching out to all shortly!The world’s top AI blogger and friend of every pod, Simon Willison, dropped a monster 2024 recap: Things we learned about LLMs in 2024. Brian of the excellent TechMeme Ride Home pinged us for a connection and a special crossover episode, our first in 2025. The target audience for this podcast is a tech-literate, but non-technical one. You can see Simon’s notes for AI Engineers in his World’s Fair Keynote.Timestamp* 00:00 Introduction and Guest Welcome* 01:06 State of AI in 2025* 01:43 Advancements in AI Models* 03:59 Cost Efficiency in AI* 06:16 Challenges and Competition in AI* 17:15 AI Agents and Their Limitations* 26:12 Multimodal AI and Future Prospects* 35:29 Exploring Video Avatar Companies* 36:24 AI Influencers and Their Future* 37:12 Simplifying Content Creation with AI* 38:30 The Importance of Credibility in AI* 41:36 The Future of LLM User Interfaces* 48:58 Local LLMs: A Growing Interest* 01:07:22 AI Wearables: The Next Big Thing* 01:10:16 Wrapping Up and Final ThoughtsTranscript[00:00:00] Introduction and Guest Welcome[00:00:00] Brian: Welcome to the first bonus episode of the Tech Meme Write Home for the year 2025. I'm your host as always, Brian McCullough. Listeners to the pod over the last year know that I have made a habit of quoting from Simon Willison when new stuff happens in AI from his blog. Simon has been, become a go to for many folks in terms of, you know, Analyzing things, criticizing things in the AI space.[00:00:33] Brian: I've wanted to talk to you for a long time, Simon. So thank you for coming on the show. No, it's a privilege to be here. And the person that made this connection happen is our friend Swyx, who has been on the show back, even going back to the, the Twitter Spaces days but also an AI guru in, in their own right Swyx, thanks for coming on the show also.[00:00:54] swyx (2): Thanks. I'm happy to be on and have been a regular listener, so just happy to [00:01:00] contribute as well.[00:01:00] Brian: And a good friend of the pod, as they say. Alright, let's go right into it.[00:01:06] State of AI in 2025[00:01:06] Brian: Simon, I'm going to do the most unfair, broad question first, so let's get it out of the way. The year 2025. Broadly, what is the state of AI as we begin this year?[00:01:20] Brian: Whatever you want to say, I don't want to lead the witness.[00:01:22] Simon: Wow. So many things, right? I mean, the big thing is everything's got really good and fast and cheap. Like, that was the trend throughout all of 2024. The good models got so much cheaper, they got so much faster, they got multimodal, right? The image stuff isn't even a surprise anymore.[00:01:39] Simon: They're growing video, all of that kind of stuff. So that's all really exciting.[00:01:43] Advancements in AI Models[00:01:43] Simon: At the same time, they didn't get massively better than GPT 4, which was a bit of a surprise. So that's sort of one of the open questions is, are we going to see huge, but I kind of feel like that's a bit of a distraction because GPT 4, but way cheaper, much larger context lengths, and it [00:02:00] can do multimodal.[00:02:01] Simon: is better, right? That's a better model, even if it's not.[00:02:05] Brian: What people were expecting or hoping, maybe not expecting is not the right word, but hoping that we would see another step change, right? Right. From like GPT 2 to 3 to 4, we were expecting or hoping that maybe we were going to see the next evolution in that sort of, yeah.[00:02:21] Brian: We[00:02:21] Simon: did see that, but not in the way we expected. We thought the model was just going to get smarter, and instead we got. Massive drops in, drops in price. We got all of these new capabilities. You can talk to the things now, right? They can do simulated audio input, all of that kind of stuff. And so it's kind of, it's interesting to me that the models improved in all of these ways we weren't necessarily expecting.[00:02:43] Simon: I didn't know it would be able to do an impersonation of Santa Claus, like a, you know, Talked to it through my phone and show it what I was seeing by the end of 2024. But yeah, we didn't get that GPT 5 step. And that's one of the big open questions is, is that actually just around the corner and we'll have a bunch of GPT 5 class models drop in the [00:03:00] next few months?[00:03:00] Simon: Or is there a limit?[00:03:03] Brian: If you were a betting man and wanted to put money on it, do you expect to see a phase change, step change in 2025?[00:03:11] Simon: I don't particularly for that, like, the models, but smarter. I think all of the trends we're seeing right now are going to keep on going, especially the inference time compute, right?[00:03:21] Simon: The trick that O1 and O3 are doing, which means that you can solve harder problems, but they cost more and it churns away for longer. I think that's going to happen because that's already proven to work. I don't know. I don't know. Maybe there will be a step change to a GPT 5 level, but honestly, I'd be completely happy if we got what we've got right now.[00:03:41] Simon: But cheaper and faster and more capabilities and longer contexts and so forth. That would be thrilling to me.[00:03:46] Brian: Digging into what you've just said one of the things that, by the way, I hope to link in the show notes to Simon's year end post about what, what things we learned about LLMs in 2024. Look for that in the show notes.[00:03:59] Cost Efficiency in AI[00:03:59] Brian: One of the things that you [00:04:00] did say that you alluded to even right there was that in the last year, you felt like the GPT 4 barrier was broken, like IE. Other models, even open source ones are now regularly matching sort of the state of the art.[00:04:13] Simon: Well, it's interesting, right? So the GPT 4 barrier was a year ago, the best available model was OpenAI's GPT 4 and nobody else had even come close to it.[00:04:22] Simon: And they'd been at the, in the lead for like nine months, right? That thing came out in what, February, March of, of 2023. And for the rest of 2023, nobody else came close. And so at the start of last year, like a year ago, the big question was, Why has nobody beaten them yet? Like, what do they know that the rest of the industry doesn't know?[00:04:40] Simon: And today, that I've counted 18 organizations other than GPT 4 who've put out a model which clearly beats that GPT 4 from a year ago thing. Like, maybe they're not better than GPT 4. 0, but that's, that, that, that barrier got completely smashed. And yeah, a few of those I've run on my laptop, which is wild to me.[00:04:59] Simon: Like, [00:05:00] it was very, very wild. It felt very clear to me a year ago that if you want GPT 4, you need a rack of 40, 000 GPUs just to run the thing. And that turned out not to be true. Like the, the, this is that big trend from last year of the models getting more efficient, cheaper to run, just as capable with smaller weights and so forth.[00:05:20] Simon: And I ran another GPT 4 model on my laptop this morning, right? Microsoft 5. 4 just came out. And that, if you look at the benchmarks, it's definitely, it's up there with GPT 4. 0. It's probably not as good when you actually get into the vibes of the thing, but it, it runs on my, it's a 14 gigabyte download and I can run it on a MacBook Pro.[00:05:38] Simon: Like who saw that coming? The most exciting, like the close of the year on Christmas day, just a few weeks ago, was when DeepSeek dropped their DeepSeek v3 model on Hugging Face without even a readme file. It was just like a giant binary blob that I can't run on my laptop. It's too big. But in all of the benchmarks, it's now by far the best available [00:06:00] open, open weights model.[00:06:01] Simon: Like it's, it's, it's beating the, the metalamas and so forth. And that was trained for five and a half million dollars, which is a tenth of the price that people thought it costs to train these things. So everything's trending smaller and faster and more efficient.[00:06:15] Brian: Well, okay.[00:06:16] Challenges and Competition in AI[00:06:16] Brian: I, I kind of was going to get to that later, but let's, let's combine this with what I was going to ask you next, which is, you know, you're talking, you know, Also in the piece about the LLM prices crashing, which I've even seen in projects that I'm working on, but explain Explain that to a general audience, because we hear all the time that LLMs are eye wateringly expensive to run, but what we're suggesting, and we'll come back to the cheap Chinese LLM, but first of all, for the end user, what you're suggesting is that we're starting to see the cost come down sort of in the traditional technology way of Of costs coming down over time,[00:06:49] Simon: yes, but very aggressively.[00:06:51] Simon: I mean, my favorite thing, the example here is if you look at GPT-3, so open AI's g, PT three, which was the best, a developed model in [00:07:00] 2022 and through most of 20 2023. That, the models that we have today, the OpenAI models are a hundred times cheaper. So there was a 100x drop in price for OpenAI from their best available model, like two and a half years ago to today.[00:07:13] Simon: And[00:07:14] Brian: just to be clear, not to train the model, but for the use of tokens and things. Exactly,[00:07:20] Simon: for running prompts through them. And then When you look at the, the really, the top tier model providers right now, I think, are OpenAI, Anthropic, Google, and Meta. And there are a bunch of others that I could list there as well.[00:07:32] Simon: Mistral are very good. The, the DeepSeq and Quen models have got great. There's a whole bunch of providers serving really good models. But even if you just look at the sort of big brand name providers, they all offer models now that are
Applications close Monday for the NYC AI Engineer Summit focusing on AI Leadership and Agent Engineering! If you applied, invites should be rolling out shortly.The search landscape is experiencing a fundamental shift. Google built a >$2T company with the “10 blue links” experience, driven by PageRank as the core innovation for ranking. This was a big improvement from the previous directory-based experiences of AltaVista and Yahoo. Almost 4 decades later, Google is now stuck in this links-based experience, especially from a business model perspective. This legacy architecture creates fundamental constraints:* Must return results in ~400 milliseconds* Required to maintain comprehensive web coverage* Tied to keyword-based matching algorithms* Cost structures optimized for traditional indexingAs we move from the era of links to the era of answers, the way search works is changing. You’re not showing a user links, but the goal is to provide context to an LLM. This means moving from keyword based search to more semantic understanding of the content:The link prediction objective can be seen as like a neural PageRank because what you're doing is you're predicting the links people share... but it's more powerful than PageRank. It's strictly more powerful because people might refer to that Paul Graham fundraising essay in like a thousand different ways. And so our model learns all the different ways.All of this is now powered by a $5M cluster with 144 H200s:This architectural choice enables entirely new search capabilities:* Comprehensive result sets instead of approximations* Deep semantic understanding of queries* Ability to process complex, natural language requestsAs search becomes more complex, time to results becomes a variable:People think of searches as like, oh, it takes 500 milliseconds because we've been conditioned... But what if searches can take like a minute or 10 minutes or a whole day, what can you then do?Unlike traditional search engines' fixed-cost indexing, Exa employs a hybrid approach:* Front-loaded compute for indexing and embeddings* Variable inference costs based on query complexity* Mix of owned infrastructure ($5M H200 cluster) and cloud resourcesExa sees a lot of competition from products like Perplexity and ChatGPT Search which layer AI on top of traditional search backends, but Exa is betting that true innovation requires rethinking search from the ground up. For example, the recently launched Websets, a way to turn searches into structured output in grid format, allowing you to create lists and databases out of web pages. The company raised a $17M Series A to build towards this mission, so keep an eye out for them in 2025. Chapters* 00:00:00 Introductions* 00:01:12 ExaAI's initial pitch and concept* 00:02:33 Will's background at SpaceX and Zoox* 00:03:45 Evolution of ExaAI (formerly Metaphor Systems)* 00:05:38 Exa's link prediction technology* 00:09:20 Meaning of the name "Exa"* 00:10:36 ExaAI's new product launch and capabilities* 00:13:33 Compute budgets and variable compute products* 00:14:43 Websets as a B2B offering* 00:19:28 How do you build a search engine?* 00:22:43 What is Neural PageRank?* 00:27:58 Exa use cases * 00:35:00 Auto-prompting* 00:38:42 Building agentic search* 00:44:19 Is o1 on the path to AGI?* 00:49:59 Company culture and nap pods* 00:54:52 Economics of AI search and the future of search technologyFull YouTube TranscriptPlease like and subscribe!Show Notes* ExaAI* Web Search Product* Websets* Series A Announcement* Exa Nap Pods* Perplexity AI* Character.AITranscriptAlessio [00:00:00]: Hey, everyone. Welcome to the Latent Space podcast. This is Alessio, partner and CTO at Decibel Partners, and I'm joined by my co-host Swyx, founder of Smol.ai.Swyx [00:00:10]: Hey, and today we're in the studio with my good friend and former landlord, Will Bryk. Roommate. How you doing? Will, you're now CEO co-founder of ExaAI, used to be Metaphor Systems. What's your background, your story?Will [00:00:30]: Yeah, sure. So, yeah, I'm CEO of Exa. I've been doing it for three years. I guess I've always been interested in search, whether I knew it or not. Like, since I was a kid, I've always been interested in, like, high-quality information. And, like, you know, even in high school, wanted to improve the way we get information from news. And then in college, built a mini search engine. And then with Exa, like, you know, it's kind of like fulfilling the dream of actually being able to solve all the information needs I wanted as a kid. Yeah, I guess. I would say my entire life has kind of been rotating around this problem, which is pretty cool. Yeah.Swyx [00:00:50]: What'd you enter YC with?Will [00:00:53]: We entered YC with, uh, we are better than Google. Like, Google 2.0.Swyx [00:01:12]: What makes you say that? Like, that's so audacious to come out of the box with.Will [00:01:16]: Yeah, okay, so you have to remember the time. This was summer 2021. And, uh, GPT-3 had come out. Like, here was this magical thing that you could talk to, you could enter a whole paragraph, and it understands what you mean, understands the subtlety of your language. And then there was Google. Uh, which felt like it hadn't changed in a decade, uh, because it really hadn't. And it, like, you would give it a simple query, like, I don't know, uh, shirts without stripes, and it would give you a bunch of results for the shirts with stripes. And so, like, Google could barely understand you, and GBD3 could. And the theory was, what if you could make a search engine that actually understood you? What if you could apply the insights from LLMs to a search engine? And it's really been the same idea ever since. And we're actually a lot closer now, uh, to doing that. Yeah.Alessio [00:01:55]: Did you have any trouble making people believe? Obviously, there's the same element. I mean, YC overlap, was YC pretty AI forward, even 2021, or?Will [00:02:03]: It's nothing like it is today. But, um, uh, there were a few AI companies, but, uh, we were definitely, like, bold. And I think people, VCs generally like boldness, and we definitely had some AI background, and we had a working demo. So there was evidence that we could build something that was going to work. But yeah, I think, like, the fundamentals were there. I think people at the time were talking about how, you know, Google was failing in a lot of ways. And so there was a bit of conversation about it, but AI was not a big, big thing at the time. Yeah. Yeah.Alessio [00:02:33]: Before we jump into Exa, any fun background stories? I know you interned at SpaceX, any Elon, uh, stories? I know you were at Zoox as well, you know, kind of like robotics at Harvard. Any stuff that you saw early that you thought was going to get solved that maybe it's not solved today?Will [00:02:48]: Oh yeah. I mean, lots of things like that. Like, uh, I never really learned how to drive because I believed Elon that self-driving cars would happen. It did happen. And I take them every night to get home. But it took like 10 more years than I thought. Do you still not know how to drive? I know how to drive now. I learned it like two years ago. That would have been great to like, just, you know, Yeah, yeah, yeah. You know? Um, I was obsessed with Elon. Yeah. I mean, I worked at SpaceX because I really just wanted to work at one of his companies. And I remember they had a rule, like interns cannot touch Elon. And, um, that rule actually influenced my actions.Swyx [00:03:18]: Is it, can Elon touch interns? Ooh, like physically?Will [00:03:22]: Or like talk? Physically, physically, yeah, yeah, yeah, yeah. Okay, interesting. He's changed a lot, but, um, I mean, his companies are amazing. Um,Swyx [00:03:28]: What if you beat him at Diablo 2, Diablo 4, you know, like, Ah, maybe.Alessio [00:03:34]: I want to jump into, I know there's a lot of backstory used to be called metaphor system. So, um, and it, you've always been kind of like a prominent company, maybe at least RAI circles in the NSF.Swyx [00:03:45]: I'm actually curious how Metaphor got its initial aura. You launched with like, very little. We launched very little. Like there was, there was this like big splash image of like, this is Aurora or something. Yeah. Right. And then I was like, okay, what this thing, like the vibes are good, but I don't know what it is. And I think, I think it was much more sort of maybe consumer facing than what you are today. Would you say that's true?Will [00:04:06]: No, it's always been about building a better search algorithm, like search, like, just like the vision has always been perfect search. And if you do that, uh, we will figure out the downstream use cases later. It started on this fundamental belief that you could have perfect search over the web and we could talk about what that means. And like the initial thing we released was really just like our first search engine, like trying to get it out there. Kind of like, you know, an open source. So when OpenAI released, uh, ChachBt, like they didn't, I don't know how, how much of a game plan they had. They kind of just wanted to get something out there.Swyx [00:04:33]: Spooky research preview.Will [00:04:34]: Yeah, exactly. And it kind of morphed from a research company to a product company at that point. And I think similarly for us, like we were research, we started as a research endeavor with a, you know, clear eyes that like, if we succeed, it will be a massive business to make out of it. And that's kind of basically what happened. I think there are actually a lot of parallels to, of w between Exa and OpenAI. I often say we're the OpenAI of search. Um, because. Because we're a research company, we're a research startup that does like fundamental research into, uh, making like AGI for search in a, in a way. Uh, and then we have all these like, uh, business products that come out of that.Swyx [00:05:08]: Interesting. I want to ask a little bit more about Metaf
Applications for the NYC AI Engineer Summit, focused on Agents at Work, are open!When we first started Latent Space, in the lightning round we’d always ask guests: “What’s your favorite AI product?”. The majority would say Midjourney. The simple UI of prompt → very aesthetic image turned it into a $300M+ ARR bootstrapped business as it rode the first wave of AI image generation.In open source land, StableDiffusion was congregating around AUTOMATIC1111 as the de-facto web UI. Unlike Midjourney, which offered some flags but was mostly prompt-driven, A1111 let users play with a lot more parameters, supported additional modalities like img2img, and allowed users to load in custom models. If you’re interested in some of the SD history, you can look at our episodes with Lexica, Replicate, and Playground.One of the people involved with that community was comfyanonymous, who was also part of the Stability team in 2023, decided to build an alternative called ComfyUI, now one of the fastest growing open source projects in generative images, and is now the preferred partner for folks like Black Forest Labs’s Flux Tools on Day 1. The idea behind it was simple: “Everyone is trying to make easy to use interfaces. Let me try to make a powerful interface that's not easy to use.”Unlike its predecessors, ComfyUI does not have an input text box. Everything is based around the idea of a node: there’s a text input node, a CLIP node, a checkpoint loader node, a KSampler node, a VAE node, etc. While daunting for simple image generation, the tool is amazing for more complex workflows since you can break down every step of the process, and then chain many of them together rather than manually switching between tools. You can also re-start execution halfway instead of from the beginning, which can save a lot of time when using larger models.To give you an idea of some of the new use cases that this type of UI enables:* Sketch something → Generate an image with SD from sketch → feed it into SD Video to animate* Generate an image of an object → Turn into a 3D asset → Feed into interactive experiences* Input audio → Generate audio-reactive videosTheir Examples page also includes some of the more common use cases like AnimateDiff, etc. They recently launched the Comfy Registry, an online library of different nodes that users can pull from rather than having to build everything from scratch. The project has >60,000 Github stars, and as the community grows, some of the projects that people build have gotten quite complex:The most interesting thing about Comfy is that it’s not a UI, it’s a runtime. You can build full applications on top of image models simply by using Comfy. You can expose Comfy workflows as an endpoint and chain them together just like you chain a single node. We’re seeing the rise of AI Engineering applied to art.Major Tom’s ComfyUI Resources from the Latent Space DiscordMajor shoutouts to Major Tom on the LS Discord who is a image generation expert, who offered these pointers:* “best thing about comfy is the fact it supports almost immediately every new thing that comes out - unlike A1111 or forge, which still don't support flux cnet for instance. It will be perfect tool when conflicting nodes will be resolved”* AP Workflows from Alessandro Perili are a nice example of an all-in-one train-evaluate-generate system built atop Comfy* ComfyUI YouTubers to learn from:* @sebastiankamph* @NerdyRodent* @OlivioSarikas* @sedetweiler* @pixaroma* ComfyUI Nodes to check out:* https://github.com/kijai/ComfyUI-IC-Light* https://github.com/MrForExample/ComfyUI-3D-Pack* https://github.com/PowerHouseMan/ComfyUI-AdvancedLivePortrait* https://github.com/pydn/ComfyUI-to-Python-Extension* https://github.com/THtianhao/ComfyUI-Portrait-Maker* https://github.com/ssitu/ComfyUI_NestedNodeBuilder* https://github.com/longgui0318/comfyui-magic-clothing* https://github.com/atmaranto/ComfyUI-SaveAsScript* https://github.com/ZHO-ZHO-ZHO/ComfyUI-InstantID* https://github.com/AIFSH/ComfyUI-FishSpeech* https://github.com/coolzilj/ComfyUI-Photopea* https://github.com/lks-ai/anynode* Sarav: https://www.youtube.com/@mickmumpitz/videos ( applied stuff )* Sarav: https://www.youtube.com/@latentvision (technical, but infrequent)* look for comfyui node for https://github.com/magic-quill/MagicQuill* “Comfy for Video” resources* Kijai (https://github.com/kijai) pushing out support for Mochi, CogVideoX, AnimateDif, LivePortrait etc* Comfyui node support like LTX https://github.com/Lightricks/ComfyUI-LTXVideo , and HunyuanVideo* FloraFauna AI and Krea.ai* Communities: https://www.reddit.com/r/StableDiffusion/, https://www.reddit.com/r/comfyui/Full YouTube EpisodeAs usual, you can find the full video episode on our YouTube (and don’t forget to like and subscribe!)Timestamps* 00:00:04 Introduction of hosts and anonymous guest* 00:00:35 Origins of Comfy UI and early Stable Diffusion landscape* 00:02:58 Comfy's background and development of high-res fix* 00:05:37 Area conditioning and compositing in image generation* 00:07:20 Discussion on different AI image models (SD, Flux, etc.)* 00:11:10 Closed source model APIs and community discussions on SD versions* 00:14:41 LoRAs and textual inversion in image generation* 00:18:43 Evaluation methods in the Comfy community* 00:20:05 CLIP models and text encoders in image generation* 00:23:05 Prompt weighting and negative prompting* 00:26:22 Comfy UI's unique features and design choices* 00:31:00 Memory management in Comfy UI* 00:33:50 GPU market share and compatibility issues* 00:35:40 Node design and parameter settings in Comfy UI* 00:38:44 Custom nodes and community contributions* 00:41:40 Video generation models and capabilities* 00:44:47 Comfy UI's development timeline and rise to popularity* 00:48:13 Current state of Comfy UI team and future plans* 00:50:11 Discussion on other Comfy startups and potential text generation supportTranscriptAlessio [00:00:04]: Hey everyone, welcome to the Latent Space podcast. This is Alessio, partner and CTO at Decibel Partners, and I'm joined by my co-host Swyx, founder of Small AI.swyx [00:00:12]: Hey everyone, we are in the Chroma Studio again, but with our first ever anonymous guest, Comfy Anonymous, welcome.Comfy [00:00:19]: Hello.swyx [00:00:21]: I feel like that's your full name, you just go by Comfy, right?Comfy [00:00:24]: Yeah, well, a lot of people just call me Comfy, even when they know my real name. Hey, Comfy.Alessio [00:00:32]: Swyx is the same. You know, not a lot of people call you Shawn.swyx [00:00:35]: Yeah, you have a professional name, right, that people know you by, and then you have a legal name. Yeah, it's fine. How do I phrase this? I think people who are in the know, know that Comfy is like the tool for image generation and now other multimodality stuff. I would say that when I first got started with Stable Diffusion, the star of the show was Automatic 111, right? And I actually looked back at my notes from 2022-ish, like Comfy was already getting started back then, but it was kind of like the up and comer, and your main feature was the flowchart. Can you just kind of rewind to that moment, that year and like, you know, how you looked at the landscape there and decided to start Comfy?Comfy [00:01:10]: Yeah, I discovered Stable Diffusion in 2022, in October 2022. And, well, I kind of started playing around with it. Yes, I, and back then I was using Automatic, which was what everyone was using back then. And so I started with that because I had, it was when I started, I had no idea like how Diffusion works. I didn't know how Diffusion models work, how any of this works, so.swyx [00:01:36]: Oh, yeah. What was your prior background as an engineer?Comfy [00:01:39]: Just a software engineer. Yeah. Boring software engineer.swyx [00:01:44]: But like any, any image stuff, any orchestration, distributed systems, GPUs?Comfy [00:01:49]: No, I was doing basically nothing interesting. Crud, web development? Yeah, a lot of web development, just, yeah, some basic, maybe some basic like automation stuff. Okay. Just. Yeah, no, like, no big companies or anything.swyx [00:02:08]: Yeah, but like already some interest in automations, probably a lot of Python.Comfy [00:02:12]: Yeah, yeah, of course, Python. But I wasn't actually used to like the Node graph interface before I started Comfy UI. It was just, I just thought it was like, oh, like, what's the best way to represent the Diffusion process in the user interface? And then like, oh, well. Well, like, naturally, oh, this is the best way I've found. And this was like with the Node interface. So how I got started was, yeah, so basic October 2022, just like I hadn't written a line of PyTorch before that. So it's completely new. What happened was I kind of got addicted to generating images.Alessio [00:02:58]: As we all did. Yeah.Comfy [00:03:00]: And then I started. I started experimenting with like the high-res fixed in auto, which was for those that don't know, the high-res fix is just since the Diffusion models back then could only generate that low-resolution. So what you would do, you would generate low-resolution image, then upscale, then refine it again. And that was kind of the hack to generate high-resolution images. I really liked generating. Like higher resolution images. So I was experimenting with that. And so I modified the code a bit. Okay. What happens if I, if I use different samplers on the second pass, I was edited the code of auto. So what happens if I use a different sampler? What happens if I use a different, like a different settings, different number of steps? And because back then the. The high-res fix was very basic, just, so. Yeah.swyx [00:04:05]: Now there's a whole library of just, uh, the upsamplers.Comfy [00:04:08]: I think, I think they added a bunch of, uh, of options to the high-res fix since, uh, since, since then. But before that was just so basic. So I wanted to go further. I wanted to try it. What happens
Applications for the 2025 AI Engineer Summit are up, and you can save the date for AIE Singapore in April and AIE World’s Fair 2025 in June.Happy new year, and thanks for 100 great episodes! Please let us know what you want to see/hear for the next 100!Full YouTube Episode with Slides/ChartsLike and subscribe and hit that bell to get notifs!Timestamps* 00:00 Welcome to the 100th Episode!* 00:19 Reflecting on the Journey* 00:47 AI Engineering: The Rise and Impact* 03:15 Latent Space Live and AI Conferences* 09:44 The Competitive AI Landscape* 21:45 Synthetic Data and Future Trends* 35:53 Creative Writing with AI* 36:12 Legal and Ethical Issues in AI* 38:18 The Data War: GPU Poor vs. GPU Rich* 39:12 The Rise of GPU Ultra Rich* 40:47 Emerging Trends in AI Models* 45:31 The Multi-Modality War* 01:05:31 The Future of AI Benchmarks* 01:13:17 Pionote and Frontier Models* 01:13:47 Niche Models and Base Models* 01:14:30 State Space Models and RWKB* 01:15:48 Inference Race and Price Wars* 01:22:16 Major AI Themes of the Year* 01:22:48 AI Rewind: January to March* 01:26:42 AI Rewind: April to June* 01:33:12 AI Rewind: July to September* 01:34:59 AI Rewind: October to December* 01:39:53 Year-End Reflections and PredictionsTranscript[00:00:00] Welcome to the 100th Episode![00:00:00] Alessio: Hey everyone, welcome to the Latent Space Podcast. This is Alessio, partner and CTO at Decibel Partners, and I'm joined by my co host Swyx for the 100th time today.[00:00:12] swyx: Yay, um, and we're so glad that, yeah, you know, everyone has, uh, followed us in this journey. How do you feel about it? 100 episodes.[00:00:19] Alessio: Yeah, I know.[00:00:19] Reflecting on the Journey[00:00:19] Alessio: Almost two years that we've been doing this. We've had four different studios. Uh, we've had a lot of changes. You know, we used to do this lightning round. When we first started that we didn't like, and we tried to change the question. The answer[00:00:32] swyx: was cursor and perplexity.[00:00:34] Alessio: Yeah, I love mid journey. It's like, do you really not like anything else?[00:00:38] Alessio: Like what's, what's the unique thing? And I think, yeah, we, we've also had a lot more research driven content. You know, we had like 3DAO, we had, you know. Jeremy Howard, we had more folks like that.[00:00:47] AI Engineering: The Rise and Impact[00:00:47] Alessio: I think we want to do more of that too in the new year, like having, uh, some of the Gemini folks, both on the research and the applied side.[00:00:54] Alessio: Yeah, but it's been a ton of fun. I think we both started, I wouldn't say as a joke, we were kind of like, Oh, we [00:01:00] should do a podcast. And I think we kind of caught the right wave, obviously. And I think your rise of the AI engineer posts just kind of get people. Sombra to congregate, and then the AI engineer summit.[00:01:11] Alessio: And that's why when I look at our growth chart, it's kind of like a proxy for like the AI engineering industry as a whole, which is almost like, like, even if we don't do that much, we keep growing just because there's so many more AI engineers. So did you expect that growth or did you expect that would take longer for like the AI engineer thing to kind of like become, you know, everybody talks about it today.[00:01:32] swyx: So, the sign of that, that we have won is that Gartner puts it at the top of the hype curve right now. So Gartner has called the peak in AI engineering. I did not expect, um, to what level. I knew that I was correct when I called it because I did like two months of work going into that. But I didn't know, You know, how quickly it could happen, and obviously there's a chance that I could be wrong.[00:01:52] swyx: But I think, like, most people have come around to that concept. Hacker News hates it, which is a good sign. But there's enough people that have defined it, you know, GitHub, when [00:02:00] they launched GitHub Models, which is the Hugging Face clone, they put AI engineers in the banner, like, above the fold, like, in big So I think it's like kind of arrived as a meaningful and useful definition.[00:02:12] swyx: I think people are trying to figure out where the boundaries are. I think that was a lot of the quote unquote drama that happens behind the scenes at the World's Fair in June. Because I think there's a lot of doubt or questions about where ML engineering stops and AI engineering starts. That's a useful debate to be had.[00:02:29] swyx: In some sense, I actually anticipated that as well. So I intentionally did not. Put a firm definition there because most of the successful definitions are necessarily underspecified and it's actually useful to have different perspectives and you don't have to specify everything from the outset.[00:02:45] Alessio: Yeah, I was at um, AWS reInvent and the line to get into like the AI engineering talk, so to speak, which is, you know, applied AI and whatnot was like, there are like hundreds of people just in line to go in.[00:02:56] Alessio: I think that's kind of what enabled me. People, right? Which is what [00:03:00] you kind of talked about. It's like, Hey, look, you don't actually need a PhD, just, yeah, just use the model. And then maybe we'll talk about some of the blind spots that you get as an engineer with the earlier posts that we also had on on the sub stack.[00:03:11] Alessio: But yeah, it's been a heck of a heck of a two years.[00:03:14] swyx: Yeah.[00:03:15] Latent Space Live and AI Conferences[00:03:15] swyx: You know, I was, I was trying to view the conference as like, so NeurIPS is I think like 16, 17, 000 people. And the Latent Space Live event that we held there was 950 signups. I think. The AI world, the ML world is still very much research heavy. And that's as it should be because ML is very much in a research phase.[00:03:34] swyx: But as we move this entire field into production, I think that ratio inverts into becoming more engineering heavy. So at least I think engineering should be on the same level, even if it's never as prestigious, like it'll always be low status because at the end of the day, you're manipulating APIs or whatever.[00:03:51] swyx: But Yeah, wrapping GPTs, but there's going to be an increasing stack and an art to doing these, these things well. And I, you know, I [00:04:00] think that's what we're focusing on for the podcast, the conference and basically everything I do seems to make sense. And I think we'll, we'll talk about the trends here that apply.[00:04:09] swyx: It's, it's just very strange. So, like, there's a mix of, like, keeping on top of research while not being a researcher and then putting that research into production. So, like, people always ask me, like, why are you covering Neuralibs? Like, this is a ML research conference and I'm like, well, yeah, I mean, we're not going to, to like, understand everything Or reproduce every single paper, but the stuff that is being found here is going to make it through into production at some point, you hope.[00:04:32] swyx: And then actually like when I talk to the researchers, they actually get very excited because they're like, oh, you guys are actually caring about how this goes into production and that's what they really really want. The measure of success is previously just peer review, right? Getting 7s and 8s on their um, Academic review conferences and stuff like citations is one metric, but money is a better metric.[00:04:51] Alessio: Money is a better metric. Yeah, and there were about 2200 people on the live stream or something like that. Yeah, yeah. Hundred on the live stream. So [00:05:00] I try my best to moderate, but it was a lot spicier in person with Jonathan and, and Dylan. Yeah, that it was in the chat on YouTube.[00:05:06] swyx: I would say that I actually also created.[00:05:09] swyx: Layen Space Live in order to address flaws that are perceived in academic conferences. This is not NeurIPS specific, it's ICML, NeurIPS. Basically, it's very sort of oriented towards the PhD student, uh, market, job market, right? Like literally all, basically everyone's there to advertise their research and skills and get jobs.[00:05:28] swyx: And then obviously all the, the companies go there to hire them. And I think that's great for the individual researchers, but for people going there to get info is not great because you have to read between the lines, bring a ton of context in order to understand every single paper. So what is missing is effectively what I ended up doing, which is domain by domain, go through and recap the best of the year.[00:05:48] swyx: Survey the field. And there are, like NeurIPS had a, uh, I think ICML had a like a position paper track, NeurIPS added a benchmarks, uh, datasets track. These are ways in which to address that [00:06:00] issue. Uh, there's always workshops as well. Every, every conference has, you know, a last day of workshops and stuff that provide more of an overview.[00:06:06] swyx: But they're not specifically prompted to do so. And I think really, uh, Organizing a conference is just about getting good speakers and giving them the correct prompts. And then they will just go and do that thing and they do a very good job of it. So I think Sarah did a fantastic job with the startups prompt.[00:06:21] swyx: I can't list everybody, but we did best of 2024 in startups, vision, open models. Post transformers, synthetic data, small models, and agents. And then the last one was the, uh, and then we also did a quick one on reasoning with Nathan Lambert. And then the last one, obviously, was the debate that people were very hyped about.[00:06:39] swyx: It was very awkward. And I'm really, really thankful for John Franco, basically, who stepped up to challenge Dylan. Because Dylan was like, yeah, I'll do it. But He was pro scaling. And I think everyone who is like in AI is pro scaling, right? So you need somebody who's ready to publicly say, no, we've hit a wall.[00:06:57] swyx:
Happy holidays! We’ll be sharing snippets from Latent Space LIVE! through the break bringing you the best of 2024! We want to express our deepest appreciation to event sponsors AWS, Daylight Computer, Thoth.ai, StrongCompute, Notable Capital, and most of all all our LS supporters who helped fund the gorgeous venue and A/V production!For NeurIPS last year we did our standard conference podcast coverage interviewing selected papers (that we have now also done for ICLR and ICML), however we felt that we could be doing more to help AI Engineers 1) get more industry-relevant content, and 2) recap 2024 year in review from experts. As a result, we organized the first Latent Space LIVE!, our first in person miniconference, at NeurIPS 2024 in Vancouver.Our next keynote covers The State of LLM Agents, with the triumphant return of Professor Graham Neubig’s return to the pod (his ICLR episode here!). OpenDevin is now a startup known as AllHands! The renamed OpenHands has done extremely well this year, as they end the year sitting comfortably at number 1 on the hardest SWE-Bench Full leaderboard at 29%, though on the smaller SWE-Bench Verified, they are at 53%, behind Amazon Q, devlo, and OpenAI's self reported o3 results at 71.7%.Many are saying that 2025 is going to be the year of agents, with OpenAI, DeepMind and Anthropic setting their sights on consumer and coding agents, vision based computer-using agents and multi agent systems. There has been so much progress on the practical reliability and applications of agents in all domains, from the huge launch of Cognition AI's Devin this year, to the sleeper hit of Cursor Composer and Codeium's Windsurf Cascade in the IDE arena, to the explosive revenue growth of Stackblitz's Bolt, Lovable, and Vercel's v0, and the unicorn rounds and high profile movements of customer support agents like Sierra (now worth $4 billion) and search agents like Perplexity (now worth $9 billion). We wanted to take a little step back to understand the most notable papers of the year in Agents, and Graham indulged with his list of 8 perennial problems in building agents in 2024.Must-Read Papers for the 8 Problems of Agents* The agent-computer interface: CodeAct: Executable Code Actions Elicit Better LLM Agents. Minimial viable tools: Execution Sandbox, File Editor, Web Browsing* The human-agent interface: Chat UI, GitHub Plugin, Remote runtime, …?* Choosing an LLM: See Evaluation of LLMs as Coding Agents on SWE-Bench at 30x - must understand instructions, tools, code, environment, error recovery* Planning: Single Agent Systems vs Multi Agent (CoAct: A Global-Local Hierarchy for Autonomous Agent Collaboration) - Explicit vs Implicit, Curated vs Generated* Reusable common workflows: SteP: Stacked LLM Policies for Web Actions and Agent Workflow Memory - Manual prompting vs Learning from Experience* Exploration: Agentless: Demystifying LLM-based Software Engineering Agents and BAGEL: Bootstrapping Agents by Guiding Exploration with Language* Search: Tree Search for Language Model Agents - explore paths and rewind* Evaluation: Fast Sanity Checks (miniWoB and Aider) and Highly Realistic (WebArena, SWE-Bench) and SWE-Gym: An Open Environment for Training Software Engineering Agents & VerifiersFull Talk on YouTubePlease like and subscribe!Timestamps* 00:00 Welcome to Latent Space Live at NeurIPS 2024* 00:29 State of LLM Agents in 2024* 02:20 Professor Graham Newbig's Insights on Agents* 03:57 Live Demo: Coding Agents in Action* 08:20 Designing Effective Agents* 14:13 Choosing the Right Language Model for Agents* 16:24 Planning and Workflow for Agents* 22:21 Evaluation and Future Predictions for Agents* 25:31 Future of Agent Development* 25:56 Human-Agent Interaction Challenges* 26:48 Expanding Agent Use Beyond Programming* 27:25 Redesigning Systems for Agent Efficiency* 28:03 Accelerating Progress with Agent Technology* 28:28 Call to Action for Open Source Contributions* 30:36 Q&A: Agent Performance and Benchmarks* 33:23 Q&A: Web Agents and Interaction Methods* 37:16 Q&A: Agent Architectures and Improvements* 43:09 Q&A: Self-Improving Agents and Authentication* 47:31 Live Demonstration and Closing RemarksTranscript[00:00:29] State of LLM Agents in 2024[00:00:29] Speaker 9: Our next keynote covers the state of LLM agents. With the triumphant return of Professor Graham Newbig of CMU and OpenDevon, now a startup known as AllHands. The renamed OpenHands has done extremely well this year, as they end the year sitting comfortably at number one on the hardest SWE Benchful leaderboard at 29%.[00:00:53] Speaker 9: Though, on the smaller SWE bench verified, they are at 53 percent behind Amazon Q [00:01:00] Devlo and OpenAI's self reported O3 results at 71. 7%. Many are saying that 2025 is going to be the year of agents, with OpenAI, DeepMind, and Anthropic setting their sights on consumer and coding agents. Vision based computer using agents and multi agent systems.[00:01:22] Speaker 9: There has been so much progress on the practical reliability and applications of agents in all domains, from the huge launch of Cognition AI's Devon this year, to the sleeper hit of Cursor Composer and recent guest Codium's Windsurf Cascade in the IDE arena. To the explosive revenue growth of recent guests StackBlitz's Bolt, Lovable, and Vercel's vZero.[00:01:44] Speaker 9: And the unicorn rounds and high profile movements of customer support agents like Sierra, now worth 4 billion, and search agents like Perplexity, now worth 9 billion. We wanted to take a little step back to understand the most notable papers of the year in [00:02:00] agents, and Graham indulged with his list of eight perennial problems in building agents.[00:02:06] Speaker 9: As always, don't forget to check our show notes for all the selected best papers of 2024, and for the YouTube link to their talk. Graham's slides were especially popular online, and we are honoured to have him. Watch out and take care![00:02:20] Professor Graham Newbig's Insights on Agents[00:02:20] Speaker: Okay hi everyone. So I was given the task of talking about agents in 2024, and this is An impossible task because there are so many agents, so many agents in 2024. So this is going to be strongly covered by like my personal experience and what I think is interesting and important, but I think it's an important topic.[00:02:41] Speaker: So let's go ahead. So the first thing I'd like to think about is let's say I gave you you know, a highly competent human, some tools. Let's say I gave you a web browser and a terminal or a file system. And the ability to [00:03:00] edit text or code. What could you do with that? Everything. Yeah.[00:03:07] Speaker: Probably a lot of things. This is like 99 percent of my, you know, daily daily life, I guess. When I'm, when I'm working. So, I think this is a pretty powerful tool set, and I am trying to do, and what I think some other people are trying to do, is come up with agents that are able to, you know, manipulate these things.[00:03:26] Speaker: Web browsing, coding, running code in successful ways. So there was a little bit about my profile. I'm a professor at CMU, chief scientist at All Hands AI, building open source coding agents. I'm maintainer of OpenHands, which is an open source coding agent framework. And I'm also a software developer and I, I like doing lots of coding and, and, you know, shipping new features and stuff like this.[00:03:51] Speaker: So building agents that help me to do this, you know, is kind of an interesting thing, very close to me.[00:03:57] Live Demo: Coding Agents in Action[00:03:57] Speaker: So the first thing I'd like to do is I'd like to try [00:04:00] some things that I haven't actually tried before. If anybody has, you know, tried to give a live demo, you know, this is, you know very, very scary whenever you do it and it might not work.[00:04:09] Speaker: So it might not work this time either. But I want to show you like three things that I typically do with coding agents in my everyday work. I use coding agents maybe five to 10 times a day to help me solve my own problems. And so this is a first one. This is a data science task. Which says I want to create scatter plots that show the increase of the SWE bench score over time.[00:04:34] Speaker: And so I, I wrote a kind of concrete prompt about this. Agents work better with like somewhat concrete prompts. And I'm gonna throw this into open hands and let it work. And I'll, I'll go back to that in a second. Another thing that I do is I create new software. And I, I've been using a [00:05:00] service a particular service.[00:05:01] Speaker: I won't name it for sending emails and I'm not very happy with it. So I want to switch over to this new service called resend. com, which makes it easier to send emails. And so I'm going to ask it to read the docs for the resend. com API and come up with a script that allows me to send emails. The input to the script should be a CSV file and the subject and body should be provided in Jinja2 templates.[00:05:24] Speaker: So I'll start another agent and and try to get it to do that for me.[00:05:35] Speaker: And let's go with the last one. The last one I do is. This is improving existing software and in order, you know, once you write software, you usually don't throw it away. You go in and, like, actually improve it iteratively. This software that I have is something I created without writing any code.[00:05:52] Speaker: It's basically software to monitor how much our our agents are contributing to the OpenHance repository. [00:06:00] And on the, let me make that a little bit bigger, on the left side, I have the number of issues where it like sent a pull request. I have the number of issues where it like sent a pull request, whether it was merged in purple, closed in red, or is still open in green. And so these are like, you know, it's helping us monitor, but one thing it doesn't tell me is the total number. And I kind of want that
Happy holidays! We’ll be sharing snippets from Latent Space LIVE! through the break bringing you the best of 2024! We want to express our deepest appreciation to event sponsors AWS, Daylight Computer, Thoth.ai, StrongCompute, Notable Capital, and most of all all our LS supporters who helped fund the gorgeous venue and A/V production!For NeurIPS last year we did our standard conference podcast coverage interviewing selected papers (that we have now also done for ICLR and ICML), however we felt that we could be doing more to help AI Engineers 1) get more industry-relevant content, and 2) recap 2024 year in review from experts. As a result, we organized the first Latent Space LIVE!, our first in person miniconference, at NeurIPS 2024 in Vancouver. Today, we’re proud to share Loubna’s highly anticipated talk (slides here)!Synthetic DataWe called out the Synthetic Data debate at last year’s NeurIPS, and no surprise that 2024 was dominated by the rise of synthetic data everywhere:* Apple’s Rephrasing the Web, Microsoft’s Phi 2-4 and Orca/AgentInstruct, Tencent’s Billion Persona dataset, DCLM, and HuggingFace’s FineWeb-Edu, and Loubna’s own Cosmopedia extended the ideas of synthetic textbook and agent generation to improve raw web scrape dataset quality* This year we also talked to the IDEFICS/OBELICS team at HuggingFace who released WebSight this year, the first work on code-vs-images synthetic data.* We called Llama 3.1 the Synthetic Data Model for its extensive use (and documentation!) of synthetic data in its pipeline, as well as its permissive license. * Nemotron CC and Nemotron-4-340B also made a big splash this year for how they used 20k items of human data to synthesize over 98% of the data used for SFT/PFT.* Cohere introduced Multilingual Arbitrage: Optimizing Data Pools to Accelerate Multilingual Progress observing gains of up to 56.5% improvement in win rates comparing multiple teachers vs the single best teacher model* In post training, AI2’s Tülu3 (discussed by Luca in our Open Models talk) and Loubna’s Smol Talk were also notable open releases this year.This comes in the face of a lot of scrutiny and criticism, with Scale AI as one of the leading voices publishing AI models collapse when trained on recursively generated data in Nature magazine bringing mainstream concerns to the potential downsides of poor quality syndata:Part of the concerns we highlighted last year on low-background tokens are coming to bear: ChatGPT contaminated data is spiking in every possible metric:But perhaps, if Sakana’s AI Scientist pans out this year, we will have mostly-AI AI researchers publishing AI research anyway so do we really care as long as the ideas can be verified to be correct?Smol ModelsMeta surprised many folks this year by not just aggressively updating Llama 3 and adding multimodality, but also adding a new series of “small” 1B and 3B “on device” models this year, even working on quantized numerics collaborations with Qualcomm, Mediatek, and Arm. It is near unbelievable that a 1B model today can qualitatively match a 13B model of last year:and the minimum size to hit a given MMLU bar has come down roughly 10x in the last year. We have been tracking this proxied by Lmsys Elo and inference price:The key reads this year are:* MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases* Apple Intelligence Foundation Language Models* Hymba: A Hybrid-head Architecture for Small Language Models* Loubna’s SmolLM and SmolLM2: a family of state-of-the-art small models with 135M, 360M, and 1.7B parameters on the pareto efficiency frontier.* and Moondream, which we already covered in the 2024 in Vision talkFull Talk on YouTubeplease like and subscribe!Timestamps* [00:00:05] Loubna Intro* [00:00:33] The Rise of Synthetic Data Everywhere* [00:02:57] Model Collapse* [00:05:14] Phi, FineWeb, Cosmopedia - Synthetic Textbooks* [00:12:36] DCLM, Nemotron-CC* [00:13:28] Post Training - AI2 Tulu, Smol Talk, Cohere Multilingual Arbitrage* [00:16:17] Smol Models* [00:18:24] On Device Models* [00:22:45] Smol Vision Models* [00:25:14] What's NextTranscript2024 in Synthetic Data and Smol Models[00:00:00] [00:00:05] Loubna Intro[00:00:05] Speaker: I'm very happy to be here. Thank you for the invitation. So I'm going to be talking about synthetic data in 2024. And then I'm going to be talking about small on device models. So I think the most interesting thing about synthetic data this year is that like now we have it everywhere in the large language models pipeline.[00:00:33] The Rise of Synthetic Data Everywhere[00:00:33] Speaker: I think initially, synthetic data was mainly used just for post training, because naturally that's the part where we needed human annotators. And then after that, we realized that we don't really have good benchmarks to [00:01:00] measure if models follow instructions well, if they are creative enough, or if they are chatty enough, so we also started using LLMs as judges.[00:01:08] Speaker: Thank you. And I think this year and towards the end of last year, we also went to the pre training parts and we started generating synthetic data for pre training to kind of replace some parts of the web. And the motivation behind that is that you have a lot of control over synthetic data. You can control your prompt and basically also the kind of data that you generate.[00:01:28] Speaker: So instead of just trying to filter the web, you could try to get the LLM to generate what you think the best web pages could look like and then train your models on that. So this is how we went from not having synthetic data at all in the LLM pipeline to having it everywhere. And so the cool thing is like today you can train an LLM with like an entirely synthetic pipeline.[00:01:49] Speaker: For example, you can use our Cosmopedia datasets and you can train a 1B model on like 150 billion tokens that are 100 percent synthetic. And those are also of good quality. And then you can [00:02:00] instruction tune the model on a synthetic SFT dataset. You can also do DPO on a synthetic dataset. And then to evaluate if the model is good, you can use.[00:02:07] Speaker: A benchmark that uses LLMs as a judge, for example, MTBench or AlpacaEvil. So I think this is like a really mind blowing because like just a few years ago, we wouldn't think this is possible. And I think there's a lot of concerns about model collapse, and I'm going to talk about that later. But we'll see that like, if we use synthetic data properly and we curate it carefully, that shouldn't happen.[00:02:29] Speaker: And the reason synthetic data is very popular right now is that we have really strong models, both open and closed. It is really cheap and fast to use compared to human annotations, which cost a lot and take a lot of time. And also for open models right now, we have some really good inference frameworks.[00:02:47] Speaker: So if you have enough GPUs, it's really easy to spawn these GPUs and generate like a lot of synthetic data. Some examples are VLM, TGI, and TensorRT.[00:02:57] Model Collapse[00:02:57] Speaker: Now let's talk about the elephant in the room, model [00:03:00] collapse. Is this the end? If you look at the media and all of like, for example, some papers in nature, it's really scary because there's a lot of synthetic data out there in the web.[00:03:09] Speaker: And naturally we train on the web. So we're going to be training a lot of synthetic data. And if model collapse is going to happen, we should really try to take that seriously. And the other issue is that, as I said, we think, a lot of people think the web is polluted because there's a lot of synthetic data.[00:03:24] Speaker: And for example, when we're building fine web datasets here at Guillerm and Hinek, we're interested in like, how much synthetic data is there in the web? So there isn't really a method to properly measure the amount of synthetic data or to save a webpage synthetic or not. But one thing we can do is to try to look for like proxy words, for example, expressions like as a large language model or words like delve that we know are actually generated by chat GPT.[00:03:49] Speaker: We could try to measure the amount of these words in our data system and compare them to the previous years. For example, here, we measured like a, these words ratio in different dumps of common crawl. [00:04:00] And we can see that like the ratio really increased after chat GPT's release. So if we were to say that synthetic data amount didn't change, you would expect this ratio to stay constant, which is not the case.[00:04:11] Speaker: So there's a lot of synthetic data probably on the web, but does this really make models worse? So what we did is we trained different models on these different dumps. And we then computed their performance on popular, like, NLP benchmarks, and then we computed the aggregated score. And surprisingly, you can see that the latest DOMs are actually even better than the DOMs that are before.[00:04:31] Speaker: So if there's some synthetic data there, at least it did not make the model's worse. Yeah, which is really encouraging. So personally, I wouldn't say the web is positive with Synthetic Data. Maybe it's even making it more rich. And the issue with like model collapse is that, for example, those studies, they were done at like a small scale, and you would ask the model to complete, for example, a Wikipedia paragraph, and then you would train it on these new generations, and you would do that every day.[00:04:56] Speaker: iteratively. I think if you do that approach, it's normal to [00:05:00] observe this kind of behavior because the quality is going to be worse because the model is already small. And then if you train it just on its generations, you shouldn't expect it to become better. But what we're really doing here is that we take a model that is very large and we try to distill its knowledge into a model that is smaller.[00:05:14] Phi, Fin
Happy holidays! We’ll be sharing snippets from Latent Space LIVE! through the break bringing you the best of 2024! We want to express our deepest appreciation to event sponsors AWS, Daylight Computer, Thoth.ai, StrongCompute, Notable Capital, and most of all all our LS supporters who helped fund the gorgeous venue and A/V production!For NeurIPS last year we did our standard conference podcast coverage interviewing selected papers (that we have now also done for ICLR and ICML), however we felt that we could be doing more to help AI Engineers 1) get more industry-relevant content, and 2) recap 2024 year in review from experts. As a result, we organized the first Latent Space LIVE!, our first in person miniconference, at NeurIPS 2024 in Vancouver.Of perennial interest, particularly at academic conferences, is scaled-up architecture research as people hunt for the next Attention Is All You Need. We have many names for them: “efficient models”, “retentive networks”, “subquadratic attention” or “linear attention” but some of them don’t even have any lineage with attention - one of the best papers of this NeurIPS was Sepp Hochreiter’s xLSTM, which has a particularly poetic significance as one of the creators of the LSTM returning to update and challenge the OG language model architecture:So, for lack of a better term, we decided to call this segment “the State of Post-Transformers” and fortunately everyone rolled with it.We are fortunate to have two powerful friends of the pod to give us an update here:* Together AI: with CEO Vipul Ved Prakash and CTO Ce Zhang joining us to talk about how they are building Together together as a quote unquote full stack AI startup, from the lowest level kernel and systems programming to the highest level mathematical abstractions driving new model architectures and inference algorithms, with notable industry contributions from RedPajama v2, Flash Attention 3, Mamba 2, Mixture of Agents, BASED, Sequoia, Evo, Dragonfly, Dan Fu's ThunderKittens and many more research projects this year* Recursal AI: with CEO Eugene Cheah who has helped lead the independent RWKV project while also running Featherless AI. This year, the team has shipped RWKV v5, codenamed Eagle, to 1.5 billion Windows 10 and Windows 11 machines worldwide, to support Microsoft's on-device, energy-usage-sensitive Windows Copilot usecases, and has launched the first updates on RWKV v6, codenamed Finch and GoldFinch. On the morning of Latent Space Live, they also announced QRWKV6, a Qwen 32B model modified with RWKV linear attention layers. We were looking to host a debate between our speakers, but given that both of them were working on post-transformers alternativesFull Talk on YoutubePlease like and subscribe!LinksAll the models and papers they picked:* Earlier Cited Work* Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention* Hungry hungry hippos: Towards language modeling with state space models* Hyena hierarchy: Towards larger convolutional language models* Mamba: Linear-Time Sequence Modeling with Selective State Spaces* S4: Efficiently Modeling Long Sequences with Structured State Spaces* Just Read Twice (Arora et al)* Recurrent large language models that compete with Transformers in language modeling perplexity are emerging at a rapid rate (e.g., Mamba, RWKV). Excitingly, these architectures use a constant amount of memory during inference. However, due to the limited memory, recurrent LMs cannot recall and use all the information in long contexts leading to brittle in-context learning (ICL) quality. A key challenge for efficient LMs is selecting what information to store versus discard. In this work, we observe the order in which information is shown to the LM impacts the selection difficulty. * To formalize this, we show that the hardness of information recall reduces to the hardness of a problem called set disjointness (SD), a quintessential problem in communication complexity that requires a streaming algorithm (e.g., recurrent model) to decide whether inputted sets are disjoint. We empirically and theoretically show that the recurrent memory required to solve SD changes with set order, i.e., whether the smaller set appears first in-context. * Our analysis suggests, to mitigate the reliance on data order, we can put information in the right order in-context or process prompts non-causally. Towards that end, we propose: (1) JRT-Prompt, where context gets repeated multiple times in the prompt, effectively showing the model all data orders. This gives 11.0±1.3 points of improvement, averaged across 16 recurrent LMs and the 6 ICL tasks, with 11.9× higher throughput than FlashAttention-2 for generation prefill (length 32k, batch size 16, NVidia H100). We then propose (2) JRT-RNN, which uses non-causal prefix-linear-attention to process prompts and provides 99% of Transformer quality at 360M params., 30B tokens and 96% at 1.3B params., 50B tokens on average across the tasks, with 19.2× higher throughput for prefill than FA2.* Jamba: A 52B Hybrid Transformer-Mamba Language Model* We present Jamba, a new base large language model based on a novel hybrid Transformer-Mamba mixture-of-experts (MoE) architecture. * Specifically, Jamba interleaves blocks of Transformer and Mamba layers, enjoying the benefits of both model families. MoE is added in some of these layers to increase model capacity while keeping active parameter usage manageable. * This flexible architecture allows resource- and objective-specific configurations. In the particular configuration we have implemented, we end up with a powerful model that fits in a single 80GB GPU.* Built at large scale, Jamba provides high throughput and small memory footprint compared to vanilla Transformers, and at the same time state-of-the-art performance on standard language model benchmarks and long-context evaluations. Remarkably, the model presents strong results for up to 256K tokens context length. * We study various architectural decisions, such as how to combine Transformer and Mamba layers, and how to mix experts, and show that some of them are crucial in large scale modeling. We also describe several interesting properties of these architectures which the training and evaluation of Jamba have revealed, and plan to release checkpoints from various ablation runs, to encourage further exploration of this novel architecture. We make the weights of our implementation of Jamba publicly available under a permissive license.* SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformers* We introduce Sana, a text-to-image framework that can efficiently generate images up to 4096×4096 resolution. Sana can synthesize high-resolution, high-quality images with strong text-image alignment at a remarkably fast speed, deployable on laptop GPU. Core designs include: * (1) Deep compression autoencoder: unlike traditional AEs, which compress images only 8×, we trained an AE that can compress images 32×, effectively reducing the number of latent tokens. * (2) Linear DiT: we replace all vanilla attention in DiT with linear attention, which is more efficient at high resolutions without sacrificing quality. * (3) Decoder-only text encoder: we replaced T5 with modern decoder-only small LLM as the text encoder and designed complex human instruction with in-context learning to enhance the image-text alignment. * (4) Efficient training and sampling: we propose Flow-DPM-Solver to reduce sampling steps, with efficient caption labeling and selection to accelerate convergence. * As a result, Sana-0.6B is very competitive with modern giant diffusion model (e.g. Flux-12B), being 20 times smaller and 100+ times faster in measured throughput. Moreover, Sana-0.6B can be deployed on a 16GB laptop GPU, taking less than 1 second to generate a 1024×1024 resolution image. Sana enables content creation at low cost. * RWKV: Reinventing RNNs for the Transformer Era* Transformers have revolutionized almost all natural language processing (NLP) tasks but suffer from memory and computational complexity that scales quadratically with sequence length. In contrast, recurrent neural networks (RNNs) exhibit linear scaling in memory and computational requirements but struggle to match the same performance as Transformers due to limitations in parallelization and scalability. * We propose a novel model architecture, Receptance Weighted Key Value (RWKV), that combines the efficient parallelizable training of transformers with the efficient inference of RNNs.* Our approach leverages a linear attention mechanism and allows us to formulate the model as either a Transformer or an RNN, thus parallelizing computations during training and maintains constant computational and memory complexity during inference. * We scale our models as large as 14 billion parameters, by far the largest dense RNN ever trained, and find RWKV performs on par with similarly sized Transformers, suggesting future work can leverage this architecture to create more efficient models. This work presents a significant step towards reconciling trade-offs between computational efficiency and model performance in sequence processing tasks.* LoLCATs: On Low-Rank Linearizing of Large Language Models* Recent works show we can linearize large language models (LLMs) -- swapping the quadratic attentions of popular Transformer-based LLMs with subquadratic analogs, such as linear attention -- avoiding the expensive pretraining costs. However, linearizing LLMs often significantly degrades model quality, still requires training over billions of tokens, and remains limited to smaller 1.3B to 7B LLMs. * We thus propose Low-rank Linear Conversion via Attention Transfer (LoLCATs), a simple two-step method that improves LLM linearizing quality with orders of magnitudes less memory and compute. * We base these steps on two findings. * First, we can replace an LLM's softmax attentions with closely-approximating linear attentions,
Happy holidays! We’ll be sharing snippets from Latent Space LIVE! through the break bringing you the best of 2024! We want to express our deepest appreciation to event sponsors AWS, Daylight Computer, Thoth.ai, StrongCompute, Notable Capital, and most of all our LS supporters who helped fund the venue and A/V production!For NeurIPS last year we did our standard conference podcast coverage interviewing selected papers (that we have now also done for ICLR and ICML), however we felt that we could be doing more to help AI Engineers 1) get more industry-relevant content, and 2) recap 2024 year in review from experts. As a result, we organized the first Latent Space LIVE!, our first in person miniconference, at NeurIPS 2024 in Vancouver.Since Nathan Lambert ( Interconnects ) joined us for the hit RLHF 201 episode at the start of this year, it is hard to overstate how much Open Models have exploded this past year. In 2023 only five names were playing in the top LLM ranks, Mistral, Mosaic's MPT, TII UAE's Falcon, Yi from Kai-Fu Lee's 01.ai, and of course Meta's Llama 1 and 2. This year a whole cast of new open models have burst on the scene, from Google's Gemma and Cohere's Command R, to Alibaba's Qwen and Deepseek models, to LLM 360 and DCLM and of course to the Allen Institute's OLMo, OL MOE, Pixmo, Molmo, and Olmo 2 models. We were honored to host Luca Soldaini, one of the research leads on the Olmo series of models at AI2.Pursuing Open Model research comes with a lot of challenges beyond just funding and access to GPUs and datasets, particularly the regulatory debates this year across Europe, California and the White House. We also were honored to hear from and Sophia Yang, head of devrel at Mistral, who also presented a great session at the AI Engineer World's Fair Open Models track!Full Talk on YouTubePlease like and subscribe!Timestamps* 00:00 Welcome to Latent Space Live * 00:12 Recap of 2024: Best Moments and Keynotes * 01:22 Explosive Growth of Open Models in 2024 * 02:04 Challenges in Open Model Research * 02:38 Keynote by Luca Soldani: State of Open Models * 07:23 Significance of Open Source AI Licenses * 11:31 Research Constraints and Compute Challenges * 13:46 Fully Open Models: A New Trend * 27:46 Mistral's Journey and Innovations * 32:57 Interactive Demo: Lachat Capabilities * 36:50 Closing Remarks and NetworkingTranscriptSession3Audio[00:00:00] AI Charlie: Welcome to Latent Space Live, our first mini conference held at NeurIPS 2024 in Vancouver. This is Charlie, your AI co host. As a special treat this week, we're recapping the best of 2024 going domain by domain. We sent out a survey to the over 900 of you who told us what you wanted, and then invited the best speakers in the latent space network to cover each field.[00:00:28] AI Charlie: 200 of you joined us in person throughout the day, with over 2, 200 watching live online. Our next keynote covers the state of open models in 2024, with Luca Soldani and Nathan Lambert of the Allen Institute for AI, with a special appearance from Dr. Sophia Yang of Mistral. Our first hit episode of 2024 was with Nathan Lambert on RLHF 201 back in January.[00:00:57] AI Charlie: Where he discussed both reinforcement learning for language [00:01:00] models and the growing post training and mid training stack with hot takes on everything from constitutional AI to DPO to rejection sampling and also previewed the sea change coming to the Allen Institute. And to Interconnects, his incredible substack on the technical aspects of state of the art AI training.[00:01:18] AI Charlie: We highly recommend subscribing to get access to his Discord as well. It is hard to overstate how much open models have exploded this past year. In 2023, only five names were playing in the top LLM ranks. Mistral, Mosaics MPT, and Gatsby. TII UAE's Falcon, Yi, from Kaifu Lee's 01. ai, And of course, Meta's Lama 1 and 2.[00:01:43] AI Charlie: This year, a whole cast of new open models have burst on the scene. From Google's Jemma and Cohere's Command R, To Alibaba's Quen and DeepSeq models, to LLM360 and DCLM, and of course, to the Allen Institute's OLMO, [00:02:00] OLMOE, PIXMO, MOLMO, and OLMO2 models. Pursuing open model research comes with a lot of challenges beyond just funding and access to GPUs and datasets, particularly the regulatory debates this year across Europe.[00:02:14] AI Charlie: California and the White House. We also were honored to hear from Mistral, who also presented a great session at the AI Engineer World's Fair Open Models track. As always, don't forget to check the show notes for the YouTube link to their talk, as well as their slides. Watch out and take care.[00:02:35] Luca Intro[00:02:35] Luca Soldaini: Cool. Yeah, thanks for having me over. I'm Luca. I'm a research scientist at the Allen Institute for AI. I threw together a few slides on sort of like a recap of like interesting themes in open models for, for 2024. Have about maybe 20, 25 minutes of slides, and then we can chat if there are any questions.[00:02:57] Luca Soldaini: If I can advance to the next slide. [00:03:00] Okay, cool. So I did the quick check of like, to sort of get a sense of like, how much 2024 was different from 2023. So I went on Hugging Face and sort of get, tried to get a picture of what kind of models were released in 2023 and like, what do we get in 2024?[00:03:16] Luca Soldaini: 2023 we get, we got things like both LLAMA 1 and 2, we got Mistral, we got MPT, Falcon models, I think the YI model came in at the end. Tail end of the year. It was a pretty good year. But then I did the same for 2024. And it's actually quite stark difference. You have models that are, you know, reveling frontier level.[00:03:38] Luca Soldaini: Performance of what you can get from closed models from like Quen, from DeepSeq. We got Llama3. We got all sorts of different models. I added our own Olmo at the bottom. There's this growing group of like, Fully open models that I'm going to touch on a little bit later. But you know, just looking at the slides, it feels like 2024 [00:04:00] was just smooth sailing, happy knees, much better than previous year.[00:04:04] Luca Soldaini: And you know, you can plot you can pick your favorite benchmark Or least favorite, I don't know, depending on what point you're trying to make. And plot, you know, your closed model, your open model and sort of spin it in ways that show that, oh, you know open models are much closer to where closed models are today versus to Versus last year where the gap was fairly significant.[00:04:29] Luca Soldaini: So one thing that I think I don't know if I have to convince people in this room, but usually when I give this talks about like open models, there is always like this background question in, in, in people's mind of like, why should we use open models? APIs argument, you know, it's, it's. Just an HTTP request to get output from a, from one of the best model out there.[00:04:53] Luca Soldaini: Why do I have to set up infra and use local models? And there are really like two answer. There is the more [00:05:00] researchy answer for this, which is where it might be. Background lays, which is just research. If you want to do research on language models, research thrives on, on open models, there is like large swath of research on modeling, on how these models behave on evaluation and inference on mechanistic interpretability that could not happen at all if you didn't have open models they're also for AI builders, they're also like.[00:05:30] Luca Soldaini: Good use cases for using local models. You know, you have some, this is like a very not comprehensive slides, but you have things like there are some application where local models just blow closed models out of the water. So like retrieval, it's a very clear example. We might have like constraints like Edge AI applications where it makes sense.[00:05:51] Luca Soldaini: But even just like in terms of like stability, being able to say this model is not changing under the hood. It's, there's plenty of good cases for, [00:06:00] for open models. And the community is just not models. Is I stole this slide from one of the Quent2 announcement blog posts. But it's super cool to see like how much tech exists around open models and serving them on making them efficient and hosting them.[00:06:18] Luca Soldaini: It's pretty cool. And so. It's if you think about like where the term opens come from, comes from like the open source really open models meet the core tenants of, of open, of open source specifically when it comes around collaboration, there is truly a spirit, like through these open models, you can build on top of other people.[00:06:41] Luca Soldaini: innovation. We see a lot of these even in our own work of like, you know, as we iterate in the various versions of Alma it's not just like every time we collect from scratch all the data. No, the first step is like, okay, what are the cool data sources and datasets people have put [00:07:00] together for language model for training?[00:07:01] Luca Soldaini: Or when it comes to like our post training pipeline We one of the steps is you want to do some DPO and you use a lot of outputs of other models to improve your, your preference model. So it's really having like an open sort of ecosystem benefits and accelerates the development of open models.[00:07:23] The Definition of Open Models[00:07:23] Luca Soldaini: One thing that we got in 2024, which is not a specific model, but I thought it was really significant, is we first got we got our first open source AI definition. So this is from the open source initiative they've been generally the steward of a lot of the open source licenses when it comes to software and so they embarked on this journey in trying to figure out, okay, How does a license, an open source license for a model look like?[00:07:52] Luca Soldaini: Majority of the work is very dry because licenses are dry. So I'm not going to walk through t
Happy holidays! We’ll be sharing snippets from Latent Space LIVE! through the break bringing you the best of 2024! We want to express our deepest appreciation to event sponsors AWS, Daylight Computer, Thoth.ai, StrongCompute, Notable Capital, and most of all all our LS supporters who helped fund the gorgeous venue and A/V production!For NeurIPS last year we did our standard conference podcast coverage interviewing selected papers (that we have now also done for ICLR and ICML), however we felt that we could be doing more to help AI Engineers 1) get more industry-relevant content, and 2) recap 2024 year in review from experts. As a result, we organized the first Latent Space LIVE!, our first in person miniconference, at NeurIPS 2024 in Vancouver.The single most requested domain was computer vision, and we could think of no one better to help us recap 2024 than our friends at Roboflow, who was one of our earliest guests in 2023 and had one of this year’s top episodes in 2024 again. Roboflow has since raised a $40m Series B!LinksTheir slides are here:All the trends and papers they picked:* Isaac Robinson* Sora (see our Video Diffusion pod) - extending diffusion from images to video* SAM 2: Segment Anything in Images and Videos (see our SAM2 pod) - extending prompted masks to full video object segmentation* DETR Dominancy: DETRs show Pareto improvement over YOLOs* RT-DETR: DETRs Beat YOLOs on Real-time Object Detection* LW-DETR: A Transformer Replacement to YOLO for Real-Time Detection* D-FINE: Redefine Regression Task in DETRs as Fine-grained Distribution Refinement* Peter Robicheaux* MMVP (Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs)* * Florence 2 (Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks) * PalíGemma / PaliGemma 2* PaliGemma: A versatile 3B VLM for transfer* PaliGemma 2: A Family of Versatile VLMs for Transfer* AlMv2 (Multimodal Autoregressive Pre-training of Large Vision Encoders) * Vik Korrapati - MoondreamFull Talk on YouTubeWant more content like this? Like and subscribe to stay updated on our latest talks, interviews, and podcasts.Transcript/Timestamps[00:00:00] Intro[00:00:05] AI Charlie: welcome to Latent Space Live, our first mini conference held at NeurIPS 2024 in Vancouver. This is Charlie, your AI co host. When we were thinking of ways to add value to our academic conference coverage, we realized that there was a lack of good talks, just recapping the best of 2024, going domain by domain.[00:00:36] AI Charlie: We sent out a survey to the over 900 of you. who told us what you wanted, and then invited the best speakers in the Latent Space Network to cover each field. 200 of you joined us in person throughout the day, with over 2, 200 watching live online. Our second featured keynote is The Best of Vision 2024, with Peter Robichaud and Isaac [00:01:00] Robinson of Roboflow, with a special appearance from Vic Corrapati of Moondream.[00:01:05] AI Charlie: When we did a poll of our attendees, the highest interest domain of the year was vision. And so our first port of call was our friends at Roboflow. Joseph Nelson helped us kickstart our vision coverage in episode 7 last year, and this year came back as a guest host with Nikki Ravey of Meta to cover segment Anything 2.[00:01:25] AI Charlie: Roboflow have consistently been the leaders in open source vision models and tooling. With their SuperVision library recently eclipsing PyTorch's Vision library. And Roboflow Universe hosting hundreds of thousands of open source vision datasets and models. They have since announced a 40 million Series B led by Google Ventures.[00:01:46] AI Charlie: Woohoo.[00:01:48] Isaac's picks[00:01:48] Isaac Robinson: Hi, we're Isaac and Peter from Roboflow, and we're going to talk about the best papers of 2024 in computer vision. So, for us, we defined best as what made [00:02:00] the biggest shifts in the space. And to determine that, we looked at what are some major trends that happened and what papers most contributed to those trends.[00:02:09] Isaac Robinson: So I'm going to talk about a couple trends, Peter's going to talk about a trend, And then we're going to hand it off to Moondream. So, the trends that I'm interested in talking about are These are a major transition from models that run on per image basis to models that run using the same basic ideas on video.[00:02:28] Isaac Robinson: And then also how debtors are starting to take over the real time object detection scene from the YOLOs, which have been dominant for years.[00:02:37] Sora, OpenSora and Video Vision vs Generation[00:02:37] Isaac Robinson: So as a highlight we're going to talk about Sora, which from my perspective is the biggest paper of 2024, even though it came out in February. Is the what?[00:02:48] Isaac Robinson: Yeah. Yeah. So just it's a, SORA is just a a post. So I'm going to fill it in with details from replication efforts, including open SORA and related work, such as a stable [00:03:00] diffusion video. And then we're also going to talk about SAM2, which applies the SAM strategy to video. And then how debtors, These are the improvements in 2024 to debtors that are making them a Pareto improvement to YOLO based models.[00:03:15] Isaac Robinson: So to start this off, we're going to talk about the state of the art of video generation at the end of 2023, MagVIT MagVIT is a discrete token, video tokenizer akin to VQ, GAN, but applied to video sequences. And it actually outperforms state of the art handcrafted video compression frameworks.[00:03:38] Isaac Robinson: In terms of the bit rate versus human preference for quality and videos generated by autoregressing on these discrete tokens generate some pretty nice stuff, but up to like five seconds length and, you know, not super detailed. And then suddenly a few months later we have this, which when I saw it, it was totally mind blowing to me.[00:03:59] Isaac Robinson: 1080p, [00:04:00] a whole minute long. We've got light reflecting in puddles. That's reflective. Reminds me of those RTX demonstrations for next generation video games, such as Cyberpunk, but with better graphics. You can see some issues in the background if you look closely, but they're kind of, as with a lot of these models, the issues tend to be things that people aren't going to pay attention to unless they're looking for.[00:04:24] Isaac Robinson: In the same way that like six fingers on a hand. You're not going to notice is a giveaway unless you're looking for it. So yeah, as we said, SORA does not have a paper. So we're going to be filling it in with context from the rest of the computer vision scene attempting to replicate these efforts. So the first step, you have an LLM caption, a huge amount of videos.[00:04:48] Isaac Robinson: This, this is a trick that they introduced in Dolly 3, where they train a image captioning model to just generate very high quality captions for a huge corpus and then train a diffusion model [00:05:00] on that. Their Sora and their application efforts also show a bunch of other steps that are necessary for good video generation.[00:05:09] Isaac Robinson: Including filtering by aesthetic score and filtering by making sure the videos have enough motion. So they're not just like kind of the generators not learning to just generate static frames. So. Then we encode our video into a series of space time latents. Once again, SORA, very sparse in details.[00:05:29] Isaac Robinson: So the replication related works, OpenSORA actually uses a MAG VIT V2 itself to do this, but swapping out the discretization step with a classic VAE autoencoder framework. They show that there's a lot of benefit from getting the temporal compression, which makes a lot of sense as the Each sequential frames and videos have mostly redundant information.[00:05:53] Isaac Robinson: So by compressing against, compressing in the temporal space, you allow the latent to hold [00:06:00] a lot more semantic information while avoiding that duplicate. So, we've got our spacetime latents. Possibly via, there's some 3D VAE, presumably a MAG VATV2 and then you throw it into a diffusion transformer.[00:06:19] Isaac Robinson: So I think it's personally interesting to note that OpenSORA is using a MAG VATV2, which originally used an autoregressive transformer decoder to model the latent space, but is now using a diffusion diffusion transformer. So it's still a transformer happening. Just the question is like, is it?[00:06:37] Isaac Robinson: Parameterizing the stochastic differential equation is, or parameterizing a conditional distribution via autoregression. It's also it's also worth noting that most diffusion models today, the, the very high performance ones are switching away from the classic, like DDPM denoising diffusion probability modeling framework to rectified flows.[00:06:57] Isaac Robinson: Rectified flows have a very interesting property that as [00:07:00] they converge, they actually get closer to being able to be sampled with a single step. Which means that in practice, you can actually generate high quality samples much faster. Major problem of DDPM and related models for the past four years is just that they require many, many steps to generate high quality samples.[00:07:22] Isaac Robinson: So, and naturally, the third step is throwing lots of compute at the problem. So I didn't, I never figured out how to manage to get this video to loop, but we see very little compute, medium compute, lots of compute. This is so interesting because the the original diffusion transformer paper from Facebook actually showed that, in fact, the specific hyperparameters of the transformer didn't really matter that much.[00:07:48] Isaac Robinson: What mattered was that you were just increasing the amount of compute that the model had. So, I love how in the, once again, little blog posts, they don't even talk about [00:08:00] like the specific hyperparameters. They say, we're using a diffusion t
Happy holidays! We’ll be sharing snippets from Latent Space LIVE! through the break bringing you the best of 2024 from friends of the pod!For NeurIPS last year we did our standard conference podcast coverage interviewing selected papers (that we have now also done for ICLR and ICML), however we felt that we could be doing more to help AI Engineers 1) get more industry-relevant content, and 2) recap 2024 year in review from experts. As a result, we organized the first Latent Space LIVE!, our first in person miniconference, at NeurIPS 2024 in Vancouver. For our opening keynote, we could think of no one better to cover 'The State of AI Startups' than our friend Sarah Guo (AI superinvestor, founder of Conviction, host of No Priors!) and Pranav Reddy (Conviction partner) to share their takes on how the AI landscape evolved in 2024 examine the evolving AI landscape and what it means for startups, enterprises, and the industry as a whole! They completely understood the assignment.Recorded live with 200+ in-person and 2200+ online attendees at NeurIPS 2024, this keynote kicks off our mini-conference series exploring different domains of AI development in 2024. Enjoy!LinksSlides: https://x.com/saranormous/status/1866933642401886707Sarh Guo: https://x.com/saranormousPranav Reddy: https://x.com/prnvrdyFull Video on YouTubeWant more content like this? Like and subscribe to stay updated on our latest talks, interviews, and podcasts. Get full access to Latent Space at www.latent.space/subscribe
Our second podcast guest ever in March 2023 was Varun Mohan, CEO of Codeium; at the time, they had around 10,000 users and how they vowed to keep their autocomplete free forever: Today, over a million developers use their products, they still have their free tier, and they recently launched Windsurf, an AI IDE. Chapters* 00:00:00: Introductions & Catchup* 00:03:52: Why they created Windsurf* 00:05:52: Limitations of VS Code* 00:10:12: Evaluation methods for Cascade and Windsurf* 00:16:15: Listener questions about Windsurf launch* 00:20:30: Remote execution and security concerns* 00:25:18: Evolution of Codeium's strategy* 00:28:29: Cascade and its capabilities* 00:33:12: Multi-agent systems* 00:37:02: Areas of improvement for Windsurf* 00:39:12: Building an enterprise-first company* 00:42:01: Copilot for X, AI UX, and Enterprise AI blog posts Get full access to Latent Space at www.latent.space/subscribe
Regular tickets are now sold out for Latent Space LIVE! at NeurIPS! We have just announced our last speaker and newest track, friend of the pod Nathan Lambert who will be recapping 2024 in Reasoning Models like o1! We opened up a handful of late bird tickets for those who are deciding now — use code DISCORDGANG if you need it. See you in Vancouver!We’ve been sitting on our ICML recordings for a while (from today’s first-ever SOLO guest cohost, Brittany Walker), and in light of Sora Turbo’s launch (blogpost, tutorials) today, we figured it would be a good time to drop part one which had been gearing up to be a deep dive into the state of generative video worldsim, with a seamless transition to vision (the opposite modality), and finally robots (their ultimate application).Sora, Genie, and the field of Generative Video World SimulatorsBill Peebles, author of Diffusion Transformers, gave his most recent Sora talk at ICML, which begins our episode:* William (Bill) Peebles - SORA (slides)Something that is often asked about Sora is how much inductive biases were introduced to achieve these results. Bill references the same principles brought by Hyung Won Chung from the o1 team - “sooner or later those biases come back to bite you”.We also recommend these reads from throughout 2024 on Sora.* Lilian Weng’s literature review of Video Diffusion Models* Sora API leak* Estimates of 100k-700k H100s needed to serve Sora (not Turbo)* Artist guides on using Sora for professional storytellingGoogle DeepMind had a remarkably strong presence at ICML on Video Generation Models, winning TWO Best Paper awards for:* Genie: Generative Interactive Environments (covered in oral, poster, and workshop)* VideoPoet: A Large Language Model for Zero-Shot Video Generation (see website)We end this part by taking in Tali Dekel’s talk on The Future of Video Generation: Beyond Data and Scale.Part 2: Generative Modeling and DiffusionSince 2023, Sander Dieleman’s perspectives (blogpost, tweet) on diffusion as “spectral autoregression in the frequency domain” while working on Imagen and Veo have caught the public imagination, so we highlight his talk:* Wading through the noise: an intuitive look at diffusion modelsThen we go to Ben Poole for his talk on Inferring 3D Structure with 2D Priors, including his work on NeRFs and DreamFusion:Then we investigate two flow matching papers - one from the Flow Matching co-authors - Ricky T. Q. Chen (FAIR, Meta)And how it is implemented in Stable Diffusion 3 with Scaling Rectified Flow Transformers for High-Resolution Image Synthesis Our last hit on Diffusion is a couple of oral presentations on speech, which we leave you to explore via our audio podcast* NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models* Speech Self-Supervised Learning Using Diffusion Model Synthetic DataPart 3: VisionThe ICML Test of Time winner was DeCAF, which Trevor Darrell notably called “the OG vision foundation model”.Lucas Beyer’s talk on “Vision in the age of LLMs — a data-centric perspective” was also well received online, and he talked about his journey from Vision Transformers to PaliGemma.We give special honorable mention to MLLM-as-a-Judge: Assessing Multimodal LLM-as-a-Judge with Vision-Language Benchmark.Part 4: Reinforcement Learning and RoboticsWe segue vision into robotics with the help of Ashley Edwards, whose work on both the Gato and the Genie teams at Deepmind is summarized in Learning actions, policies, rewards, and environments from videos alone.Brittany highlighted two poster session papers:* Behavior Generation with Latent Actions* We also recommend Lerrel Pinto’s On Building General-Purpose Robots* PIVOT: Iterative Visual Prompting Elicits Actionable Knowledge for VLMsHowever we must give the lion’s share of space to Chelsea Finn, now founder of Physical Intelligence, who gave FOUR talks on* "What robots have taught me about machine learning"* developing robot generalists* robots that adapt autonomously* how to give feedback to your language model* special mention to PI colleague Sergey Levine on Robotic Foundation ModelsWe end the podcast with a position paper that links generative environments and RL/robotics: Automatic Environment Shaping is the Next Frontier in RL.Timestamps* [00:00:00] Intros* [00:02:43] Sora - Bill Peebles* [00:44:52] Genie: Generative Interactive Environments* [01:00:17] Genie interview* [01:12:33] VideoPoet: A Large Language Model for Zero-Shot Video Generation* [01:30:51] VideoPoet interview - Dan Kondratyuk* [01:42:00] Tali Dekel - The Future of Video Generation: Beyond Data and Scale.* [02:27:07] Sander Dieleman - Wading through the noise: an intuitive look at diffusion models* [03:06:20] Ben Poole - Inferring 3D Structure with 2D Priors* [03:30:30] Ricky Chen - Flow Matching* [04:00:03] Patrick Esser - Stable Diffusion 3* [04:14:30] NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models* [04:27:00] Speech Self-Supervised Learning Using Diffusion Model Synthetic Data* [04:39:00] ICML Test of Time winner: DeCAF* [05:03:40] Lucas Beyer: “Vision in the age of LLMs — a data-centric perspective”* [05:42:00] Ashley Edwards: Learning actions, policies, rewards, and environments from videos alone.* [06:03:30] Behavior Generation with Latent Actions interview* [06:09:52] Chelsea Finn: "What robots have taught me about machine learning"* [06:56:00] Position: Automatic Environment Shaping is the Next Frontier in RL Get full access to Latent Space at www.latent.space/subscribe
The full schedule for Latent Space LIVE! at NeurIPS has been announced, featuring Best of 2024 overview talks for the AI Startup Landscape, Computer Vision, Open Models, Transformers Killers, Synthetic Data, Agents, and Scaling, and speakers from Sarah Guo of Conviction, Roboflow, AI2/Meta, Recursal/Together, HuggingFace, OpenHands and SemiAnalysis. Join us for the IRL event/Livestream! Alessio will also be holding a meetup at AWS Re:Invent in Las Vegas this Wednesday. See our new Events page for dates of AI Engineer Summit, Singapore, and World’s Fair in 2025. LAST CALL for questions for our big 2024 recap episode! Submit questions and messages on Speakpipe here for a chance to appear on the show!When we first observed that GPT Wrappers are Good, Actually, we did not even have Bolt on our radar. Since we recorded our Anthropic episode discussing building Agents with the new Claude 3.5 Sonnet, Bolt.new (by Stackblitz) has easily cleared the $8m ARR bar, repeating and accelerating its initial $4m feat.There are very many AI code generators and VS Code forks out there, but Bolt probably broke through initially because of its incredible zero shot low effort app generation:But as we explain in the pod, Bolt also emphasized deploy (Netlify)/ backend (Supabase)/ fullstack capabilities on top of Stackblitz’s existing WebContainer full-WASM-powered-developer-environment-in-the-browser tech. Since then, the team has been shipping like mad (with weekly office hours), with bugfixing, full screen, multi-device, long context, diff based edits (using speculative decoding like we covered in Inference, Fast and Slow).All of this has captured the imagination of low/no code builders like Greg Isenberg and many others on YouTube/TikTok/Reddit/X/Linkedin etc:Just as with Fireworks, our relationship with Bolt/Stackblitz goes a bit deeper than normal - swyx advised the launch and got a front row seat to this epic journey, as well as demoed it with Realtime Voice at the recent OpenAI Dev Day. So we are very proud to be the first/closest to tell the full open story of Bolt/Stackblitz!Flow Engineering + Qodo/AlphaCodium UpdateIn year 2 of the pod we have been on a roll getting former guests to return as guest cohosts (Harrison Chase, Aman Sanger, Jon Frankle), and it was a pleasure to catch Itamar Friedman back on the pod, giving us an update on all things Qodo and Testing Agents from our last catchup a year and a half ago:Qodo (they renamed in September) went viral in early January this year with AlphaCodium (paper here, code here) beating DeepMind’s AlphaCode with high efficiency:With a simple problem solving code agent:* The first step is to have the model reason about the problem. They describe it using bullet points and focus on the goal, inputs, outputs, rules, constraints, and any other relevant details.* Then, they make the model reason about the public tests and come up with an explanation of why the input leads to that particular output. * The model generates two to three potential solutions in text and ranks them in terms of correctness, simplicity, and robustness. * Then, it generates more diverse tests for the problem, covering cases not part of the original public tests. * Iteratively, pick a solution, generate the code, and run it on a few test cases. * If the tests fail, improve the code and repeat the process until the code passes every test.swyx has previously written similar thoughts on types vs tests for putting bounds on program behavior, but AlphaCodium extends this to AI generated tests and code.More recently, Itamar has also shown that AlphaCodium’s techniques also extend well to the o1 models:Making Flow Engineering a useful technique to improve code model performance on every model. This is something we see AI Engineers uniquely well positioned to do compared to ML Engineers/Researchers.Full Video PodcastLike and subscribe!Show Notes* Itamar* Qodo* First episode* Eric* Bolt* StackBlitz* Thinkster* AlphaCodium* WebContainersChapters* 00:00:00 Introductions & Updates* 00:06:01 Generic vs. Specific AI Agents* 00:07:40 Maintaining vs Creating with AI* 00:17:46 Human vs Agent Computer Interfaces* 00:20:15 Why Docker doesn't work for Bolt* 00:24:23 Creating Testing and Code Review Loops* 00:28:07 Bolt's Task Breakdown Flow* 00:31:04 AI in Complex Enterprise Environments* 00:41:43 AlphaCodium* 00:44:39 Strategies for Breaking Down Complex Tasks* 00:45:22 Building in Open Source* 00:50:35 Choosing a product as a founder* 00:59:03 Reflections on Bolt Success* 01:06:07 Building a B2C GTM* 01:18:11 AI Capabilities and Pricing Tiers* 01:20:28 What makes Bolt unique* 01:23:07 Future Growth and Product Development* 01:29:06 Competitive Landscape in AI Engineering* 01:30:01 Advice to Founders and Embracing AI* 01:32:20 Having a baby and completing an Iron ManTranscriptAlessio [00:00:00]: Hey everyone, welcome to the Latent Space Podcast. This is Alessio, partner and CTO at Decibel Partners, and I'm joined by my co-host Swyx, founder of Smol.ai.Swyx [00:00:12]: Hey, and today we're still in our sort of makeshift in-between studio, but we're very delighted to have a former returning guest host, Itamar. Welcome back.Itamar [00:00:21]: Great to be here after a year or more. Yeah, a year and a half.Swyx [00:00:24]: You're one of our earliest guests on Agents. Now you're CEO co-founder of Kodo. Right. Which has just been renamed. You also raised a $40 million Series A, and we can get caught up on everything, but we're also delighted to have our new guest, Eric. Welcome.Eric [00:00:42]: Thank you. Excited to be here. Should I say Bolt or StackBlitz?Swyx [00:00:45]: Like, is it like its own company now or?Eric [00:00:47]: Yeah. Bolt's definitely bolt.new. That's the thing that we're probably the most known for, I imagine, at this point.Swyx [00:00:54]: Which is ridiculous to say because you were working at StackBlitz for so long.Eric [00:00:57]: Yeah. I mean, within a week, we were doing like double the amount of traffic. And StackBlitz had been online for seven years, and we were like, what? But anyways, yeah. So we're StackBlitz, the company behind bolt.new. If you've heard of bolt.new, that's our stuff. Yeah.Swyx [00:01:12]: Yeah.Itamar [00:01:13]: Excellent. I see, by the way, that the founder mode, you need to know to capture opportunities. So kudos on doing that, right? You're working on some technology, and then suddenly you can exploit that to a new world. Yeah.Eric [00:01:24]: Totally. And I think, well, not to jump, but 100%, I mean, a couple of months ago, we had the idea for Bolt earlier this year, but we haven't really shared this too much publicly. But we actually had tried to build it with some of those state-of-the-art models back in January, February, you can kind of imagine which, and they just weren't good enough to actually do the code generation where the code was accurate and it was fast and whatever have you without a ton of like rag, but then there was like issues with that. So we put it on the shelf and then we got kind of a sneak peek of some of the new models that have come out in the past couple of months now. And so once we saw that, once we actually saw the code gen from it, we were like, oh my God, like, okay, we can build a product around this. And so that was really the impetus of us building the thing. But with that, it was StackBlitz, the core StackBlitz product the past seven years has been an IDE for developers. So the entire user experience flow we've built up just didn't make sense. And so when we kind of went out to build Bolt, we just thought, you know, if we were inventing our product today, what would the interface look like given what is now possible with the AI code gen? And so there's definitely a lot of conversations we had internally, but you know, just kind of when we logically laid it out, we were like, yeah, I think it makes sense to just greenfield a new thing and let's see what happens. If it works great, then we'll figure it out. If it doesn't work great, then it'll get deleted at some point. So that's kind of how it actually came to be.Swyx [00:02:49]: I'll mention your background a little bit. You were also founder of Thinkster before you started StackBlitz. So both of you are second time founders. Both of you have sort of re-founded your company recently. Yours was more of a rename. I think a slightly different direction as well. And then we can talk about both. Maybe just chronologically, should we get caught up on where Kodo is first and then you know, just like what people should know since the last pod? Sure.Itamar [00:03:12]: The last pod was two months after we launched and we basically had the vision that we talked about. The idea that software development is about specification, test and code, etc. We are more on the testing part as in essence, we think that if you solve testing, you solve software development. The beautiful chart that we'll put up on screen. And testing is a really big field, like there are many dimensions, unit testing, the level of the component, how big it is, how large it is. And then there is like different type of testing, is it regression or smoke or whatever. So back then we only had like one ID extension with unit tests as in focus. One and a half year later, first ID extension supports more type of testing as context aware. We index local, local repos, but also 10,000s of repos for Fortune 500 companies. We have another agent, another tool that is called, the pure agent is the open source and the commercial one is CodoMerge. And then we have another open source called CoverAgent, which is not yet a commercial product coming very soon. It's very impressive. It could be that already people are approving automated pull requests that they don't even aware in really big open sources. So once we have enough of these, we will also launch another agent. So for the first one and a half year, what we did is grew in our offering and mostly on the sid
We have announced our first speaker, friend of the show Dylan Patel, and topic slates for Latent Space LIVE! at NeurIPS. Sign up for IRL/Livestream and to debate!We are still taking questions for our next big recap episode! Submit questions and messages on Speakpipe here for a chance to appear on the show!The vibe shift we observed in July - in favor of Claude 3.5 Sonnet, first introduced in June — has been remarkably long lived and persistent, surviving multiple subsequent updates of 4o, o1 and Gemini versions, for Anthropic’s Claude to end 2024 as the preferred model for AI Engineers and even being the exclusive choice for new code agents like bolt.new (our next guest on the pod!), which unlocked so much performance from Claude Sonnet that it went from $0 to $4m ARR in 4 weeks when it launched last month.Anthropic has now raised an additional $4b from Amazon and made an incredibly well received update of Claude 3.5 Sonnet (and Haiku), making significant improvements in performance over its predecessors:Solving SWE-BenchAs part of the October Sonnet release, Anthropic teased a blink-and-you’ll miss it result:The updated Claude 3.5 Sonnet shows wide-ranging improvements on industry benchmarks, with particularly strong gains in agentic coding and tool use tasks. On coding, it improves performance on SWE-bench Verified from 33.4% to 49.0%, scoring higher than all publicly available models—including reasoning models like OpenAI o1-preview and specialized systems designed for agentic coding. It also improves performance on TAU-bench, an agentic tool use task, from 62.6% to 69.2% in the retail domain, and from 36.0% to 46.0% in the more challenging airline domain. The new Claude 3.5 Sonnet offers these advancements at the same price and speed as its predecessor.This was followed up by a blogpost a week later from today’s guest, Erik Schluntz, the engineer who implemented and scored this SOTA result using a simple, non-overengineered version of the SWE-Agent framework (you can see the submissions here). We have previously covered the SWE-Bench story extensively:* Speaking with SWEBench/SWEAgent authors at ICLR* Speaking with Cosine Genie, the previous SOTA (43.8%) on SWEBench Verified (with brief update at DevDay 2024)* Speaking with Shunyu Yao on SWEBench and the ReAct paradigm driving SWE-AgentOne of the notable inclusions in this blogpost are the tools that Erik decided to give Claude, e.g. the “Edit Tool”:The tools teased in the SWEBench submission/blogpost were then polished up and released with Computer Use…And you can also see even more computer use tools given in the new Model Context Protocol servers:Claude Computer UseBecause it is one of the best received AI releases of the year, we recommend watching the 2 minute Computer Use intro (and related demos) in its entirety:Eric also worked on Claude’s function calling, tool use, and computer use APIs, so we discuss that in the episode.Erik [00:53:39]: With computer use, just give the thing a browser that's logged into what you want to integrate with, and it's going to work immediately. And I see that reduction in friction as being incredibly exciting. Imagine a customer support team where, okay, hey, you got this customer support bot, but you need to go integrate it with all these things. And you don't have any engineers on your customer support team. But if you can just give the thing a browser that's logged into your systems that you need it to have access to, now, suddenly, in one day, you could be up and rolling with a fully integrated customer service bot that could go do all the actions you care about. So I think that's the most exciting thing for me about computer use, is reducing that friction of integrations to almost zero.As you’ll see, this is very top of mind for Erik as a former Robotics founder who’s company basically used robots to interface with human physical systems like elevators.Full Video episodePlease like and subscribe!Show Notes* Eric Schluntz* “Raising the bar on SWE-Bench Verified”* Cobalt Robotics* SWE-Bench* SWE-Bench Verified* Human Eval & other benchmarks* Anthropic Workbench* Aider* Cursor* Fireworks AI* E2B* Amanda Askell* Toyota Research* Physical Intelligence (Pi)* Chelsea Finn* Josh Albrecht* Eric Jang* 1X* Dust* Cosine Episode* Bolt* Adept Episode* TauBench* LMSys EpisodeTimestamps* [00:00:00] Introductions* [00:03:39] What is SWE-Bench?* [00:12:22] SWE-Bench vs HumanEval vs others* [00:15:21] SWE-Agent architecture and runtime* [00:21:18] Do you need code indexing?* [00:24:50] Giving the agent tools* [00:27:47] Sandboxing for coding agents* [00:29:16] Why not write tests?* [00:30:31] Redesigning engineering tools for LLMs* [00:35:53] Multi-agent systems* [00:37:52] Why XML so good?* [00:42:57] Thoughts on agent frameworks* [00:45:12] How many turns can an agent do?* [00:47:12] Using multiple model types* [00:51:40] Computer use and agent use cases* [00:59:04] State of AI robotics* [01:04:24] Robotics in manufacturing* [01:05:01] Hardware challenges in robotics* [01:09:21] Is self-driving a good business?TranscriptAlessio [00:00:00]: Hey everyone, welcome to the Latent Space Podcast. This is Alessio, partner and CTO at Decibel Partners. And today we're in the new studio with my usual co-host, Shawn from Smol AI.Swyx [00:00:14]: Hey, and today we're very blessed to have Erik Schluntz from Anthropic with us. Welcome.Erik [00:00:19]: Hi, thanks very much. I'm Erik Schluntz. I'm a member of technical staff at Anthropic, working on tool use, computer use, and Swebench.Swyx [00:00:27]: Yeah. Well, how did you get into just the whole AI journey? I think you spent some time at SpaceX as well? Yeah. And robotics. Yeah. There's a lot of overlap between like the robotics people and the AI people, and maybe like there's some interlap or interest between language models for robots right now. Maybe just a little bit of background on how you got to where you are. Yeah, sure.Erik [00:00:50]: I was at SpaceX a long time ago, but before joining Anthropic, I was the CTO and co-founder of Cobalt Robotics. We built security and inspection robots. These are sort of five foot tall robots that would patrol through an office building or a warehouse looking for anything out of the ordinary. Very friendly, no tasers or anything. We would just sort of call a remote operator if we saw anything. We have about 100 of those out in the world, and had a team of about 100. We actually got acquired about six months ago, but I had left Cobalt about a year ago now, because I was starting to get a lot more excited about AI. I had been writing a lot of my code with things like Copilot, and I was like, wow, this is actually really cool. If you had told me 10 years ago that AI would be writing a lot of my code, I would say, hey, I think that's AGI. And so I kind of realized that we had passed this level, like, wow, this is actually really useful for engineering work. That got me a lot more excited about AI and learning about large language models. So I ended up taking a sabbatical and then doing a lot of reading and research myself and decided, hey, I want to go be at the core of this and joined Anthropic.Alessio [00:01:53]: And why Anthropic? Did you consider other labs? Did you consider maybe some of the robotics companies?Erik [00:02:00]: So I think at the time I was a little burnt out of robotics, and so also for the rest of this, any sort of negative things I say about robotics or hardware is coming from a place of burnout, and I reserve my right to change my opinion in a few years. Yeah, I looked around, but ultimately I knew a lot of people that I really trusted and I thought were incredibly smart at Anthropic, and I think that was the big deciding factor to come there. I was like, hey, this team's amazing. They're not just brilliant, but sort of like the most nice and kind people that I know, and so I just felt like I could be a really good culture fit. And ultimately, I do care a lot about AI safety and making sure that I don't want to build something that's used for bad purposes, and I felt like the best chance of that was joining Anthropic.Alessio [00:02:39]: And from the outside, these labs kind of look like huge organizations that have these obscureSwyx [00:02:44]: ways to organize.Alessio [00:02:45]: How did you get, you joined Anthropic, did you already know you were going to work on of the stuff you publish or you kind of join and then you figure out where you land? I think people are always curious to learn more.Erik [00:02:57]: Yeah, I've been very happy that Anthropic is very bottoms up and sort of very sort of receptive to whatever your interests are. And so I joined sort of being very transparent of like, hey, I'm most excited about code generation and AI that can actually go out and sort of touch the world or sort of help people build things. And, you know, those weren't my initial initial projects. I also came in and said, hey, I want to do the most valuable possible thing for this company and help Anthropic succeed. And, you know, like, let me find the balance of those. So I was working on lots of things at the beginning, you know, function calling, tool use. And then sort of as it became more and more relevant, I was like, oh, hey, like, let's it's time to go work on encoding agents and sort of started looking at SWE-Bench as sort of a really good benchmark for that.Swyx [00:03:39]: So let's get right into SWE-Bench. That's one of the many claims to fame. I feel like there's just been a series of releases related with Cloud 3.5 Sonnet around about two or three months ago, 3.5 Sonnet came out and it was it was a step ahead in terms of a lot of people immediately fell in love with it for coding. And then last month you released a new updated version of Cloud Sonnet. We're not going to talk about the training for that because that's still confidential. But I think Anthropic's done a really good job, like applying the model to different things.
We have a full slate of upcoming events: AI Engineer London, AWS Re:Invent in Las Vegas, and now Latent Space LIVE! at NeurIPS in Vancouver and online. Sign up to join and speak!We are still taking questions for our next big recap episode! Submit questions and messages on Speakpipe here for a chance to appear on the show!We try to stay close to the inference providers as part of our coverage, as our podcasts with Together AI and Replicate will attest: However one of the most notable pull quotes from our very well received Braintrust episode was his opinion that open source model adoption has NOT gone very well and is actually declining in relative market share terms (it is of course increasing in absolute terms):Today’s guest, Lin Qiao, would wholly disagree. Her team of Pytorch/GPU experts are wholly dedicated toward helping you serve and finetune the full stack of open source models from Meta and others, across all modalities (Text, Audio, Image, Embedding, Vision-understanding), helping customers like Cursor and Hubspot scale up open source model inference both rapidly and affordably.Fireworks has emerged after its successive funding rounds with top tier VCs as one of the leaders of the Compound AI movement, a term first coined by the Databricks/Mosaic gang at Berkeley AI and adapted as “Composite AI” by Gartner:Replicating o1We are the first podcast to discuss Fireworks’ f1, their proprietary replication of OpenAI’s o1. This has become a surprisingly hot area of competition in the past week as both Nous Forge and Deepseek r1 have launched competitive models.Full Video PodcastLike and subscribe!Timestamps* 00:00:00 Introductions* 00:02:08 Pre-history of Fireworks and PyTorch at Meta* 00:09:49 Product Strategy: From Framework to Model Library* 00:13:01 Compound AI Concept and Industry Dynamics* 00:20:07 Fireworks' Distributed Inference Engine* 00:22:58 OSS Model Support and Competitive Strategy* 00:29:46 Declarative System Approach in AI* 00:31:00 Can OSS replicate o1?* 00:36:51 Fireworks f1* 00:41:03 Collaboration with Cursor and Speculative Decoding* 00:46:44 Fireworks quantization (and drama around it)* 00:49:38 Pricing Strategy* 00:51:51 Underrated Features of Fireworks Platform* 00:55:17 HiringTranscriptAlessio [00:00:00]: Hey everyone, welcome to the Latent Space Podcast. This is Alessio, partner at CTO at Danceable Partners, and I'm joined by my co-host, Swyx founder, Osmalayar.Swyx [00:00:11]: Hey, and today we're in a very special studio inside the Fireworks office with Lin Qiang, CEO of Fireworks. Welcome. Yeah.Lin [00:00:20]: Oh, you should welcome us.Swyx [00:00:21]: Yeah, welcome. Yeah, thanks for having us. It's unusual to be in the home of a startup, but it's also, I think our relationship is a bit unusual compared to all our normal guests. Definitely.Lin [00:00:34]: Yeah. I'm super excited to talk about very interesting topics in that space with both of you.Swyx [00:00:41]: You just celebrated your two-year anniversary yesterday.Lin [00:00:43]: Yeah, it's quite a crazy journey. We circle around and share all the crazy stories across these two years, and it has been super fun. All the way from we experienced Silicon Valley bank run to we delete some data that shouldn't be deleted operationally. We went through a massive scale where we actually are busy getting capacity to, yeah, we learned to kind of work with it as a team with a lot of brilliant people across different places to join a company. It has really been a fun journey.Alessio [00:01:24]: When you started, did you think the technical stuff will be harder or the bank run and then the people side? I think there's a lot of amazing researchers that want to do companies and it's like the hardest thing is going to be building the product and then you have all these different other things. So, were you surprised by what has been your experience the most?Lin [00:01:42]: Yeah, to be honest with you, my focus has always been on the product side and then after the product goes to market. And I didn't realize the rest has been so complicated, operating a company and so on. But because I don't think about it, I just kind of manage it. So it's done. I think I just somehow don't think about it too much and solve whatever problem coming our way and it worked.Swyx [00:02:08]: So let's, I guess, let's start at the pre-history, the initial history of Fireworks. You ran the PyTorch team at Meta for a number of years and we previously had Sumit Chintal on and I think we were just all very interested in the history of GenEI. Maybe not that many people know how deeply involved Faire and Meta were prior to the current GenEI revolution.Lin [00:02:35]: My background is deep in distributed system, database management system. And I joined Meta from the data side and I saw this tremendous amount of data growth, which cost a lot of money and we're analyzing what's going on. And it's clear that AI is driving all this data generation. So it's a very interesting time because when I joined Meta, Meta is going through ramping down mobile-first, finishing the mobile-first transition and then starting AI-first. And there's a fundamental reason about that sequence because mobile-first gave a full range of user engagement that has never existed before. And all this user engagement generated a lot of data and this data power AI. So then the whole entire industry is also going through, falling through this same transition. When I see, oh, okay, this AI is powering all this data generation and look at where's our AI stack. There's no software, there's no hardware, there's no people, there's no team. I want to dive up there and help this movement. So when I started, it's very interesting industry landscape. There are a lot of AI frameworks. It's a kind of proliferation of AI frameworks happening in the industry. But all the AI frameworks focus on production and they use a very certain way of defining the graph of neural network and then use that to drive the model iteration and productionization. And PyTorch is completely different. So they could also assume that he was the user of his product. And he basically says, researchers face so much pain using existing AI frameworks, this is really hard to use and I'm going to do something different for myself. And that's the origin story of PyTorch. PyTorch actually started as the framework for researchers. They don't care about production at all. And as they grow in terms of adoption, so the interesting part of AI is research is the top of our normal production. There are so many researchers across academic, across industry, they innovate and they put their results out there in open source and that power the downstream productionization. So it's brilliant for MATA to establish PyTorch as a strategy to drive massive adoption in open source because MATA internally is a PyTorch shop. So it creates a flying wheel effect. So that's kind of a strategy behind PyTorch. But when I took on PyTorch, it's kind of at Caspo, MATA established PyTorch as the framework for both research and production. So no one has done that before. And we have to kind of rethink how to architect PyTorch so we can really sustain production workload, the stability, reliability, low latency, all this production concern was never a concern before. Now it's a concern. And we actually have to adjust its design and make it work for both sides. And that took us five years because MATA has so many AI use cases, all the way from ranking recommendation as powering the business top line or as ranking newsfeed, video ranking to site integrity detect bad content automatically using AI to all kinds of effects, translation, image classification, object detection, all this. And also across AI running on the server side, on mobile phones, on AI VR devices, the wide spectrum. So by the time we actually basically managed to support AI across ubiquitous everywhere across MATA. But interestingly, through open source engagement, we work with a lot of companies. It is clear to us like this industry is starting to take on AI first transition. And of course, MATA's hyperscale always go ahead of industry. And it feels like when we start this AI journey at MATA, there's no software, no hardware, no team. For many companies we engage with through PyTorch, we feel the pain. That's the genesis why we feel like, hey, if we create fireworks and support industry going through this transition, it will be a huge amount of impact. Of course, the problem that the industry is facing will not be the same as MATA. MATA is so big, right? So it's kind of skewed towards extreme scale and extreme optimization in the industry will be different. But we feel like we have the technical chop and we've seen a lot. We'll look to kind of drive that. So yeah, so that's how we started.Swyx [00:06:58]: When you and I chatted about the origins of fireworks, it was originally envisioned more as a PyTorch platform, and then later became much more focused on generative AI. Is that fair to say? What was the customer discovery here?Lin [00:07:13]: Right. So I would say our initial blueprint is we should build a PyTorch cloud because a PyTorch library and there's no SaaS platform to enable AI workloads.Swyx [00:07:26]: Even in 2022, it's interesting.Lin [00:07:28]: I would not say absolutely no, but cloud providers have some of those, but it's not first class citizen, right? At 2022, there's still like TensorFlow is massively in production. And this is all pre-gen AI, and PyTorch is kind of getting more and more adoption. But there's no PyTorch-first SaaS platform existing. At the same time, we are also a very pragmatic set of people. We really want to make sure from the get-go, we get really, really close to customers. We understand their use case, we understand their pain points, we understand the value we deliver to them. So we want to take a different approach instead of building a horizontal PyTorch cloud. We want to bu
Alessio will be at AWS re:Invent next week and hosting a casual coffee meetup on Wednesday, RSVP here! And subscribe to our calendar for our Singapore, NeurIPS, and all upcoming meetups!We are still taking questions for our next big recap episode! Submit questions and messages on Speakpipe here for a chance to appear on the show!If you've been following the AI agents space, you have heard of Lindy AI; while founder Flo Crivello is hesitant to call it "blowing up," when folks like Andrew Wilkinson start obsessing over your product, you're definitely onto something.In our latest episode, Flo walked us through Lindy's evolution from late 2022 to now, revealing some design choices about agent platform design that go against conventional wisdom in the space.The Great Reset: From Text Fields to RailsRemember late 2022? Everyone was "LLM-pilled," believing that if you just gave a language model enough context and tools, it could do anything. Lindy 1.0 followed this pattern:* Big prompt field ✅* Bunch of tools ✅* Prayer to the LLM gods ✅Fast forward to today, and Lindy 2.0 looks radically different. As Flo put it (~17:00 in the episode): "The more you can put your agent on rails, one, the more reliable it's going to be, obviously, but two, it's also going to be easier to use for the user."Instead of a giant, intimidating text field, users now build workflows visually:* Trigger (e.g., "Zendesk ticket received")* Required actions (e.g., "Check knowledge base")* Response generationThis isn't just a UI change - it's a fundamental rethinking of how to make AI agents reliable. As Swyx noted during our discussion: "Put Shoggoth in a box and make it a very small, minimal viable box. Everything else should be traditional if-this-then-that software."The Surprising Truth About Model LimitationsHere's something that might shock folks building in the space: with Claude 3.5 Sonnet, the model is no longer the bottleneck. Flo's exact words (~31:00): "It is actually shocking the extent to which the model is no longer the limit. It was the limit a year ago. It was too expensive. The context window was too small."Some context: Lindy started when context windows were 4K tokens. Today, their system prompt alone is larger than that. But what's really interesting is what this means for platform builders:* Raw capabilities aren't the constraint anymore* Integration quality matters more than model performance* User experience and workflow design are the new bottlenecksThe Search Engine Parallel: Why Horizontal Platforms Might WinOne of the spiciest takes from our conversation was Flo's thesis on horizontal vs. vertical agent platforms. He draws a fascinating parallel to search engines (~56:00):"I find it surprising the extent to which a horizontal search engine has won... You go through Google to search Reddit. You go through Google to search Wikipedia... search in each vertical has more in common with search than it does with each vertical."His argument: agent platforms might follow the same pattern because:* Agents across verticals share more commonalities than differences* There's value in having agents that can work together under one roof* The R&D cost of getting agents right is better amortized across use casesThis might explain why we're seeing early vertical AI companies starting to expand horizontally. The core agent capabilities - reliability, context management, tool integration - are universal needs.What This Means for BuildersIf you're building in the AI agents space, here are the key takeaways:* Constrain First: Rather than maximizing capabilities, focus on reliable execution within narrow bounds* Integration Quality Matters: With model capabilities plateauing, your competitive advantage lies in how well you integrate with existing tools* Memory Management is Key: Flo revealed they actively prune agent memories - even with larger context windows, not all memories are useful* Design for Discovery: Lindy's visual workflow builder shows how important interface design is for adoptionThe Meta LayerThere's a broader lesson here about AI product development. Just as Lindy evolved from "give the LLM everything" to "constrain intelligently," we might see similar evolution across the AI tooling space. The winners might not be those with the most powerful models, but those who best understand how to package AI capabilities in ways that solve real problems reliably.Full Video PodcastFlo’s talk at AI Engineer SummitChapters* 00:00:00 Introductions * 00:04:05 AI engineering and deterministic software * 00:08:36 Lindys demo* 00:13:21 Memory management in AI agents * 00:18:48 Hierarchy and collaboration between Lindys * 00:21:19 Vertical vs. horizontal AI tools * 00:24:03 Community and user engagement strategies * 00:26:16 Rickrolling incident with Lindy * 00:28:12 Evals and quality control in AI systems * 00:31:52 Model capabilities and their impact on Lindy * 00:39:27 Competition and market positioning * 00:42:40 Relationship between Factorio and business strategy * 00:44:05 Remote work vs. in-person collaboration * 00:49:03 Europe vs US Tech* 00:58:59 Testing the Overton window and free speech * 01:04:20 Balancing AI safety concerns with business innovation Show Notes* Lindy.ai* Rick Rolling* Flo on X* TeamFlow* Andrew Wilkinson* Dust* Poolside.ai* SB1047* Gathertown* Sid Sijbrandij* Matt Mullenweg* Factorio* Seeing Like a StateTranscriptAlessio [00:00:00]: Hey everyone, welcome to the Latent Space Podcast. This is Alessio, partner and CTO at Decibel Partners, and I'm joined by my co-host Swyx, founder of Smol.ai.Swyx [00:00:12]: Hey, and today we're joined in the studio by Florent Crivello. Welcome.Flo [00:00:15]: Hey, yeah, thanks for having me.Swyx [00:00:17]: Also known as Altimore. I always wanted to ask, what is Altimore?Flo [00:00:21]: It was the name of my character when I was playing Dungeons & Dragons. Always. I was like 11 years old.Swyx [00:00:26]: What was your classes?Flo [00:00:27]: I was an elf. I was a magician elf.Swyx [00:00:30]: Well, you're still spinning magic. Right now, you're a solo founder and CEO of Lindy.ai. What is Lindy?Flo [00:00:36]: Yeah, we are a no-code platform letting you build your own AI agents easily. So you can think of we are to LangChain as Airtable is to MySQL. Like you can just pin up AI agents super easily by clicking around and no code required. You don't have to be an engineer and you can automate business workflows that you simply could not automate before in a few minutes.Swyx [00:00:55]: You've been in our orbit a few times. I think you spoke at our Latent Space anniversary. You spoke at my summit, the first summit, which was a really good keynote. And most recently, like we actually already scheduled this podcast before this happened. But Andrew Wilkinson was like, I'm obsessed by Lindy. He's just created a whole bunch of agents. So basically, why are you blowing up?Flo [00:01:16]: Well, thank you. I think we are having a little bit of a moment. I think it's a bit premature to say we're blowing up. But why are things going well? We revamped the product majorly. We called it Lindy 2.0. I would say we started working on that six months ago. We've actually not really announced it yet. It's just, I guess, I guess that's what we're doing now. And so we've basically been cooking for the last six months, like really rebuilding the product from scratch. I think I'll list you, actually, the last time you tried the product, it was still Lindy 1.0. Oh, yeah. If you log in now, the platform looks very different. There's like a ton more features. And I think one realization that we made, and I think a lot of folks in the agent space made the same realization, is that there is such a thing as too much of a good thing. I think many people, when they started working on agents, they were very LLM peeled and chat GPT peeled, right? They got ahead of themselves in a way, and us included, and they thought that agents were actually, and LLMs were actually more advanced than they actually were. And so the first version of Lindy was like just a giant prompt and a bunch of tools. And then the realization we had was like, hey, actually, the more you can put your agent on Rails, one, the more reliable it's going to be, obviously, but two, it's also going to be easier to use for the user, because you can really, as a user, you get, instead of just getting this big, giant, intimidating text field, and you type words in there, and you have no idea if you're typing the right word or not, here you can really click and select step by step, and tell your agent what to do, and really give as narrow or as wide a guardrail as you want for your agent. We started working on that. We called it Lindy on Rails about six months ago, and we started putting it into the hands of users over the last, I would say, two months or so, and I think things really started going pretty well at that point. The agent is way more reliable, way easier to set up, and we're already seeing a ton of new use cases pop up.Swyx [00:03:00]: Yeah, just a quick follow-up on that. You launched the first Lindy in November last year, and you were already talking about having a DSL, right? I remember having this discussion with you, and you were like, it's just much more reliable. Is this still the DSL under the hood? Is this a UI-level change, or is it a bigger rewrite?Flo [00:03:17]: No, it is a much bigger rewrite. I'll give you a concrete example. Suppose you want to have an agent that observes your Zendesk tickets, and it's like, hey, every time you receive a Zendesk ticket, I want you to check my knowledge base, so it's like a RAG module and whatnot, and then answer the ticket. The way it used to work with Lindy before was, you would type the prompt asking it to do that. You check my knowledge base, and so on and so forth. The problem with doing that is that it can always go wrong. You're praying the LLM gods that they will actually invoke y
We are recording our next big recap episode and taking questions! Submit questions and messages on Speakpipe here for a chance to appear on the show!Also subscribe to our calendar for our Singapore, NeurIPS, and all upcoming meetups!In our first ever episode with Logan Kilpatrick we called out the two hottest LLM frameworks at the time: LangChain and Dust. We’ve had Harrison from LangChain on twice (as a guest and as a co-host), and we’ve now finally come full circle as Stanislas from Dust joined us in the studio.After stints at Oracle and Stripe, Stan had joined OpenAI to work on mathematical reasoning capabilities. He describes his time at OpenAI as "the PhD I always wanted to do" while acknowledging the challenges of research work: "You're digging into a field all day long for weeks and weeks, and you find something, you get super excited for 12 seconds. And at the 13 seconds, you're like, 'oh, yeah, that was obvious.' And you go back to digging." This experience, combined with early access to GPT-4's capabilities, shaped his decision to start Dust: "If we believe in AGI and if we believe the timelines might not be too long, it's actually the last train leaving the station to start a company. After that, it's going to be computers all the way down."The History of DustDust's journey can be broken down into three phases:* Developer Framework (2022): Initially positioned as a competitor to LangChain, Dust started as a developer tooling platform. While both were open source, their approaches differed – LangChain focused on broad community adoption and integration as a pure developer experience, while Dust emphasized UI-driven development and better observability that wasn’t just `print` statements.* Browser Extension (Early 2023): The company pivoted to building XP1, a browser extension that could interact with web content. This experiment helped validate user interaction patterns with AI, even while using less capable models than GPT-4.* Enterprise Platform (Current): Today, Dust has evolved into an infrastructure platform for deploying AI agents within companies, with impressive metrics like 88% daily active users in some deployments.The Case for Being HorizontalThe big discussion for early stage companies today is whether or not to be horizontal or vertical. Since models are so good at general tasks, a lot of companies are building vertical products that take care of a workflow end-to-end in order to offer more value and becoming more of “Services as Software”. Dust on the other hand is a platform for the users to build their own experiences, which has had a few advantages:* Maximum Penetration: Dust reports 60-70% weekly active users across entire companies, demonstrating the potential reach of horizontal solutions rather than selling into a single team.* Emergent Use Cases: By allowing non-technical users to create agents, Dust enables use cases to emerge organically from actual business needs rather than prescribed solutions.* Infrastructure Value: The platform approach creates lasting value through maintained integrations and connections, similar to how Stripe's value lies in maintaining payment infrastructure. Rather than relying on third-party integration providers, Dust maintains its own connections to ensure proper handling of different data types and structures.The Vertical ChallengeHowever, this approach comes with trade-offs:* Harder Go-to-Market: As Stan talked about: "We spike at penetration... but it makes our go-to-market much harder. Vertical solutions have a go-to-market that is much easier because they're like, 'oh, I'm going to solve the lawyer stuff.'"* Complex Infrastructure: Building a horizontal platform requires maintaining numerous integrations and handling diverse data types appropriately – from structured Salesforce data to unstructured Notion pages. As you scale integrations, the cost of maintaining them also scales. * Product Surface Complexity: Creating an interface that's both powerful and accessible to non-technical users requires careful design decisions, down to avoiding technical terms like "system prompt" in favor of "instructions." The Future of AI PlatformsStan initially predicted we'd see the first billion-dollar single-person company in 2023 (a prediction later echoed by Sam Altman), but he's now more focused on a different milestone: billion-dollar companies with engineering teams of just 20 people, enabled by AI assistance.This vision aligns with Dust's horizontal platform approach – building the infrastructure that allows small teams to achieve outsized impact through AI augmentation. Rather than replacing entire job functions (the vertical approach), they're betting on augmenting existing workflows across organizations.Full YouTube EpisodeChapters* 00:00:00 Introductions* 00:04:33 Joining OpenAI from Paris* 00:09:54 Research evolution and compute allocation at OpenAI* 00:13:12 Working with Ilya Sutskever and OpenAI's vision* 00:15:51 Leaving OpenAI to start Dust* 00:18:15 Early focus on browser extension and WebGPT-like functionality* 00:20:20 Dust as the infrastructure for agents* 00:24:03 Challenges of building with early AI models* 00:28:17 LLMs and Workflow Automation* 00:35:28 Building dependency graphs of agents* 00:37:34 Simulating API endpoints* 00:40:41 State of AI models* 00:43:19 Running evals* 00:46:36 Challenges in building AI agents infra* 00:49:21 Buy vs. build decisions for infrastructure components* 00:51:02 Future of SaaS and AI's Impact on Software* 00:53:07 The single employee $1B company race* 00:56:32 Horizontal vs. vertical approaches to AI agentsTranscriptAlessio [00:00:00]: Hey everyone, welcome to the Latent Space podcast. This is Alessio, partner and CTO at Decibel Partners, and I'm joined by my co-host Swyx, founder of Smol.ai.Swyx [00:00:11]: Hey, and today we're in a studio with Stanislas, welcome.Stan [00:00:14]: Thank you very much for having me.Swyx [00:00:16]: Visiting from Paris.Stan [00:00:17]: Paris.Swyx [00:00:18]: And you have had a very distinguished career. It's very hard to summarize, but you went to college in both Ecopolytechnique and Stanford, and then you worked in a number of places, Oracle, Totems, Stripe, and then OpenAI pre-ChatGPT. We'll talk, we'll spend a little bit of time about that. About two years ago, you left OpenAI to start Dust. I think you were one of the first OpenAI alum founders.Stan [00:00:40]: Yeah, I think it was about at the same time as the Adept guys, so that first wave.Swyx [00:00:46]: Yeah, and people really loved our David episode. We love a few sort of OpenAI stories, you know, for back in the day, like we're talking about pre-recording. Probably the statute of limitations on some of those stories has expired, so you can talk a little bit more freely without them coming after you. But maybe we'll just talk about, like, what was your journey into AI? You know, you were at Stripe for almost five years, there are a lot of Stripe alums going into OpenAI. I think the Stripe culture has come into OpenAI quite a bit.Stan [00:01:11]: Yeah, so I think the buses of Stripe people really started flowing in, I guess, after ChatGPT. But, yeah, my journey into AI is a... I mean, Greg Brockman. Yeah, yeah. From Greg, of course. And Daniela, actually, back in the days, Daniela Amodei.Swyx [00:01:27]: Yes, she was COO, I mean, she is COO, yeah. She had a pretty high job at OpenAI at the time, yeah, for sure.Stan [00:01:34]: My journey started as anybody else, you're fascinated with computer science and you want to make them think, it's awesome, but it doesn't work. I mean, it was a long time ago, it was like maybe 16, so it was 25 years ago. Then the first big exposure to AI would be at Stanford, and I'm going to, like, disclose a whole lamb, because at the time it was a class taught by Andrew Ng, and there was no deep learning. It was half features for vision and a star algorithm. So it was fun. But it was the early days of deep learning. At the time, I think a few years after, it was the first project at Google. But you know, that cat face or the human face trained from many images. I went to, hesitated doing a PhD, more in systems, eventually decided to go into getting a job. Went at Oracle, started a company, did a gazillion mistakes, got acquired by Stripe, worked with Greg Buckman there. And at the end of Stripe, I started interesting myself in AI again, felt like it was the time, you had the Atari games, you had the self-driving craziness at the time. And I started exploring projects, it felt like the Atari games were incredible, but there were still games. And I was looking into exploring projects that would have an impact on the world. And so I decided to explore three things, self-driving cars, cybersecurity and AI, and math and AI. It's like I sing it by a decreasing order of impact on the world, I guess.Swyx [00:03:01]: Discovering new math would be very foundational.Stan [00:03:03]: It is extremely foundational, but it's not as direct as driving people around.Swyx [00:03:07]: Sorry, you're doing this at Stripe, you're like thinking about your next move.Stan [00:03:09]: No, it was at Stripe, kind of a bit of time where I started exploring. I did a bunch of work with friends on trying to get RC cars to drive autonomously. Almost started a company in France or Europe about self-driving trucks. We decided to not go for it because it was probably very operational. And I think the idea of the company, of the team wasn't there. And also I realized that if I wake up a day and because of a bug I wrote, I killed a family, it would be a bad experience. And so I just decided like, no, that's just too crazy. And then I explored cybersecurity with a friend. We're trying to apply transformers to cut fuzzing. So cut fuzzing, you have kind of an algorithm that goes really fast and tries to mutate the inputs of a library to find bugs. And we tried to apply a transformer to that and do reinforcement learning with
Apologies for lower audio quality; we lost recordings and had to use backup tracks. Our guests today are Anastasios Angelopoulos and Wei-Lin Chiang, leads of Chatbot Arena, fka LMSYS, the crowdsourced AI evaluation platform developed by the LMSys student club at Berkeley, which became the de facto standard for comparing language models. Arena Elo is often more cited than MMLU scores to many folks, and they have attracted >1,000,000 people to cast votes since its launch, leading top model trainers to cite them over their own formal academic benchmarks:The Limits of Static BenchmarksWe’ve done two benchmarks episodes: Benchmarks 101 and Benchmarks 201. One issue we’ve always brought up with static benchmarks is that 1) many are getting saturated, with models scoring almost perfectly on them 2) they often don’t reflect production use cases, making it hard for developers and users to use them as guidance. The fundamental challenge in AI evaluation isn't technical - it's philosophical. How do you measure something that increasingly resembles human intelligence? Rather than trying to define intelligence upfront, Arena let users interact naturally with models and collect comparative feedback. It's messy and subjective, but that's precisely the point - it captures the full spectrum of what people actually care about when using AI.The Pareto Frontier of Cost vs IntelligenceBecause the Elo scores are remarkably stable over time, we can put all the chat models on a map against their respective cost to gain a view of at least 3 orders of magnitude of model sizes/costs and observe the remarkable shift in intelligence per dollar over the past year:This frontier stood remarkably firm through the recent releases of o1-preview and price cuts of Gemini 1.5:The Statistics of SubjectivityIn our Benchmarks 201 episode, Clémentine Fourrier from HuggingFace thought this design choice was one of shortcomings of arenas: they aren’t reproducible. You don’t know who ranked what and what exactly the outcome was at the time of ranking. That same person might rank the same pair of outputs differently on a different day, or might ask harder questions to better models compared to smaller ones, making it imbalanced. Another argument that people have brought up is confirmation bias. We know humans prefer longer responses and are swayed by formatting - Rob Mulla from Dreadnode had found some interesting data on this in May:The approach LMArena is taking is to use logistic regression to decompose human preferences into constituent factors. As Anastasios explains: "We can say what components of style contribute to human preference and how they contribute." By adding these style components as parameters, they can mathematically "suck out" their influence and isolate the core model capabilities.This extends beyond just style - they can control for any measurable factor: "What if I want to look at the cost adjusted performance? Parameter count? We can ex post facto measure that." This is one of the most interesting things about Arena: You have a data generation engine which you can clean and turn into leaderboards later. If you wanted to create a leaderboard for poetry writing, you could get existing data from Arena, normalize it by identifying these style components. Whether or not it’s possible to really understand WHAT bias the voters have, that’s a different question.Private EvalsOne of the most delicate challenges LMSYS faces is maintaining trust while collaborating with AI labs. The concern is that labs could game the system by testing multiple variants privately and only releasing the best performer. This was brought up when 4o-mini released and it ranked as the second best model on the leaderboard:But this fear misunderstands how Arena works. Unlike static benchmarks where selection bias is a major issue, Arena's live nature means any initial bias gets washed out by ongoing evaluation. As Anastasios explains: "In the long run, there's way more fresh data than there is data that was used to compare these five models." The other big question is WHAT model is actually being tested; as people often talk about on X / Discord, the same endpoint will randomly feel “nerfed” like it happened for “Claude European summer” and corresponding conspiracy theories:It’s hard to keep track of these performance changes in Arena as these changes (if real…?) are not observable.The Future of EvaluationThe team's latest work on RouteLLM points to an interesting future where evaluation becomes more granular and task-specific. But they maintain that even simple routing strategies can be powerful - like directing complex queries to larger models while handling simple tasks with smaller ones.Arena is now going to expand beyond text into multimodal evaluation and specialized domains like code execution and red teaming. But their core insight remains: the best way to evaluate intelligence isn't to simplify it into metrics, but to embrace its complexity and find rigorous ways to analyze it. To go after this vision, they are spinning out Arena from LMSys, which will stay as an academia-driven group at Berkeley.Full Video PodcastChapters* 00:00:00 - Introductions* 00:01:16 - Origin and development of Chatbot Arena* 00:05:41 - Static benchmarks vs. Arenas* 00:09:03 - Community building* 00:13:32 - Biases in human preference evaluation* 00:18:27 - Style Control and Model Categories* 00:26:06 - Impact of o1* 00:29:15 - Collaborating with AI labs* 00:34:51 - RouteLLM and router models* 00:38:09 - Future of LMSys / ArenaShow Notes* Anastasios Angelopoulos* Anastasios' NeurIPS Paper Conformal Risk Control* Wei-Lin Chiang* Chatbot Arena* LMSys* MTBench* ShareGPT dataset* Stanford's Alpaca project* LLMRouter* E2B* DreadnodeTranscriptAlessio [00:00:00]: Hey everyone, welcome to the Latent Space podcast. This is Alessio, Partner and CTO in Residence at Decibel Partners, and I'm joined by my co-host Swyx, founder of Smol.ai.Swyx [00:00:14]: Hey, and today we're very happy and excited to welcome Anastasios and Wei Lin from LMSys. Welcome guys.Wei Lin [00:00:21]: Hey, how's it going? Nice to see you.Anastasios [00:00:23]: Thanks for having us.Swyx [00:00:24]: Anastasios, I actually saw you, I think at last year's NeurIPS. You were presenting a paper, which I don't really super understand, but it was some theory paper about how your method was very dominating over other sort of search methods. I don't remember what it was, but I remember that you were a very confident speaker.Anastasios [00:00:40]: Oh, I totally remember you. Didn't ever connect that, but yes, that's definitely true. Yeah. Nice to see you again.Swyx [00:00:46]: Yeah. I was frantically looking for the name of your paper and I couldn't find it. Basically I had to cut it because I didn't understand it.Anastasios [00:00:51]: Is this conformal PID control or was this the online control?Wei Lin [00:00:55]: Blast from the past, man.Swyx [00:00:57]: Blast from the past. It's always interesting how NeurIPS and all these academic conferences are sort of six months behind what people are actually doing, but conformal risk control, I would recommend people check it out. I have the recording. I just never published it just because I was like, I don't understand this enough to explain it.Anastasios [00:01:14]: People won't be interested.Wei Lin [00:01:15]: It's all good.Swyx [00:01:16]: But ELO scores, ELO scores are very easy to understand. You guys are responsible for the biggest revolution in language model benchmarking in the last few years. Maybe you guys want to introduce yourselves and maybe tell a little bit of the brief history of LMSysWei Lin [00:01:32]: Hey, I'm Wei Lin. I'm a fifth year PhD student at UC Berkeley, working on Chatbot Arena these days, doing crowdsourcing AI benchmarking.Anastasios [00:01:43]: I'm Anastasios. I'm a sixth year PhD student here at Berkeley. I did most of my PhD on like theoretical statistics and sort of foundations of model evaluation and testing. And now I'm working 150% on this Chatbot Arena stuff. It's great.Alessio [00:02:00]: And what was the origin of it? How did you come up with the idea? How did you get people to buy in? And then maybe what were one or two of the pivotal moments early on that kind of made it the standard for these things?Wei Lin [00:02:12]: Yeah, yeah. Chatbot Arena project was started last year in April, May, around that. Before that, we were basically experimenting in a lab how to fine tune a chatbot open source based on the Llama 1 model that I released. At that time, Lama 1 was like a base model and people didn't really know how to fine tune it. So we were doing some explorations. We were inspired by Stanford's Alpaca project. So we basically, yeah, grow a data set from the internet, which is called ShareGPT data set, which is like a dialogue data set between user and chat GPT conversation. It turns out to be like pretty high quality data, dialogue data. So we fine tune on it and then we train it and release the model called V2. And people were very excited about it because it kind of like demonstrate open way model can reach this conversation capability similar to chat GPT. And then we basically release the model with and also build a demo website for the model. People were very excited about it. But during the development, the biggest challenge to us at the time was like, how do we even evaluate it? How do we even argue this model we trained is better than others? And then what's the gap between this open source model that other proprietary offering? At that time, it was like GPT-4 was just announced and it's like Cloud One. What's the difference between them? And then after that, like every week, there's a new model being fine tuned, released. So even until still now, right? And then we have that demo website for V2 now. And then we thought like, okay, maybe we can add a few more of the model as well, like API model as well. And then we quickly realized that
Comments
Top Podcasts
The Best New Comedy Podcast Right Now – June 2024The Best News Podcast Right Now – June 2024The Best New Business Podcast Right Now – June 2024The Best New Sports Podcast Right Now – June 2024The Best New True Crime Podcast Right Now – June 2024The Best New Joe Rogan Experience Podcast Right Now – June 20The Best New Dan Bongino Show Podcast Right Now – June 20The Best New Mark Levin Podcast – June 2024
United States