ThursdAI - Oct 3 - OpenAI RealTime API, ChatGPT Canvas & other DevDay news (how I met Sam Altman), Gemini 1.5 8B is basically free, BFL makes FLUX 1.1 6x faster, Rev breaks whisper records...
Description
Hey, it's Alex. OK, so my mind is officially blown. I was sure this week was going to be wild, but I didn't expect everyone else besides OpenAI to pile on, exactly on ThursdAI.
I'm back from Dev Day (number 2) and still processing. I wanted to do an actual recap by humans, not just the NotebookLM one I posted during the keynote itself (which was awesome and scary in a "will AI replace me as a podcaster" kind of way), and it was incredible to have Simon Willison, who was sitting just behind me for most of Dev Day, join me for the recap!
But then the news kept coming: OpenAI released Canvas, a whole new way of interacting with ChatGPT; BFL released a new Flux version that's up to 6x faster; Rev released a Whisper-killer ASR that does diarization; and Google released Gemini 1.5 Flash 8B, and said that with prompt caching (which OpenAI now also has, yay) it will cost a whopping $0.01 / Mtok. That's 1 cent per million tokens, for a multimodal model with a 1 million token context window. 🤯
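To make that price concrete, here is a tiny back-of-the-envelope sketch in Python. The only number taken from the announcement is the $0.01 per million cached tokens; the helper name `cost_usd` and the "fill the whole context 100 times" scenario are mine, purely for illustration, and real bills will depend on how much of a request is actually served from cache.

```python
def cost_usd(tokens: int, usd_per_million: float) -> float:
    """Cost in US dollars for `tokens` tokens at a flat per-million-token price."""
    return tokens / 1_000_000 * usd_per_million


# USD per million cached tokens, the figure quoted above.
CACHED_PRICE = 0.01

# One completely full 1M-token context, all of it cached (illustrative scenario).
full_context = cost_usd(1_000_000, CACHED_PRICE)

# The same full context sent 100 times.
hundred_full_contexts = cost_usd(100 * 1_000_000, CACHED_PRICE)

print(f"one full context:  ${full_context:.2f}")
print(f"100 full contexts: ${hundred_full_contexts:.2f}")
```

In other words, under these assumptions you could push a full 1M-token cached context through the model a hundred times for about a dollar.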
This whole week was crazy, as last ThursdAI after finishing the newsletter I went to meet tons of folks at the AI Tinkerers in Seattle, and did a little EvalForge demo (which you can see here) and wanted to share EvalForge with you as well, it's early but very promising so feedback and PRs are welcome!
WHAT A WEEK, TL;DR for those who want the links and let's dive in
* OpenAI - Dev Day Recap (Alex, Simon Willison)
  * Recap of Dev Day
  * RealTime API launched
  * Prompt Caching launched
  * Model Distillation is the new finetune
  * Finetuning 4o with images (Skalski guide)
  * Fireside chat Q&A with Sam
* Open Source LLMs
  * NVIDIA finally releases NVLM (HF)
* This week's Buzz
  * Alex demoed EvalForge at the AI Tinkerers event in Seattle (Demo, EvalForge, AI Tinkerers)
* Big Companies & APIs
  * Google has released Gemini Flash 8B - $0.01 per million tokens cached (X, Blog)
* Voice & Audio
  * Rev breaks SOTA on ASR with Rev ASR and Rev Diarize (Blog, GitHub, HF)
* AI Art & Diffusion & 3D
  * BFL releases Flux1.1[pro] - 3x-6x faster than 1.0 and higher quality (was 🫐) - (Blog, Try it)
The day I met Sam Altman / Dev Day recap
Last Dev Day (my coverage here) was a "singular" day in AI for me, given it also had the "keep AI open source" event with Nous Research and Grimes. This Dev Day, I was delighted to find that the vibe was completely different, focused less on bombastic announcements or models and more on practical, dev-focused things.
This meant that OpenAI cherry-picked folks who actively develop with their tools, and they didn't invite traditional media, only folks like yours truly, @swyx from Latent Space, Rowan from Rundown, Simon Willison and Dan Shipper, you know, newsletter and podcast folks who actually build!
It also meant that many, many OpenAI employees who work on the products and APIs we get to use were there to receive feedback, help folks with prompting, and just generally interact with the devs and build that community. I want to shout out my friends Ilan (who was in the keynote as the strawberry salesman interacting with the RealTime API agent), Will DePue from the Sora team, with whom I had an incredible conversation about the ethics and legality of projects, Christine McLeavey, who runs the Audio team, with whom I shared a video of my daughter crying when ChatGPT didn't understand her, Katia, Kevin and Romain on the incredible DevEx/DevRel team and finally, my new buddy Jason, who does infra, was fighting bugs all day, and only joined the pub after shipping RealTime to all of us.
I've collected all these folks in a convenient and super high signal X list here, so definitely give that list a follow if you'd like to tap into their streams.
For the actual announcements, I've already covered these in my Dev Day post here (which was for paid subscribers only, but is now open to all), and Simon did an incredible summary on his Substack as well.
The highlights were definitely the new RealTime API that lets developers build with Advanced Voice Mode, Prompt Caching that will happen automatically and cut the cost of your long-context API calls by as much as a whopping 50%, and finetuning of models, which they are rebranding into Distillation and adding new tools to make easier (including Vision Finetuning for the first time!)
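Here is a rough sketch of what that caching discount could mean for a bill. The 50% figure comes from the announcement. Everything else is an assumption for illustration: that the discount applies only to the part of the prompt served from cache, the $2.50 per million input tokens base price, the token counts, and the helper name `input_cost_usd`. Check the current pricing page before relying on any of it.

```python
def input_cost_usd(
    prompt_tokens: int,
    cached_tokens: int,
    usd_per_million: float,
    cached_discount: float = 0.5,  # the 50% discount mentioned above
) -> float:
    """Estimate input cost when part of the prompt is served from cache.

    Assumes (hypothetically) that only the cached portion is discounted
    and the rest is billed at the full per-million-token price.
    """
    if not 0 <= cached_tokens <= prompt_tokens:
        raise ValueError("cached_tokens must be between 0 and prompt_tokens")
    uncached_tokens = prompt_tokens - cached_tokens
    billed_tokens = uncached_tokens + cached_tokens * (1 - cached_discount)
    return billed_tokens / 1_000_000 * usd_per_million


# Hypothetical numbers: a 100k-token prompt where 90k tokens are a reused prefix.
BASE_PRICE = 2.50  # USD per million input tokens, made up for this example

without_cache = input_cost_usd(100_000, 0, BASE_PRICE)
with_cache = input_cost_usd(100_000, 90_000, BASE_PRICE)

print(f"no cache hit:    ${without_cache:.4f}")
print(f"90% cache hit:   ${with_cache:.4f}")
print(f"relative saving: {1 - with_cache / without_cache:.0%}")
```

The takeaway from the sketch is that the saving approaches 50% only when nearly the whole prompt is a cached prefix, which is why long, stable system prompts and documents benefit the most.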
Meeting Sam Altman
While I didn't get a "media" pass or anything like this, and didn't really get to sit down with OpenAI execs (see Swyx on Latent Space for those conversations), I did have a chance to ask Sam multiple things.
First, at the closing fireside chat between Sam and Kevin Weil (CPO at OpenAI), Kevin asked Sam a bunch of questions, and then they gave out the microphones to folks, and I asked the only question that got Sam to smile.
Sam and Kevin went on for a while, and that Q&A was actually very interesting, so much so that I had to recruit my favorite NotebookLM podcast hosts to go through it and give you an overview, so here's that NotebookLM, with the transcript of the whole Q&A (maybe I'll publish it as a standalone episode? LMK in the comments)
After the official day was over, there was a reception, at the same gorgeous Fort Mason location, with drinks and light food, and as you might imagine, this was great for networking.
But the real post-Dev Day event was hosted by OpenAI devs at a bar, Palm House, where both Sam and Greg Brockman just showed up and hung out with folks. I missed Sam last time and was very eager to go and ask him follow-up questions this time, when I saw he was just chilling at that bar, talking to devs, as though he hadn't "just" completed the largest funding round in VC history ($6.6B at a $157B valuation) and gone through a lot of drama/turmoil with the departure of a lot of senior leadership!
Sam was awesome to briefly chat with, tho as you might imagine, it was loud and tons of folks wanted selfies, but we did discuss how AI affects the real world, job replacement was brought up, and how developers are using the OpenAI products.
What we learned, thanks to Sigil, is that o1 was named partly as a "reset", like the main blog post claimed, and partly after "alien of extraordinary ability", which is the official designation of the O-1 visa, and that Sam came up with this joke himself.
Is anyone here smarter than o1? Do you think you still will by o2?
One of the highest impact questions was by Sam himself to the audience.
Who feels like they've spent a lot of time with O1, and they would say, like, I feel definitively smarter than that thing?
- Sam Altman
When Sam asked this at first, a few hands hesitantly went up. He then followed up with
Do you think you still will by O2? No one. No one taking the bet. One of the challenges that we face is like, we know how to go do this thing that we think will be like, at least probably smarter than all of us in like a broad array of tasks.
This was a very palpable moment when folks looked around and realized what OpenAI folks have probably internalized a long time ago: we're living in INSANE times, and even those of us at the frontier of research, AI use and development don't necessarily understand or internalize how WILD the upcoming few months and years will be.
And then we all promptly forgot to have an existential crisis about it, and took our self-driving Waymos to meet Sam Altman at a bar
This week's Buzz from Weights & Biases
Hey so... after finishing ThursdAI last week I went to Seat