ThursdAI - Dec 5 - OpenAI o1 & o1 pro, Tencent HY-Video, FishSpeech 1.5, Google GENIE2, Weave in GA & more AI news
Description
Well well well, December is finally here, we're about to close out this year (and have just passed the second anniversary of ChatGPT!), and it seems that all of the AI labs want to give us X-mas presents to play with over the holidays!
Look, I keep saying this, but the weeks are getting crazier and crazier. This week we got the cheapest and the most expensive AI offerings all at once (the cheapest from Amazon and the most expensive from OpenAI), two new open-weights models that beat commercial offerings, a diffusion model that predicts the weather, and two world-building models. Oh, and two decentralized, fully open-source LLMs finished training live, with compute spread across the world. I said it... crazy week!
And for W&B, this week started with Weave finally launching in GA, which I personally was looking forward to (read more below)!
TL;DR Highlights
* OpenAI o1 & Pro Tier: o1 is out of preview, now smarter, faster, multimodal, and integrated into ChatGPT. For heavy usage, ChatGPT Pro ($200/month) offers unlimited calls and o1 pro mode for harder reasoning tasks.
* Video & Audio Open Source Explosion: Tencent's HYVideo outperforms Runway and Luma, bringing high-quality video generation to open source. FishSpeech 1.5 challenges top TTS providers, making near-human voice available for free research.
* Open Source Decentralization: Nous Research's DiStRo (15B) and Prime Intellect's INTELLECT-1 (10B) prove you can train giant LLMs across decentralized nodes globally, with performance on par with centralized setups.
* Google's Genie 2 & WorldLabs: generating fully interactive 3D worlds from a single image, pushing boundaries in embodied AI and simulation. Google's GenCast also sets a new standard in weather prediction, beating supercomputers in accuracy and speed.
* Amazon's Nova FMs: cheap, scalable LLMs with huge context and global language coverage. Perfect for cost-conscious enterprise tasks, though not top on performance.
* Weave by W&B: now in GA, it's your dashboard and tool suite for building, monitoring, and scaling GenAI apps. Get started with one line of code.
OpenAI's 12 Days of Shipping: o1 & ChatGPT Pro
The biggest splash this week came from OpenAI. They're kicking off "12 days of launches," and Day 1 brought the long-awaited full version of o1. The main complaint about o1 for many people was how slow it was! Well, now it's not only smarter but significantly faster (60% faster than the preview!), and officially multimodal: it can see images and text together.
Better yet, OpenAI introduced a new ChatGPT Pro tier at $200/month. It offers unlimited usage of o1, advanced voice mode, and something called o1 pro mode, where o1 thinks even harder and longer about your hardest math, coding, or science problems. For power users (data scientists, engineers, hardcore coders) this might be a no-brainer. For others, 200 bucks might be steep, but hey, someone's gotta pay for those GPUs. Given that OpenAI recently confirmed there are now 300 million monthly active users on the platform, and many of my friends have already upgraded, this is for sure going to boost the bottom line at OpenAI!
Quoting Sam Altman from the stream, "This is for the power users who push the model to its limits every day." For those who complained o1 took forever just to say "hi," rejoice: trivial requests will now be answered quickly, while super-hard tasks get that legendary deep reasoning, now with a progress bar and a notification when a task is complete. Friend of the pod Ray Fernando gave pro mode a prompt that took 7 minutes to think through!
I've tested the new o1 myself, and while I got dangerously close to my 50-messages-per-week quota, I've already gotten some incredible results, and very fast too. My ice-cubes question, which failed on both o1-preview and o1-mini (and took them significantly longer), took the new o1 just 4 seconds.
Open Source LLMs: Decentralization & Transparent Reasoning
Nous Research DiStRo & DeMo Optimizer
We've talked about decentralized training before, but the folks at Nous Research are making it a reality at scale. This week, Nous Research wrapped up the training of a new 15B-parameter LLM, codename "Psyche," using a fully decentralized approach called "Nous DiStRo." Picture a massive AI model trained not in a single data center, but across GPU nodes scattered around the globe. According to Alex Volkov (host of ThursdAI), "This is crazy: they're literally training a 15B param model using GPUs from multiple companies and individuals, and it's working as well as centralized runs."
The key to this success is "DeMo" (Decoupled Momentum Optimization), a paper co-authored by none other than Diederik Kingma (yes, the Kingma behind the Adam optimizer and VAEs). DeMo drastically reduces communication overhead while maintaining stability and speed. The training loss curve they've shown looks just as good as a normal centralized run, proving that decentralized training isn't just a pipe dream. The code and paper are open source, and soon we'll have the fully trained Psyche model. It's a huge step toward democratizing large-scale AI: no more waiting around for Big Tech to drop their weights. Instead, we can all chip in and train together.
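To make the "reduce what you communicate" idea concrete, here's a toy sketch of the general pattern, not the actual DeMo algorithm (the paper uses a DCT-based frequency decomposition to extract fast-moving momentum components; the top-k selection and function name below are illustrative assumptions): each node keeps its momentum entirely local, transmits only a small, fast-moving slice of it, and carries the rest forward as a residual.

```python
import numpy as np

def demo_style_step(grad, momentum, beta=0.9, k=2):
    """Toy sketch of communication-reducing momentum (illustrative only).

    The node keeps `momentum` local and never syncs it; only the k
    largest-magnitude components are extracted for communication,
    and the remainder stays behind as residual for future steps.
    """
    momentum = beta * momentum + grad            # local momentum update
    idx = np.argsort(np.abs(momentum))[-k:]      # pick the k "fastest" entries
    sparse_update = np.zeros_like(momentum)
    sparse_update[idx] = momentum[idx]           # this is all we'd communicate
    momentum = momentum - sparse_update          # residual stays on the node
    return sparse_update, momentum

# One step on a 4-parameter toy model: only 2 of 4 values leave the node.
grad = np.array([0.1, -2.0, 0.05, 1.5])
update, resid = demo_style_step(grad, np.zeros(4), k=2)
```

With k much smaller than the parameter count, the bytes sent per step shrink proportionally, which is the property that makes training over slow internet links between nodes plausible at all.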
Prime Intellect INTELLECT-1 10B: Another Decentralized Triumph
But wait, there's more! Prime Intellect also finished training their 10B model, INTELLECT-1, using a similar decentralized setup. INTELLECT-1 was trained with a custom framework that reduces inter-GPU communication by 400x. It's essentially a global team effort, with nodes from all over the world contributing compute cycles.
The result? A model hitting performance similar to older Meta models like Llama 2, but fully decentralized.
Ruliad DeepThought 8B: Reasoning You Can Actually See
If that's not enough, we've got yet another open-source reasoning model: Ruliad's DeepThought 8B, an 8B-parameter model (finetuned from Llama 3.1) from friends of the show FarEl, Alpin and Sentdex.
DeepThought attempts to match or exceed the reasoning performance of much larger models, and beating several 72B-parameter models while being just 8B itself is very impressive.
Google is firing on all cylinders this week
Google didn't stay quiet this week either, and while we all wait for the Gemini team to release the next Gemini after the myriad of very good experimental models recently, we got some amazing things from them this week.
Google's PaliGemma 2 - finetunable SOTA VLM using Gemma
PaliGemma 2 is a new family of vision-language base models (3B, 10B and 28B, at 224px, 448px and 896px resolutions) that include image segmentation and detection capabilities and are great at OCR, which makes them very versatile for fine-tuning on specific tasks.
They claim to achieve SOTA on chemical formula recognition, music score recognition, spatial reasoning, and chest X-ray report generation!
Google GenCast SOTA weather prediction with... diffusion!?
More impressively, Google DeepMind released GenCast, a diffusion-based model that beats the state-of-the-art ENS system in 97% of weather predictions. Did we say weather predictions? Yup.
Generative AI is now better at weather forecasting than dedicated physics-based deterministic algorithms running on supercomputers. GenCast can predict 15 days in advance in just 8 minutes on a single TPU v5, instead of hours on a monstrous cluster. This is mind-blowing. As Yam said on the show, "Predicting the world is crazy hard," and now diffusion models handle it with ease.
W&B Weave: Observability, Evaluation and Guardrails now in GA
Speaking of building and monitoring GenAI apps, we at Weights & Biases (the sponsor of ThursdAI) announced that Weave is now GA. Weave is a developer tool for evaluating, visualizing, and debugging LLM calls in production. If you're building GenAI apps, like a coding agent or a tool that processes thousands of user requests, Weave helps you track costs, latency, and quality systematically.
We showcased two internal apps: Open UI (a website builder from a prompt) and Winston (an AI agent that checks emails, Slack, and more). Both rely on Weave to iterate, tune prompts, measure user feedback, and ensure stable performance. With o1 and other advanced models coming to APIs soon, tools like Weave will be crucial to keep those applications under control.
If you follow this newsletter and develop with LLMs, now is a great time to give Weave a try.
Open Source Audio & Video: Challenging Proprietary Models
Tencent's HY Video: Beating Runway & Luma in Open Source
Tencent came out swinging with their open-source model, HYVideo. It's a video model that generates incredibly realistic footage, camera cuts, and even audio: yep, Foley and lip-synced character speech. Just a single model doing text-to-video, image-to-video, puppeteering, and more. It even outperforms closed-source giants like Runway Gen 3 and Luma 1.6 on over 1,500 prompts.
This is the kind of thing we dreamed about when we first heard of video diffusion models. Now it's here, open-sourced, ready for tinkering. "It's near SORA-level," as I mentioned, re