
πŸ“† ThursdAI - Dec 5 - OpenAI o1 & o1 pro, Tencent HY-Video, FishSpeech 1.5, Google GENIE2, Weave in GA & more AI news

Update: 2024-12-06

Well well well, December is finally here. We're about to close out this year (and we just flew past the second anniversary of ChatGPT πŸŽ‚), and it seems that all of the AI labs want to give us X-mas presents to play with over the holidays!

Look, I keep saying this, but the weeks are getting crazier and crazier. This week we got the cheapest and the most expensive AI offerings all at once (the cheapest from Amazon and the most expensive from OpenAI), 2 new open-weights models that beat commercial offerings, a diffusion model that predicts the weather, and 2 world-building models. Oh, and 2 decentralized, fully open-source LLMs finished training LIVE across the globe. I said... crazy week!

And for W&B, this week started with Weave finally launching in GA πŸŽ‰, which I personally was looking forward to (read more below)!

TL;DR Highlights

* OpenAI O1 & Pro Tier: O1 is out of preview, now smarter, faster, multimodal, and integrated into ChatGPT. For heavy usage, ChatGPT Pro ($200/month) offers unlimited calls and O1 Pro Mode for harder reasoning tasks.

* Video & Audio Open Source Explosion: Tencent’s HYVideo outperforms Runway and Luma, bringing high-quality video generation to open source. Fishspeech 1.5 challenges top TTS providers, making near-human voice available for free research.

* Open Source Decentralization: Nous Research’s DiStRo (15B) and Prime Intellect’s INTELLECT-1 (10B) prove you can train giant LLMs across decentralized nodes globally. Performance is on par with centralized setups.

* Google’s Genie 2 & WorldLabs: Generating fully interactive 3D worlds from a single image, pushing boundaries in embodied AI and simulation. Google’s GenCast also sets a new standard in weather prediction, beating supercomputers in accuracy and speed.

* Amazon’s Nova FMs: Cheap, scalable LLMs with huge context and global language coverage. Perfect for cost-conscious enterprise tasks, though not top on performance.

* πŸŽ‰ Weave by W&B: Now in GA, it’s your dashboard and tool suite for building, monitoring, and scaling GenAI apps. Get Started with 1 line of code

OpenAI’s 12 Days of Shipping: O1 & ChatGPT Pro

The biggest splash this week came from OpenAI. They’re kicking off β€œ12 days of launches,” and Day 1 brought the long-awaited full version of o1. The main complaint many people had about o1 was how slow it was! Well, now it’s not only smarter but significantly faster (60% faster than preview!), and officially multimodal: it can see images and text together.

Better yet, OpenAI introduced a new ChatGPT Pro tier at $200/month. It offers unlimited usage of o1, advanced voice mode, and something called o1 pro mode β€” where o1 thinks even harder and longer about your hardest math, coding, or science problems. For power usersβ€”maybe data scientists, engineers, or hardcore codersβ€”this might be a no-brainer. For others, 200 bucks might be steep, but hey, someone’s gotta pay for those GPUs. Given that OpenAI recently confirmed that there are now 300 Million monthly active users on the platform, and many of my friends already upgraded, this is for sure going to boost the bottom line at OpenAI!

Quoting Sam Altman from the stream, β€œThis is for the power users who push the model to its limits every day.” For those who complained o1 took forever just to say β€œhi,” rejoice: trivial requests will now be answered quickly, while super-hard tasks get that legendary deep reasoning, including a new progress bar and a notification when a task is complete. Friend of the pod Ray Fernando gave Pro a prompt that took 7 minutes to think through!

I've tested the new o1 myself, and while I've gotten dangerously close to my 50-messages-per-week quota, I've gotten some incredible results already, and very fast as well. This ice-cubes question tripped up both o1-preview and o1-mini, and they took significantly longer on it, while o1 got it in just 4 seconds.
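For developers who want to poke at it programmatically once it lands in the API, the call should look like any other chat completion. This is a hedged sketch: the "o1" model id and its API availability are assumptions based on the ChatGPT naming, since only o1-preview and o1-mini have been documented in the API so far.

```python
# Hypothetical sketch: calling o1 via the OpenAI Python SDK once it reaches the API.
# The "o1" model id is an assumption; swap in whatever id OpenAI actually publishes.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="o1",  # assumed id; "o1-preview" is what the API exposes today
    messages=[
        {
            "role": "user",
            "content": "Three ice cubes are dropped into a pan on a hot stove. "
                       "How many are left after two minutes? Explain your reasoning.",
        }
    ],
)
print(response.choices[0].message.content)
```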

Open Source LLMs: Decentralization & Transparent Reasoning

Nous Research DiStRo & DeMo Optimizer

We’ve talked about decentralized training before, but the folks at Nous Research are making it a reality at scale. This week, Nous Research wrapped up the training of a new 15B-parameter LLMβ€”codename β€œPsyche”—using a fully decentralized approach called β€œNous DiStRo.” Picture a massive AI model trained not in a single data center, but across GPU nodes scattered around the globe. According to Alex Volkov (host of ThursdAI), β€œThis is crazy: they’re literally training a 15B param model using GPUs from multiple companies and individuals, and it’s working as well as centralized runs.”

The key to this success is β€œDeMo” (Decoupled Momentum Optimization), a paper co-authored by none other than Diederik Kingma (yes, the Kingma behind the Adam optimizer and VAEs). DeMo drastically reduces communication overhead while still maintaining stability and speed. The training loss curve they’ve shown looks just as good as a normal centralized run, proving that decentralized training isn’t just a pipe dream. The code and paper are open source, and soon we’ll have the fully trained Psyche model. It’s a huge step toward democratizing large-scale AIβ€”no more waiting around for Big Tech to drop their weights. Instead, we can all chip in and train together.
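The paper has the real math, but the core intuition (keep the heavy optimizer state local and only exchange a tiny, fast-moving slice of the update) can be shown with a toy sketch. To be clear, this is not the DeMo algorithm itself; it's a generic top-k "share less over the wire" illustration in plain NumPy, with every name and number made up for the example.

```python
# Toy, single-process illustration of communication-efficient optimization:
# each simulated worker keeps its full momentum locally and only "transmits"
# the k largest momentum entries each step. NOT the actual DeMo algorithm.
import numpy as np

rng = np.random.default_rng(0)
dim, k, beta, lr = 1_000, 10, 0.9, 0.01        # k << dim is where the bandwidth saving comes from
params = rng.normal(size=dim)
momenta = [np.zeros(dim), np.zeros(dim)]        # per-worker local momentum, never communicated in full

def local_gradient(worker: int) -> np.ndarray:
    # stand-in for a real backward pass on that worker's data shard
    return params + rng.normal(scale=0.1, size=dim) * (worker + 1)

for step in range(100):
    shared = np.zeros(dim)
    for w in range(2):
        momenta[w] = beta * momenta[w] + local_gradient(w)
        top = np.argsort(np.abs(momenta[w]))[-k:]   # only k of dim values "cross the wire"
        sparse = np.zeros(dim)
        sparse[top] = momenta[w][top]
        shared += sparse / 2                        # average the sparse contributions
    params -= lr * shared                           # every node applies the same update

print("final parameter norm:", float(np.linalg.norm(params)))
```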

Prime Intellect INTELLECT-1 10B: Another Decentralized Triumph

But wait, there’s more! Prime Intellect also finished training their 10B model, INTELLECT-1, using a similar decentralized setup. INTELLECT-1 was trained with a custom framework that reduces inter-GPU communication by 400x. It’s essentially a global team effort, with nodes from all over the world contributing compute cycles.

The result? A model hitting performance similar to older Meta models like Llama 2β€”but fully decentralized.

Ruliad DeepThought 8B: Reasoning You Can Actually See

If that’s not enough, we’ve got yet another open-source reasoning model: Ruliad’s DeepThought 8B, an 8B-parameter model (finetuned from LLaMA-3.1) from friends of the show FarEl, Alpin and Sentdex πŸ‘

DeepThought aims to match or exceed the performance of much larger models on reasoning tasks, and beating several 72B-param models while being just 8B itself is very impressive.
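If you want to kick its tires locally, it should load like any other LLaMA-based finetune through transformers. A minimal sketch follows; the repo id below is a placeholder, so grab the actual checkpoint name from Ruliad's Hugging Face page.

```python
# Generic sketch of loading an 8B LLaMA-based reasoning finetune with transformers.
# "ruliad/deepthought-8b" is a placeholder repo id, not a confirmed checkpoint name.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ruliad/deepthought-8b"  # placeholder; check Ruliad's HF page for the real id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

prompt = "How many times does the letter r appear in 'strawberry'? Think step by step."
inputs = tokenizer.apply_chat_template(
    [{"role": "user", "content": prompt}],
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

output = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```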

Google is firing on all cylinders this week

Google didn't stay quiet this week either, and while we all wait for the Gemini team to release the next Gemini after the myriad of very good experimental models recently, we got some amazing things this week.

Google’s PaliGemma 2 - finetunable SOTA VLM using Gemma

PaliGemma 2 is a new vision-language family of base models (3B, 10B and 28B) at 224px, 448px and 896px resolutions. They include image segmentation and detection capabilities and are great at OCR, which makes them very versatile for fine-tuning on specific tasks.

They claim to achieve SOTA on chemical formula recognition, music score recognition, spatial reasoning, and chest X-ray report generation!
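For a sense of how you'd run one of these for OCR with Hugging Face transformers, here's a minimal sketch. The checkpoint name and image URL are assumptions for illustration; the PaliGemma classes themselves are the ones transformers already ships for the first PaliGemma.

```python
# Minimal sketch of PaliGemma-style OCR inference with transformers.
# The checkpoint id and image URL are placeholders, not confirmed values.
import requests
from PIL import Image
from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

model_id = "google/paligemma2-3b-pt-224"  # assumed id for the 3B / 224px base model
processor = AutoProcessor.from_pretrained(model_id)
model = PaliGemmaForConditionalGeneration.from_pretrained(model_id)

image = Image.open(requests.get(
    "https://example.com/receipt.jpg",  # placeholder image
    stream=True,
).raw)

# PaliGemma uses short task prompts such as "ocr", "caption en", or "detect <thing>"
inputs = processor(text="ocr", images=image, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=64)
print(processor.decode(output[0], skip_special_tokens=True))
```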

Google GenCast SOTA weather prediction with... diffusion!?

More impressively, Google DeepMind released GenCast, a diffusion-based model that beats the state-of-the-art ENS system in 97% of weather predictions. Did we say weather predictions? Yup.

Generative AI is now better at weather forecasting than dedicated physics-based deterministic algorithms running on supercomputers. GenCast can produce a 15-day forecast in just 8 minutes on a single TPU v5, instead of hours on a monstrous cluster. This is mind-blowing. As Yam said on the show, β€œPredicting the world is crazy hard” and now diffusion models handle it with ease.

W&B Weave: Observability, Evaluation and Guardrails now in GA

Speaking of building and monitoring GenAI apps, we at Weights & Biases (the sponsor of ThursdAI) announced that Weave is now GA. Weave is a developer tool for evaluating, visualizing, and debugging LLM calls in production. If you’re building GenAI appsβ€”like a coding agent or a tool that processes thousands of user requestsβ€”Weave helps you track costs, latency, and quality systematically.

We showcased two internal apps: Open UI (a website builder from a prompt) and Winston (an AI agent that checks emails, Slack, and more). Both rely on Weave to iterate, tune prompts, measure user feedback, and ensure stable performance. With O1 and other advanced models coming to APIs soon, tools like Weave will be crucial to keep those applications under control.

If you follow this newsletter and develop with LLMs, now is a great time to give Weave a try.
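Here's roughly what that one line (plus an optional decorator) looks like in practice; the project name and the wrapped function are placeholders for whatever you're building.

```python
# Minimal sketch of instrumenting an LLM call with W&B Weave.
# The project name and the wrapped function are placeholders.
import weave
from openai import OpenAI

weave.init("thursdai-demo")  # the one line: start logging traces to this Weave project

client = OpenAI()

@weave.op()  # decorated functions show up as traced ops in the Weave UI
def answer(question: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": question}],
    )
    return response.choices[0].message.content

print(answer("Summarize this week's AI news in one sentence."))
```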

Open Source Audio & Video: Challenging Proprietary Models

Tencent’s HY Video: Beating Runway & Luma in Open Source

Tencent came out swinging with their open-source model, HYVideo. It’s a video model that generates incredibly realistic footage, camera cuts, and even audioβ€”yep, Foley and lip-synced character speech. Just a single model doing text-to-video, image-to-video, puppeteering, and more. It even outperforms closed-source giants like Runway Gen 3 and Luma 1.6 on over 1,500 prompts.

This is the kind of thing we dreamed about when we first heard of video diffusion models. Now it’s here, open-sourced, ready for tinkering. β€œIt’s near SORA-level,” as I mentioned.

Alex Volkov