📅 ThursdAI - May 9 - AlphaFold 3, im-a-good-gpt2-chatbot, Open Devin SOTA on SWE-Bench, DeepSeek V2 super cheap + interview with OpenUI creator & more AI news

Update: 2024-05-10

Hey 👋 (show notes and links a bit below)

This has been a great AI week; however, it does feel a bit like the "quiet before the storm", with Google I/O next Tuesday (which I'll be covering from the ground at Shoreline!) and rumors that OpenAI is not going to just let Google have all the spotlight!

Early this week, we got 2 new models on LMsys, im-a-good-gpt2-chatbot and im-also-a-good-gpt2-chatbot. We've now confirmed that they are from OpenAI, and folks have been testing them with logic puzzles and role play and saying great things, so maybe that's what we'll get from OpenAI soon?

Also on the show today, we had a BUNCH of guests, and as you know, I love chatting with the folks who make the news. We were honored to host Xingyao Wang and Graham Neubig, core maintainers of Open Devin (which just broke SOTA on SWE-Bench this week!), and then friends of the pod Tanishq Abraham and Parmita Mishra (both medical/bio experts) did a deep dive into AlphaFold 3 from Google.

Also this week, OpenUI from Chris Van Pelt (Co-founder & CIO at Weights & Biases) has been blowing up, taking the #1 spot on GitHub trending, and I had the pleasure of inviting Chris to chat about it on the show!

Let's delve into this (yes, this is I, Alex the human, using "delve" as a joke, don't get triggered 😉)

TL;DR of all topics covered (trying something new: my raw notes with all the links and bullet points are at the end of the newsletter)

* Open Source LLMs

* OpenDevin gets SOTA on SWE-Bench Lite with 21% (X, Blog)

* DeepSeek V2 - 236B (21B Active) MoE (X, Try It)

* Weights & Biases OpenUI blows past 11K stars (X, Github, Try It)

* Llama-3 120B Chonker Merge from Maxime Labonne (X, HF)

* Alignment Lab open-sources Buzz - a 31M-row training dataset (X, HF)

* xLSTM - new transformer alternative (X, Paper, Critique)

* Benchmarks & Eval updates

* Llama-3 still in 6th place (LMsys analysis)

* Reka Core gets an awesome 7th place and Qwen-Max breaks into the top 10 (X)

* No upsets in LLM leaderboard

* Big CO LLMs + APIs

* Google DeepMind announces AlphaFold-3 (Paper, Announcement)

* OpenAI publishes their Model Spec (Spec)

* OpenAI tests 2 models on LMsys (im-also-a-good-gpt2-chatbot & im-a-good-gpt2-chatbot)

* OpenAI joins Coalition for Content Provenance and Authenticity (Blog)

* Voice & Audio

* Udio adds in-painting - change parts of songs (X)

* 11Labs joins the AI Audio race (X)

* AI Art & Diffusion & 3D

* ByteDance PuLID - new high quality ID customization (Demo, Github, Paper)

* Tools & Hardware

* Went to the Museum with Rabbit R1 (My Thread)

* Co-Hosts and Guests

* Graham Neubig (@gneubig) & Xingyao Wang (@xingyaow_) from Open Devin

* Chris Van Pelt (@vanpelt) from Weights & Biases

* Nisten Tahiraj (@nisten) - Cohost

* Tanishq Abraham (@iScienceLuvr)

* Parmita Mishra (@prmshra)

* Wolfram Ravenwolf (@WolframRvnwlf)

* Ryan Carson (@ryancarson)

Open Source LLMs

Open Devin getting a whopping 21% on SWE-Bench (X, Blog)

Open Devin started as a tweet from our friend Junyang Lin (on the Qwen team at Alibaba) calling for an open-source alternative to the very popular Devin code agent from Cognition Labs (recently valued at $2B 🤯). Eight weeks and tons of open-source contributions later, the project has over 100 contributors and almost 25K stars on GitHub, and now claims a state-of-the-art score on the very hard SWE-Bench Lite benchmark, beating both Devin and SWE-Agent (which scored 18%).

They did it using the CodeAct framework developed by Xingyao, and it's honestly incredible to see an open-source project catch up to and beat a very well-funded AI lab within 8 weeks! Kudos to the OpenDevin folks for the organization and the amazing results!
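The idea behind CodeAct is that the agent's actions are executable code, not fixed-format tool calls, and whatever that code prints (or the error it raises) comes back as the next observation. Below is a minimal toy sketch of that loop, assuming a plain Python `exec` sandbox. It is not OpenDevin's implementation: `scripted_llm`, `run_code_action` and `codeact_loop` are hypothetical names, and the "model" just replays two canned actions.

```python
import contextlib
import io


def scripted_llm(history):
    """Hypothetical stand-in for a model call: returns the next code action,
    or None when done. A real agent would query an LLM with the history."""
    actions = [
        "nums = [3, 1, 2]\nprint(sorted(nums))",
        "print(sum(nums))",
    ]
    turn = sum(1 for role, _ in history if role == "action")
    return actions[turn] if turn < len(actions) else None


def run_code_action(code, namespace):
    """Execute one code action and capture its stdout as the observation."""
    buf = io.StringIO()
    try:
        with contextlib.redirect_stdout(buf):
            exec(code, namespace)
    except Exception as exc:  # errors are fed back so the agent can self-correct
        return f"{type(exc).__name__}: {exc}"
    return buf.getvalue()


def codeact_loop(task, llm, max_turns=5):
    """Alternate model actions and execution observations until done."""
    history = [("user", task)]
    namespace = {}  # interpreter state persists across turns
    for _ in range(max_turns):
        code = llm(history)
        if code is None:
            break
        history.append(("action", code))
        history.append(("observation", run_code_action(code, namespace)))
    return history


if __name__ == "__main__":
    for role, content in codeact_loop("Sort and sum [3, 1, 2]", scripted_llm):
        print(f"[{role}] {content!r}")
```

Because the namespace is shared between turns, the second action can reuse `nums` defined by the first. That persistent, composable state is what makes code a convenient action format.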

DeepSeek v2 - huge MoE with 236B (21B active) parameters (X, Try It)

The folks at DeepSeek are releasing this huge MoE (the biggest we've seen in terms of number of experts), with 160 experts and 6 experts activated per token. It follows the same trend as the Snowflake team, just taken even further. They also share a lot of technical details, including optimizations to the KV cache.
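To show what "160 experts, 6 active" means in practice, here is a generic top-k MoE routing sketch in pure Python. It is not DeepSeek's exact gating. The router weights, the toy "experts" (simple scalers) and the renormalisation of the top-k gates are illustrative assumptions. The point is that a router scores all 160 experts for each token, but only the 6 highest-scoring ones run.

```python
import math
import random

NUM_EXPERTS = 160  # routed experts, per the figure above
TOP_K = 6          # experts activated per token


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(router_logits, k=TOP_K):
    """Pick the top-k experts for one token and renormalise their gate weights."""
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return {i: probs[i] / norm for i in top}


def moe_forward(x, router_weights, experts, k=TOP_K):
    """x: token vector; router_weights: one weight vector per expert;
    experts: callables mapping a vector to a vector. Only k experts execute."""
    logits = [sum(w_j * x_j for w_j, x_j in zip(w, x)) for w in router_weights]
    gates = route_token(logits, k)
    out = [0.0] * len(x)
    for idx, g in gates.items():
        y = experts[idx](x)
        out = [o + g * y_j for o, y_j in zip(out, y)]
    return out, gates


if __name__ == "__main__":
    random.seed(0)
    dim = 4
    router = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(NUM_EXPERTS)]
    # Hypothetical toy experts: expert i scales its input by 0.01 * i.
    experts = [(lambda v, s=i: [s * 0.01 * v_j for v_j in v]) for i in range(NUM_EXPERTS)]
    out, gates = moe_forward([1.0, 0.5, -0.5, 2.0], router, experts)
    print(len(gates), sorted(gates))
```

This is why total parameters (236B) and active parameters (21B) differ so much. Every expert's weights must be stored, but each token only pays compute for the few it is routed to.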

With benchmark results getting close to GPT-4, DeepSeek wants to take the crown as the cheapest smart model you can run, and not only in open source: they are now offering this model at an incredible $0.28/1M tokens, that's 28 cents per 1M tokens!

The closest models on price were Haiku at $0.25 and GPT-3.5 at $0.50. This is quite an incredible deal for a model with 32K context (128K in the open-source release) and these metrics.
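To make the price gap concrete, here is a small cost calculator using the per-1M-token figures quoted above. It treats each figure as one flat rate, which is a simplifying assumption: real price sheets usually list input and output tokens separately. The 50M-token workload is a hypothetical example.

```python
# USD per 1M tokens, as quoted in the paragraphs above (flat-rate assumption).
PRICES_PER_MILLION = {
    "DeepSeek V2": 0.28,
    "Haiku": 0.25,
    "GPT-3.5": 0.50,
}


def cost_usd(tokens, price_per_million):
    """Cost of processing `tokens` tokens at a per-1M-token price."""
    return tokens / 1_000_000 * price_per_million


def compare(tokens):
    """Cost of the same workload for each model, rounded to cents."""
    return {name: round(cost_usd(tokens, p), 2) for name, p in PRICES_PER_MILLION.items()}


if __name__ == "__main__":
    for name, cost in compare(50_000_000).items():  # hypothetical 50M-token job
        print(f"{name}: ${cost:.2f}")
```

At these rates a 50M-token job costs about $14 on DeepSeek V2 versus $25 on GPT-3.5, with Haiku slightly cheaper at $12.50.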

Also notable is the training cost: they claim it took them 1/5 of what Llama-3 cost Meta, which is also incredible. Unfortunately, running this model locally is a no-go for most of us 🙂

I would mention, though, that metrics are not everything: this model fails quite humorously on my basic logic tests.

Llama-3 120B Chonker Merge from Maxime Labonne (X, HF)

We've covered merges before, and we've had the awesome Maxime Labonne talk to us at length about model merging on ThursdAI, but I've been waiting for Llama-3 merges, and Maxime did NOT disappoint!

A whopping 120B Llama (Maxime added 50 layers to the 70B Llama-3) is doing the rounds, and folks are claiming that Maxime achieved AGI 😂 It's really funny; this model is... something else.

Here's just one example Maxime shared, where it goes into an existential crisis over a very simple logic question. It's a question that Llama-3 answers OK with some help, but this... I've never seen anything like it. Don't forget that merging involves no additional training; it's mixing layers from the same model, so... we still have no idea what merging does to a model, but... some brain damage is definitely occurring.

Oh, and it also makes up words!
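For intuition on "mixing layers from the same model with no additional training", here is a minimal sketch of a passthrough-style self-merge: the merged model is just a stack of (possibly overlapping) slices of the original layers, copied as-is. This is the general idea only, not Maxime's actual recipe. `passthrough_stack` is a hypothetical helper and the 8-layer model with its slice ranges is an invented toy example.

```python
def passthrough_stack(num_layers, slices):
    """Return the source-layer index for every layer of the merged model.

    slices: list of (start, end) half-open ranges into the original model.
    Overlapping ranges duplicate layers; no weights are trained or averaged.
    """
    stacked = []
    for start, end in slices:
        if not (0 <= start < end <= num_layers):
            raise ValueError(f"bad slice {(start, end)} for {num_layers} layers")
        stacked.extend(range(start, end))
    return stacked


if __name__ == "__main__":
    # Hypothetical toy model with 8 layers, sliced with overlap.
    merged = passthrough_stack(8, [(0, 4), (2, 6), (4, 8)])
    print(len(merged), merged)
```

The toy merge grows from 8 to 12 layers purely by repeating layers 2-5, which makes it easy to see why such a model can behave strangely: each duplicated layer receives inputs it never saw during training.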


Big CO LLMs + APIs

OpenAI publishes Model Spec (X, Spec, Blog)

OpenAI publishes and invite

Alex Volkov, Xingyao Wang, Nisten, Chris Van Pelt, and Graham Neubig