📆 🎂 - ThursdAI #52 - Moshi Voice, Qwen2 finetunes, GraphRag deep dive and more AI news on this celebratory 1yr ThursdAI

Update: 2024-07-04

Description

Hey everyone! Happy 4th of July to everyone who celebrates! I celebrated today by having an intimate conversation with 600 of my closest X friends 😂 Joking aside, today is a celebratory episode, 52nd consecutive weekly ThursdAI show! I've been doing this as a podcast for a year now!

Which means, there are some of you, who've been subscribed for a year 😮 Thank you! Couldn't have done this without you. In the middle of my talk at AI Engineer (I still don't have the video!) I had to plug ThursdAI, and I asked the 300+ audience who is a listener of ThursdAI, and I saw a LOT of hands go up, which is honestly, still quite humbling. So again, thank you for tuning in, listening, subscribing, learning together with me and sharing with your friends!

This week, we covered a new (soon to be) open source voice model from KyutAI, a LOT of open source LLM, from InternLM, Cognitive Computations (Eric Hartford joined us), Arcee AI (Lukas Atkins joined as well) and we have a deep dive into GraphRAG with Emil Eifrem CEO of Neo4j (who shares why it was called Neo4j in the first place, and that he's a ThursdAI listener, whaaat? 🤯), this is definitely a conversation you don't want to miss, so tune in, and read a breakdown below:

TL;DR of all topics covered:

* Voice & Audio

* KyutAI releases Moshi - first ever 7B end to end voice capable model (Try it)

* Open Source LLMs

* Microsoft Updated Phi-3-mini - almost a new model

* InternLM 2.5 - best open source model under 12B on Hugging Face (HF, Github)

* Microsoft open sources GraphRAG (Announcement, Github, Paper)

* OpenAutoCoder-Agentless - SOTA on SWE Bench - 27.33% (Code, Paper)

* Arcee AI - Arcee Agent 7B - from Qwen2 - Function / Tool use finetune (HF)

* LMsys announces RouteLLM - a new Open Source LLM Router (Github)

* DeepSeek Chat got an significant upgrade (Announcement)

* Nomic GPT4all 3.0 - Local LLM (Download, Github)

* This weeks Buzz

* New free Prompts course from WandB in 4 days (pre sign up)

* Big CO LLMs + APIs

* Perplexity announces their new pro research mode (Announcement)

* X is rolling out "Grok Analysis" button and it's BAD in "fun mode" and then paused roll out

* Figma pauses the rollout of their AI text to design tool "Make Design" (X)

* Vision & Video

* Cognitive Computations drops DolphinVision-72b - VLM (HF)

* Chat with Emil Eifrem - CEO Neo4J about GraphRAG, AI Engineer

Voice & Audio

KyutAI Moshi - a 7B end to end voice model (Try It, See Announcement)

Seemingly out of nowhere, another french AI juggernaut decided to drop a major announcement, a company called KyutAI, backed by Eric Schmidt, call themselves "the first European private-initiative laboratory dedicated to open research in artificial intelligence" in a press release back in November of 2023, have quite a few rockstar co founders ex Deep Mind, Meta AI, and have Yann LeCun on their science committee.

This week they showed their first, and honestly quite mind-blowing release, called Moshi (Japanese for Hello, Moshi Moshi), which is an end to end voice and text model, similar to GPT-4o demos we've seen, except this one is 7B parameters, and can run on your mac!

While the utility of the model right now is not the greatest, not remotely close to anything resembling the amazing GPT-4o (which was demoed live to me and all of AI Engineer by Romain Huet) but Moshi shows very very impressive stats!

Built by a small team during only 6 months or so of work, they have trained an LLM (Helium 7B) an Audio Codec (Mimi) a Rust inference stack and a lot more, to give insane performance.

Model latency is 160ms and mic-to-speakers latency is 200ms, which is so fast it seems like it's too fast. The demo often responds faster than I'm able to finish my sentence, and it results in an uncanny, "reading my thoughts" type feeling.

The most important part is this though, a quote of KyutAI post after the announcement :

Developing Moshi required significant contributions to audio codecs, multimodal LLMs, multimodal instruction-tuning and much more. We believe the main impact of the project will be sharing all Moshi’s secrets with the upcoming paper and open-source of the model.

I'm really looking forward to how this tech can be applied to the incredible open source models we already have out there! Speaking to out LLMs is now officially here in the Open Source, way before we got GPT-4o and it's exciting!

Open Source LLMs

Microsoft stealth update Phi-3 Mini to make it almost a new model

So stealth in fact, that I didn't even have this update in my notes for the show, but thanks to incredible community (Bartowsky, Akshay Gautam) who made sure we don't miss this, because it's so huge.

The model used additional post-training data leading to substantial gains on instruction following and structure output. We also improve multi-turn conversation quality, explicitly support <|system|> tag, and significantly improve reasoning capability

Phi-3 June update is quite significant across the board, just look at some of these scores, 354.78% improvement in JSON structure output, 30% at GPQA

But also specifically for coding, a 33→93 jump in Java coding, 33→73 in Typescript, 27→ 85 in Python! These are just incredible numbers, and I definitely agree with Bartowski here, there's enough here to call this a whole new model rather than an "seasonal update"

Qwen-2 is the start of the show right now

Week in and week out, ThursdAI seems to be the watercooler for the best finetuners in the community to come, hang, share notes, and announce their models. A month after Qwen-2 was announced on ThursdAI stage live by friend of the pod and Qwen dev lead Junyang Lin, and a week after it re-took number 1 on the revamped open LLM leaderboard on HuggingFace, we now have great finetunes on top of Qwen-2.

Qwen-2 is the star of the show right now. Because there's no better model. This is like GPT 4 level. It's Open Weights GPT 4. We can do what we want with it, and it's so powerful, and it's multilingual, and it's everything, it's like the dream model. I love it

Eric Hartford - Cognitive Computations

We've had 2 models finetunes based on Qwen 2 and their authors on the show this week, first was Lukas Atkins from Arcee AI (company behind MergeKit), they released Arcee Agent, a 7B Qwen-2 finetune/merge specifically focusing on tool use and function calling.

We also had a chat with Eric Hartford from Cognitive Computations (which Lukas previously participated in) with the biggest open source VLM on top of Qwen-2, a 72B parameter Dolphin Vision (Trained by StableQuan, available on the HUB) ,and it's likely the biggest open source VLM that we've seen so far.

The most exciting part about it, is Fernando Neta's "SexDrugsRockandroll" dataset, which supposedly contains, well.. a lot of uncensored stuff, and it's perfectly able to discuss and analyze images with mature and controversial content.

InternLM 2.5 - SOTA open source under 12B with 1M context (HF, Github)

The folks at Shanghai AI release InternLM 2.5 7B, and a chat version along with a whopping 1M context window extension. These metrics are ridicu

Comments

Top Podcasts

The Best New Comedy Podcast Right Now – June 2024 The Best News Podcast Right Now – June 2024 The Best New Business Podcast Right Now – June 2024 The Best New Sports Podcast Right Now – June 2024 The Best New True Crime Podcast Right Now – June 2024 The Best New Joe Rogan Experience Podcast Right Now – June 20 The Best New Dan Bongino Show Podcast Right Now – June 20 The Best New Mark Levin Podcast – June 2024

In Channel

📆 ThursdAI - Jan 9th - NVIDIA's Tiny Supercomputer, Phi-4 is back, Kokoro TTS & Moondream gaze, ByteDance SOTA lip sync & more AI news

2025-01-1001:20:26

📆 ThursdAI - Jan 2 - is 25' the year of AI agents?

2025-01-0201:31:29

📆 ThursdAI - Dec 26 - OpenAI o3 & o3 mini, DeepSeek v3 658B beating Claude, Qwen Visual Reasoning, Hume OCTAVE & more AI news

2024-12-2701:35:32

🎄ThursdAI - Dec19 - o1 vs gemini reasoning, VEO vs SORA, and holiday season full of AI surprises

2024-12-2001:35:38

📆 ThursdAI - Dec 12 - unprecedented AI week - SORA, Gemini 2.0 Flash, Apple Intelligence, LLama 3.3, NeurIPS Drama & more AI news

2024-12-1301:39:04

📆 ThursdAI - Dec 5 - OpenAI o1 & o1 pro, Tencent HY-Video, FishSpeech 1.5, Google GENIE2, Weave in GA & more AI news

2024-12-0601:31:37

🦃 ThursdAI - Thanksgiving special 24' - Qwen Open Sources Reasoning, BlueSky hates AI, H controls the web & more AI news

2024-11-2801:46:16

📆 ThursdAI - Nov 21 - The fight for the LLM throne, OSS SOTA from AllenAI, Flux new tools, Deepseek R1 reasoning & more AI news

2024-11-2201:45:25

📆 ThursdAI - Nov 14 - Qwen 2.5 Coder, No Walls, Gemini 1114 👑 LLM, ChatGPT OS integrations & more AI news

2024-11-1501:48:42

📆 ThursdAI - Nov 7 - Video version, full o1 was given and taken away, Anthropic price hike-u, halloween 💀 recap & more AI news

2024-11-0801:38:22

📆 ThursdAI - Spooky Halloween edition with Video!

2024-11-0101:49:05

📅 ThursdAI - Oct 24 - Claude 3.5 controls your PC?! Talking AIs with 🦾, Multimodal Weave, Video Models mania + more AI news from this 🔥 week.

2024-10-2501:56:20

📆 ThursdAI - Oct 17 - Robots, Rockets, and Multi Modal Mania with open source voice cloning, OpenAI new voice API and more AI news

2024-10-1801:35:10

📆 ThursdAI - Oct 10 - Two Nobel Prizes in AI!? Meta Movie Gen (and sounds ) amazing, Pyramid Flow a 2B video model, 2 new VLMs & more AI news!

2024-10-1001:30:01

📆 ThursdAI - Oct 3 - OpenAI RealTime API, ChatGPT Canvas & other DevDay news (how I met Sam Altman), Gemini 1.5 8B is basically free, BFL makes FLUX 1.1 6x faster, Rev breaks whisper records...

2024-10-0401:45:14

OpenAI Dev Day 2024 keynote

2024-10-0105:55

📅 ThursdAI - Sep 26 - 🔥 Llama 3.2 multimodal & meta connect recap, new Gemini 002, Advanced Voice mode & more AI news

2024-09-2601:47:15

ThursdAI - Sep 19 - 👑 Qwen 2.5 new OSS king LLM, MSFT new MoE, Nous Research's Forge announcement, and Talking AIs in the open source!

2024-09-1901:56:06

🔥 📅 ThursdAI - Sep 12 - OpenAI's 🍓 is called 01 and is HERE, reflecting on Reflection 70B, Google's new auto podcasts & more AI news from last week

2024-09-1301:58:14

📅 ThursdAI - Sep 5 - 👑 Reflection 70B beats Claude 3.5, Anthropic Enterprise 500K context, 100% OSS MoE from AllenAI, 1000 agents world sim, Replit agent is the new Cursor? and more AI news

2024-09-0601:44:56

00:00

📆 🎂 - ThursdAI #52 - Moshi Voice, Qwen2 finetunes, GraphRag deep dive and more AI news on this celebratory 1yr ThursdAI

Alex Volkov, Emil Eifrem, and Nisten

#box-pro-ellipsis-173692715638237{-webkit-line-clamp:2;}📆 🎂 - ThursdAI #52 - Moshi Voice, Qwen2 finetunes, GraphRag deep dive and more AI news on this celebratory 1yr ThursdAI

📆 🎂 - ThursdAI #52 - Moshi Voice, Qwen2 finetunes, GraphRag deep dive and more AI news on this celebratory 1yr ThursdAI

Alex Volkov, Emil Eifrem, and Nisten

📆 🎂 - ThursdAI #52 - Moshi Voice, Qwen2 finetunes, GraphRag deep dive and more AI news on this celebratory 1yr ThursdAI