AI21 Jamba 1.5, DIY Meme Faces, 8yo codes with AI and a Doomsday LLM Device?!
Description
Hey there, Alex here with an end-of-summer edition of our show, which did not disappoint. Today is the official anniversary of Stable Diffusion 1.4, can you believe it?
It's the second week in a row that we have an exclusive LLM launch on the show (after Emozilla announced Hermes 3 on last week's show), and spoiler alert, we may have something cooking for next week as well!
This edition of ThursdAI is brought to you by W&B Weave, our LLM observability toolkit, letting you evaluate LLMs for your own use-case easily
Also this week, we've covered both ends of AI progress: a doomerist CEO saying "Fck Gen AI" vs. an 8yo coder, and I continued to geek out on putting myself into memes (I promised I'll stop... at some point), so buckle up, let's take a look at another crazy week:
TL;DR
* Open Source LLMs
* AI21 releases Jamba1.5 Large / Mini hybrid Mamba MoE (X, Blog, HF)
* Microsoft Phi 3.5 - 3 new models including MoE (X, HF)
* BFCL 2 - Berkeley Function Calling Leaderboard V2 (X, Blog, Leaderboard)
* NVIDIA - Mistral Nemo Minitron 8B - Distilled / Pruned from 12B (HF)
* Cohere paper proves - code improves intelligence (X, Paper)
* MOHAWK - transformer → Mamba distillation method (X, Paper, Blog)
* AI Art & Diffusion & 3D
* Ideogram launches v2 - new image diffusion king + API (X, Blog, Try it)
* Midjourney is now on web + free tier (try it finally)
* Flux keeps getting better, cheaper, faster + adoption from OSS (X, X, X)
* Procreate hates generative AI (X)
* Big CO LLMs + APIs
* Grok 2 full is finally available on X - performs well on real time queries (X)
* OpenAI adds GPT-4o Finetuning (blog)
* Google API updates - 1000 pages PDFs + LOTS of free tokens (X)
* This week's Buzz
* Weights & Biases Judgement Day SF Hackathon, September 21-22 (Sign up to hack)
* Video
* Hotshot - new video model - trained by 4 guys (try it, technical deep dive)
* Luma Dream Machine 1.5 (X, Try it)
* Tools & Others
* LM Studio 0.3.0 update - local RAG, structured outputs with any model & more (X)
* Vercel - v0 now has chat (X)
* Ark - a completely offline device - offline LLM + world maps (X)
* Ricky's daughter coding with Cursor - a must-watch video (video)
The Best of the Best: Open Source Wins with Jamba, Phi 3.5, and Surprise Function Calling Heroes
We kick things off this week by focusing on what we love the most on ThursdAI, open-source models! We had a ton of incredible releases this week, starting off with something we were super lucky to have live, the official announcement of AI21's latest LLM: Jamba.
AI21 Officially Announces Jamba 1.5 Large/Mini - A Powerhouse Architecture Combining Transformer and Mamba
While we covered the original Jamba release on the show back in April, Jamba 1.5 is an updated powerhouse. It's two models, Large and Mini, both MoE, and both still use the hybrid Transformer + Mamba architecture that aims to get the best of both worlds.
Itay Dalmedigos, technical lead at AI21, joined us on the ThursdAI stage for an exclusive first look, giving us the full rundown on this developer-ready model with an awesome 256K context window. But it's not just the size - it's about using that size effectively.
AI21 measured the effective context use of their models on the new RULER benchmark released by NVIDIA, an evolution of the needle-in-a-haystack test, and showed that their models fully utilize their context, as opposed to many other models.
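For a feel of what a needle-in-a-haystack style evaluation does, here's a minimal sketch: bury a fact at a chosen depth in filler text and check whether the model can surface it. The `query_model` call is a hypothetical placeholder for your LLM of choice; RULER itself is more sophisticated than this.

```python
# Minimal needle-in-a-haystack harness (illustrative, not RULER's actual code).
def build_haystack(filler: str, needle: str, depth: float, n_fillers: int = 100) -> str:
    """Bury `needle` at a relative `depth` (0.0 = start, 1.0 = end) in filler text."""
    lines = [filler] * n_fillers
    lines.insert(int(depth * n_fillers), needle)
    return "\n".join(lines)

def score_retrieval(answer: str, secret: str) -> bool:
    """Pass/fail: did the model surface the planted fact?"""
    return secret.lower() in answer.lower()

haystack = build_haystack(
    filler="The sky was a uniform gray all afternoon.",
    needle="The secret code is 7431.",
    depth=0.5,
)
prompt = f"{haystack}\n\nWhat is the secret code?"
# answer = query_model(prompt)              # hypothetical model call
# print(score_retrieval(answer, "7431"))    # True if the model found the needle
```

Sweep `depth` and the haystack length, and you get the familiar heatmap of retrieval accuracy versus context position.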
"As you mentioned, we're able to pack many, many tokens on a single GPU. Uh, this is mostly due to the fact that we are able to quantize most of our parameters," Itay explained, diving into their secret sauce, ExpertsInt8, a novel quantization technique specifically designed for MoE models.
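To build intuition for why int8 quantization packs so many more parameters on a GPU, here's a toy sketch of per-row absmax int8 quantization. This is only the general idea; AI21's actual ExpertsInt8 method targets the MoE expert weights specifically and dequantizes inside the fused kernel.

```python
# Toy per-row int8 quantization: each fp weight becomes one byte plus a
# shared per-row scale, roughly halving memory vs fp16 (quartering vs fp32).
def quantize_int8(row: list[float]) -> tuple[list[int], float]:
    """Map a row of float weights to int8 using a per-row absmax scale."""
    scale = max(abs(w) for w in row) / 127 or 1.0  # avoid zero scale
    return [round(w / scale) for w in row], scale

def dequantize(q: list[int], scale: float) -> list[float]:
    """Recover approximate float weights from int8 values and the scale."""
    return [v * scale for v in q]

weights = [0.12, -0.5, 0.33, 0.01]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
# The round trip is lossy, but the error is bounded by scale / 2 per weight.
max_err = max(abs(a - b) for a, b in zip(weights, restored))
```

The trade-off is exactly the one Itay describes: a small, bounded precision loss in exchange for fitting far more tokens (and experts) on a single GPU.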
Oh, and did we mention Jamba is multilingual (eight languages and counting) and natively supports structured JSON, function calling, and document digestion... basically everything developers dream of. They even threw in citation generation: since the long context can hold full documents, your RAG app may not even need to chunk anything, and citations can point to whole documents!
Berkeley Function Calling Leaderboard V2: Updated + Live (link)
Ever wondered how to measure the real-world magic of those models boasting "I can call functions! I can do tool use! Look how cool I am!"? Enter the Berkeley Function Calling Leaderboard (BFCL) 2, a battleground where models clash to prove their function calling prowess.
Version 2 just dropped, and this ain't your average benchmark, folks. It's armed with a "Live Dataset" - a dynamic, user-contributed treasure trove of real-world queries, rare function documentations, and specialized use-cases spanning multiple languages. Translation: NO more biased, contaminated datasets. BFCL 2 is as close to the real world as it gets.
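For a rough idea of what "scoring function calling" even means, here's a hedged sketch: compare the model's emitted tool call (name plus arguments) against a ground-truth call, where argument order doesn't matter but missing or wrong arguments fail. All names here are illustrative; BFCL's actual harness does deeper AST-based checking.

```python
import json

def calls_match(model_output: str, expected: dict) -> bool:
    """Parse the model's JSON tool call and compare name and arguments exactly."""
    try:
        call = json.loads(model_output)
    except json.JSONDecodeError:
        return False  # unparseable output counts as a miss
    return (call.get("name") == expected["name"]
            and call.get("arguments") == expected["arguments"])

expected = {"name": "get_weather", "arguments": {"city": "Paris", "unit": "celsius"}}
good = '{"name": "get_weather", "arguments": {"unit": "celsius", "city": "Paris"}}'
bad = '{"name": "get_weather", "arguments": {"city": "Paris"}}'
# Dict comparison ignores key order, so `good` passes; `bad` drops "unit" and fails.
```

Aggregate pass rates over thousands of such checks (plus live, user-contributed queries) and you get a leaderboard score like the ones below.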
So, who's sitting on the Function Calling throne this week? Our old friend Claude 3.5 Sonnet, with an impressive score of 73.61. But breathing down its neck is GPT-4-0613 (the OG Function Calling master) with 73.5. That's right, the one released a year ago, the first one with function calling, in fact the first LLM with function calling as a concept IIRC!
Now, prepare for the REAL plot twist. The top-performing open-source model isn't some big-name, resource-heavy behemoth. It's a little underdog called Functionary Medium 3.1, a finetuned version of Llama 3.1 that blew everyone away. It even outscored both versions of Claude 3 Opus AND GPT-4, leaving folks scrambling to figure out WHO created this masterpiece.
"I've never heard of this model. It's MIT licensed, from an organization called MeetKai. Have you guys heard about Functionary Medium?" I asked, echoing the collective bafflement in the space. Yep, turns out there's gold hidden in the vast landscape of open source models, just waiting to be unearthed.
Microsoft updates Phi 3.5 - 3 new models including an MoE + MIT license
3 new Phi's dropped this week, including an MoE one, and a new revamped vision one. They look very decent on benchmark yet again, with the mini version (3.8B) seemin