ThursdAI - Aug 29 - AI Plays DOOM, Cerebras breaks inference records, Google gives new Geminis, OSS vision SOTA & 100M context windows!?
Hey, for the last time during the summer of 2024, welcome to yet another edition of ThursdAI, and happy Skynet self-awareness day for those who keep track :)
This week, Cerebras broke the world record for the fastest Llama 3.1 70B/8B inference (and came on the show to talk about it), Google shipped 3 new Gemini updates, Anthropic opened Artifacts to everyone, 100M context windows turn out to be possible, and Qwen set a new SOTA for open vision models + much more!
As always, this week's newsletter is brought to you by Weights & Biases. Did I mention we're doing a hackathon in SF on September 21/22, and that we have an upcoming free RAG course w/ Cohere & Weaviate?
TL;DR
* Open Source LLMs
* Nous DisTrO - Distributed Training (X, Report)
* NousResearch/hermes-function-calling-v1 open sourced (X, HF)
* LinkedIn Liger-Kernel - one line to make training 20% faster & 60% more memory efficient (Github)
* Cartesia - Rene 1.3B LLM SSM + Edge Apache 2 acceleration (X, Blog)
* Big CO LLMs + APIs
* Cerebras launches the fastest AI inference - 447t/s Llama 3.1 70B (X, Blog, Try It)
* Google - Gemini 1.5 Flash 8B & new Gemini 1.5 Pro/Flash (X, Try it)
* Google adds Gems & Imagen to Gemini paid tier
* Anthropic artifacts available to all users + on mobile (Blog, Try it)
* Anthropic publishes their system prompts with model releases (release notes)
* OpenAI has project Strawberry coming this fall (via The information)
* This week's Buzz
* WandB Hackathon hackathon hackathon (Register, Join)
* Also, we have a new RAG course w/ Cohere and Weaviate (RAG Course)
* Vision & Video
* Zhipu AI CogVideoX - 5B Video Model w/ less than 10GB of VRAM (X, HF, Try it)
* Qwen-2 VL 72B,7B,2B - new SOTA vision models from QWEN (X, Blog, HF)
* AI Art & Diffusion & 3D
* GameNgen - completely generated (not rendered) DOOM with SD1.4 (project)
* FAL new LORA trainer for FLUX - trains under 5 minutes (Trainer, Coupon for ThursdAI)
* Tools & Others
* SimpleBench from AI Explained - closely matches human experience (simple-bench.com)
Open Source
Let's be honest - ThursdAI is a love letter to the open-source AI community, and this week was packed with reasons to celebrate.
Nous Research DisTrO + Function Calling V1
Nous Research was on fire this week (aren't they always?) and they kicked off the week with the release of DisTrO, a breakthrough in distributed training. You see, LLM training doesn't just require a lot of hardware, it also requires a lot of network bandwidth between the different GPUs, even within the same data center.
Proprietary networking solutions like Nvidia NVLink, and more open standards like Ethernet, work well within the same datacenter, but training across different GPU clouds has been unimaginable until now.
Enter DisTrO, a new decentralized training approach from the mad geniuses at Nous Research, which reduces the bandwidth required to train a 1.2B param model from 74.4GB to just 86MB (an 857x reduction)!
This can have massive implications for training across compute clusters, doing shared training runs, optimizing costs and efficiency and democratizing LLM training access! So don't sell your old GPUs just yet, someone may just come up with a folding@home but for training the largest open source LLM, and it may just be Nous!
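To put those numbers in perspective, here's a quick back-of-the-envelope check. The bandwidth figures are the ones Nous reported; the 1 Gbps link speed is my own assumption for illustration:

```python
# Sanity-checking the reported DisTrO numbers: per-synchronization traffic drops
# from 74.4 GB (naive full-gradient sync) to roughly 86 MB.
baseline_bytes = 74.4e9   # reported bandwidth for naive synchronization
distro_bytes = 86.8e6     # reported DisTrO bandwidth (~86 MB)

print(f"reduction: ~{baseline_bytes / distro_bytes:.0f}x")  # -> ~857x

# Assume a 1 Gbps internet link between clouds (~125 MB/s) - an assumption for
# illustration, not a number from the report.
link_mb_per_s = 125
print(f"naive sync:  ~{baseline_bytes / 1e6 / link_mb_per_s / 60:.1f} minutes")
print(f"DisTrO sync: ~{distro_bytes / 1e6 / link_mb_per_s:.2f} seconds")
```

Moving ~74GB over the open internet on every synchronization is a non-starter; moving ~86MB is not, which is exactly why this opens the door to training across GPU clouds.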
Nous Research also released their function-calling-v1 dataset (HF) that was used to train Hermes-2, and we had InterstellarNinja, who authored that dataset, join the show and chat about it. This is an incredible unlock for the open source community, as function calling becomes a de-facto standard. Shout out to the Glaive team as well for their pioneering work that paved the way!
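If you haven't played with function calling yet, the core idea is that the model sees a set of tool schemas and learns to emit a structured call instead of free text. Here's a minimal, generic illustration; the schema style below is my own example and not necessarily the exact format used in hermes-function-calling-v1:

```python
# A generic tool schema plus the structured call a function-calling model is
# trained to emit. Illustrative format only - check the dataset card on HF for
# the actual layout used by hermes-function-calling-v1.
import json

get_weather_tool = {
    "name": "get_weather",  # hypothetical tool
    "description": "Get the current weather for a city",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

# Given "What's the weather in Tel Aviv?", the model answers with a
# machine-parseable call instead of prose:
model_output = {"name": "get_weather", "arguments": {"city": "Tel Aviv"}}
print(json.dumps(model_output))
```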
LinkedIn's Liger Kernel: Unleashing the Need for Speed (with One Line of Code)
What if I told you that you could add 1 line of code to your LLM training setup, and it would run 20% faster and require 60% less memory?
This is basically what LinkedIn researchers released this week with Liger Kernel. Yes, you read that right, LinkedIn, as in the website you post your career-related updates on!
"If you're doing any form of finetuning, using this is an instant win"Wing Lian - Axolotl
This absolutely bonkers improvement in training LLMs works smoothly with Flash Attention, PyTorch FSDP and DeepSpeed. If you want to read more about the implementation of the Triton kernels, you can see a deep dive here. I just wanted to bring this to your attention, even if you're not technical, because efficiency jumps like these are happening all the time. We are used to seeing them in capabilities / intelligence, but they are also happening on the algorithmic/training/hardware side, and it's incredible to see!
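For the curious, the "one line" looks roughly like this. The function name below follows the Liger-Kernel repo's examples at the time of writing, so treat this as a sketch and double-check their README for the current API:

```python
# Hedged sketch of patching a HF Llama model with Liger's fused Triton kernels
# (RMSNorm, RoPE, SwiGLU, fused cross-entropy) before loading the weights.
import torch
from transformers import AutoModelForCausalLM
from liger_kernel.transformers import apply_liger_kernel_to_llama

apply_liger_kernel_to_llama()  # <- the one line

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3.1-8B",
    torch_dtype=torch.bfloat16,
)
# ...then train exactly as before: same Trainer / FSDP / DeepSpeed setup,
# just faster and with a noticeably smaller memory footprint.
```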
Huge shoutout to Byron and the team at LinkedIn for this unlock, check out their GitHub if you want to get involved!
Qwen-2 VL - SOTA image and video understanding + open weights mini VLM
You may already know that we love the folks at Qwen here on ThursdAI, not only because Junyang Lin is a frequent co-host and we get to hear about their releases as soon as they come out (they seem to release them on Thursdays around the time of the live show, I wonder why!),
but also because they are committed to open source, and this time they released the 7B and 2B models under a full Apache 2 license!
First of all, their Qwen-2 VL 72B model is now SOTA on many benchmarks, beating GPT-4o, Claude 3.5 and other much bigger models. This is insane. I literally had to pause Junyang and repeat what he said: this is a 72B param model that beats GPT-4o on document understanding, on math, and on general visual Q&A.
Additional Capabilities & Smaller models
They have added new capabilities to these models, like being able to handle arbitrary resolutions, but the one I'm most excited about is the video understanding. These models can now understand up to 20 minutes of video sequences, and it's not just "split the video into 10 frames and do image captioning", no, these models understand video progression, and if I understand correctly how they do it, it's quite genius.
They embed the video's time progression into the model using a new technique called M-RoPE, which turns the time progression into rotary positional embeddings.
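Here's a toy sketch of the M-RoPE idea as I understand it: instead of one position index per token, a visual token gets separate temporal / height / width indices, and each index rotates its own slice of the rotary embedding. The even 3-way split and the frequencies below are my own illustrative assumptions, not Qwen's exact values:

```python
import torch

def rope_angles(positions: torch.Tensor, dim: int, base: float = 10000.0):
    """Standard RoPE angles for a single positional axis (illustrative)."""
    inv_freq = 1.0 / base ** (torch.arange(0, dim, 2).float() / dim)
    return positions.float()[:, None] * inv_freq[None, :]  # (tokens, dim/2)

def mrope_angles(t_pos, h_pos, w_pos, head_dim: int = 96):
    """Toy M-RoPE: split the head dimension into three chunks and rotate each
    chunk by a different positional axis (time, height, width)."""
    chunk = head_dim // 3  # even split is an assumption for illustration
    return torch.cat([
        rope_angles(t_pos, chunk),  # temporal index: which video frame
        rope_angles(h_pos, chunk),  # spatial row of the image patch
        rope_angles(w_pos, chunk),  # spatial column of the image patch
    ], dim=-1)

# 2 frames of a 2x2 patch grid -> 8 visual tokens with (t, h, w) coordinates
t = torch.tensor([0, 0, 0, 0, 1, 1, 1, 1])
h = torch.tensor([0, 0, 1, 1, 0, 0, 1, 1])
w = torch.tensor([0, 1, 0, 1, 0, 1, 0, 1])
print(mrope_angles(t, h, w).shape)  # torch.Size([8, 48]) - one angle per rotated pair
```

The upshot is that two patches at the same spot in different frames share their spatial rotation but differ in the temporal one, so attention can "feel" how far apart in time they are, which is what lets the model reason about progression instead of treating the video as a bag of frames.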
Now, the 72B model is currently available via API, but we do get 2 new small models with an Apache 2 license, and they are NOT too shabby either!
The 7B (HF) and 2B (HF) Qwen-2 VL models are small enough to run locally.
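If you want to poke at the small ones yourself, they run via transformers roughly like this. The class and repo names follow the HF model cards, but treat this as a sketch and check the card for the exact preprocessing (a recent transformers version is required):

```python
# Hedged sketch of running the 2B Qwen2-VL checkpoint locally with transformers.
import torch
from PIL import Image
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

model_id = "Qwen/Qwen2-VL-2B-Instruct"  # the 7B variant: Qwen/Qwen2-VL-7B-Instruct
model = Qwen2VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

image = Image.open("photo.jpg")  # any local image
messages = [{"role": "user", "content": [
    {"type": "image"},
    {"type": "text", "text": "Describe this image in one sentence."},
]}]
prompt = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[prompt], images=[image], return_tensors="pt").to(model.device)

out = model.generate(**inputs, max_new_tokens=64)
answer = processor.batch_decode(
    out[:, inputs.input_ids.shape[1]:], skip_special_tokens=True
)[0]
print(answer)
```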