DiscoverThursdAI - The top AI news from the past week๐Ÿ“… ThursdAI - Sep 5 - ๐Ÿ‘‘ Reflection 70B beats Claude 3.5, Anthropic Enterprise 500K context, 100% OSS MoE from AllenAI, 1000 agents world sim, Replit agent is the new Cursor? and more AI news
๐Ÿ“… ThursdAI - Sep 5 - ๐Ÿ‘‘ Reflection 70B beats Claude 3.5, Anthropic Enterprise 500K context, 100% OSS MoE from AllenAI, 1000 agents world sim, Replit agent is the new Cursor? and more AI news

๐Ÿ“… ThursdAI - Sep 5 - ๐Ÿ‘‘ Reflection 70B beats Claude 3.5, Anthropic Enterprise 500K context, 100% OSS MoE from AllenAI, 1000 agents world sim, Replit agent is the new Cursor? and more AI news

Update: 2024-09-06
Share

Description

Welcome back everyone, can you believe it's another ThursdAI already? And can you believe me when I tell you that friends of the pod Matt Shumer & Sahil form Glaive.ai just dropped a LLama 3.1 70B finetune that you can download that will outperform Claude Sonnet 3.5 while running locally on your machine?

Today was a VERY heavy Open Source focused show, we had a great chat w/ Niklas, the leading author of OLMoE, a new and 100% open source MoE from Allen AI, a chat with Eugene (pico_creator) about RWKV being deployed to over 1.5 billion devices with Windows updates and a lot more.

In the realm of the big companies, Elon shook the world of AI by turning on the biggest training cluster called Colossus (100K H100 GPUs) which was scaled in 122 days ๐Ÿ˜ฎ and Anthropic announced that they have 500K context window Claude that's only reserved if you're an enterprise customer, while OpenAI is floating an idea of a $2000/mo subscription for Orion, their next version of a 100x better chatGPT?!

TL;DR

* Open Source LLMs

* Matt Shumer / Glaive - Reflection-LLama 70B beats Claude 3.5 (X, HF)

* Allen AI - OLMoE - first "good" MoE 100% OpenSource (X, Blog, Paper, WandB)

* RWKV.cpp is deployed with Windows to 1.5 Billion devices

* MMMU pro - more robust multi disipline multimodal understanding bench (proj)

* 01AI - Yi-Coder 1.5B and 9B (X, Blog, HF)

* Big CO LLMs + APIs

* Replit launches Agent in beta - from coding to production (X, Try It)

* Ilya SSI announces 1B round from everyone (Post)

* Cohere updates Command-R and Command R+ on API (Blog)

* Claude Enterprise with 500K context window (Blog)

* Claude invisibly adds instructions (even via the API?) (X)

* Google got structured output finally (Docs)

* Amazon to include Claude in Alexa starting this October (Blog)

* X ai scaled Colossus to 100K H100 GPU goes online (X)

* DeepMind - AlphaProteo new paper (Blog, Paper, Video)

* This weeks Buzz

* Hackathon did we mention? We're going to have Eugene and Greg as Judges!

* AI Art & Diffusion & 3D

* ByteDance - LoopyAvatar - Audio Driven portait avatars (Page)

Open Source LLMs

Reflection Llama-3.1 70B - new ๐Ÿ‘‘ open source LLM from Matt Shumer / GlaiveAI

This model is BANANAs folks, this is a LLama 70b finetune, that was trained with a new way that Matt came up with, that bakes CoT and Reflection into the model via Finetune, which results in model outputting its thinking as though you'd prompt it in a certain way.

This causes the model to say something, and then check itself, and then reflect on the check and then finally give you a much better answer. Now you may be thinking, we could do this before, RefleXion (arxiv.org/2303.11366) came out a year ago, so what's new?

What's new is, this is now happening inside the models head, you don't have to reprompt, you don't even have to know about these techniques! So what you see above, is just colored differently, but all of it, is output by the model without extra prompting by the user or extra tricks in system prompt. the model thinks, plans, does chain of thought, then reviews and reflects, and then gives an answer!

And the results are quite incredible for a 70B model ๐Ÿ‘‡

Looking at these evals, this is a 70B model that beats GPT-4o, Claude 3.5 on Instruction Following (IFEval), MATH, GSM8K with 99.2% ๐Ÿ˜ฎ and gets very close to Claude on GPQA and HumanEval!

(Note that these comparisons are a bit of a apples to ... different types of apples. If you apply CoT and reflection to the Claude 3.5 model, they may in fact perform better on the above, as this won't be counted 0-shot anymore. But given that this new model is effectively spitting out those reflection tokens, I'm ok with this comparison)

This is just the 70B, next week the folks are planning to drop the 405B finetune with the technical report, so stay tuned for that!

Kudos on this work, go give Matt Shumer and Glaive AI a follow!

Allen AI OLMoE - tiny "good" MoE that's 100% open source, weights, code, logs

We've previously covered OLMO from Allen Institute, and back then it was obvious how much commitment they have to open source, and this week they continued on this path with the release of OLMoE, an Mixture of Experts 7B parameter model (1B active parameters), trained from scratch on 5T tokens, which was completely open sourced.

This model punches above its weights on the best performance/cost ratio chart for MoEs and definitely highest on the charts of releasing everything.

By everything here, we mean... everything, not only the final weights file; they released 255 checkpoints (every 5000 steps), the training code (Github) and even (and maybe the best part) the Weights & Biases logs!

It was a pleasure to host the leading author of the OLMoE paper, Niklas Muennighoff on the show today, so definitely give this segment a listen, he's a great guest and I learned a lot!

Big Companies LLMs + API

Anthropic has 500K context window Claude but only for Enterprise?

Well, this sucks (unless you work for Midjourney, Airtable or Deloitte). Apparently Anthropic has been sitting on Claude that can extend to half a million tokens in the context window, and decided to keep it to themselves and a few trial enterprises, and package it as an Enterprise offering.

This offering now includes, beyond just the context window, also a native Github integration, and a few key enterprise features like access logs, provisioning and SCIM and all kinds of "procurement and CISO required" stuff enterprises look for.

To be clear, this is a great move for Anthropic, and this isn't an API tier, this is for their front end offering, including the indredible artifacts tool, so that companies can buy their employees access to Claude.ai and have them be way more productive coding (hence the Github integration) or summarizing (very very) long documents, building mockups and one off apps etc'

Anthropic is also in the news this week, because Amazon announced that it'll use Claude as the backbone for the smart (or "remarkable" as they call it) Alexa brains coming up in October, which, again, incredible for Anthropic distribution, as there are maybe 100M Alexa users in the world or so.

Prompt injecting must stop!

And lastly, there have been mounting evidence, including our own Wolfram Ravenwolf that confirmed it, that Anthropic is prompt injecting additional context into your own prompts, in the UI but also via the API! This is awful practice and if anyone from there reads this newsletter, please stop or at least acknowledge. Claude apparently just... thinks that it's somethi

Commentsย 
In Channel
loading
00:00
00:00
x

0.5x

0.8x

1.0x

1.25x

1.5x

2.0x

3.0x

Sleep Timer

Off

End of Episode

5 Minutes

10 Minutes

15 Minutes

30 Minutes

45 Minutes

60 Minutes

120 Minutes

๐Ÿ“… ThursdAI - Sep 5 - ๐Ÿ‘‘ Reflection 70B beats Claude 3.5, Anthropic Enterprise 500K context, 100% OSS MoE from AllenAI, 1000 agents world sim, Replit agent is the new Cursor? and more AI news

๐Ÿ“… ThursdAI - Sep 5 - ๐Ÿ‘‘ Reflection 70B beats Claude 3.5, Anthropic Enterprise 500K context, 100% OSS MoE from AllenAI, 1000 agents world sim, Replit agent is the new Cursor? and more AI news

Alex Volkov