Chasing Real AGI: Inside ARC Prize 2025 with Chollet & Knoop
Description
In this fascinating episode, we dive deep into the race toward true machine intelligence, AGI benchmarks, test-time adaptation, and program synthesis with star AI researcher (and philosopher) François Chollet, creator of Keras and the ARC-AGI benchmark, and Mike Knoop, co-founder of Zapier and now co-founder, with François, of both the ARC Prize and the research lab Ndea. With the launch of ARC Prize 2025 and ARC-AGI 2, they explain why existing LLMs fall short on true intelligence tests, how new models like o3 mark a step change in capabilities, and what it will really take to reach AGI.
We cover everything from the technical evolution of ARC 1 to ARC 2, the shift toward test-time reasoning, and the role of program synthesis as a foundation for more general intelligence. The conversation also explores the philosophical underpinnings of intelligence, the structure of the ARC Prize, and the motivation behind launching Ndea, a new AGI research lab that aims to build a "factory for rapid scientific advancement." Whether you're deep in the AI research trenches or just fascinated by where this is all headed, this episode offers clarity and inspiration.
Ndea
Website - https://ndea.com
X/Twitter - https://x.com/ndea
ARC Prize
Website - https://arcprize.org
X/Twitter - https://x.com/arcprize
François Chollet
LinkedIn - https://www.linkedin.com/in/fchollet
X/Twitter - https://x.com/fchollet
Mike Knoop
X/Twitter - https://x.com/mikeknoop
FirstMark
Website - https://firstmark.com
X/Twitter - https://twitter.com/FirstMarkCap
Matt Turck (Managing Director)
LinkedIn - https://www.linkedin.com/in/turck/
X/Twitter - https://twitter.com/mattturck
(00:00) Intro
(01:05) Introduction to ARC Prize 2025 and ARC-AGI 2
(02:07) What is ARC and how it differs from other AI benchmarks
(02:54) Why current models struggle with fluid intelligence
(03:52) Shift from static LLMs to test-time adaptation
(04:19) What ARC measures vs. traditional benchmarks
(07:52) Limitations of brute-force scaling in LLMs
(13:31) Defining intelligence: adaptation and efficiency
(16:19) How o3 achieved a massive leap in ARC performance
(20:35) Speculation on o3's architecture and test-time search
(22:48) Program synthesis: what it is and why it matters
(28:28) Combining LLMs with search and synthesis techniques
(34:57) The ARC Prize structure: efficiency track, private vs. public
(42:03) Open source as a requirement for progress
(44:59) What's new in ARC-AGI 2 and human benchmark testing
(48:14) Capabilities ARC-AGI 2 is designed to test
(49:21) When will ARC-AGI 2 be saturated? AGI timelines
(52:25) Founding of Ndea and why now
(54:19) Vision beyond AGI: a factory for scientific advancement
(56:40) What Ndea is building and why it's different from LLM labs
(58:32) Hiring and remote-first culture at Ndea
(59:52) Closing thoughts and the future of AI research




