Subbarao Kambhampati - Do o1 models search?
Description
Join Prof. Subbarao Kambhampati and host Tim Scarfe for a deep dive into OpenAI's O1 model and the future of AI reasoning systems.
* How O1 likely uses reinforcement learning similar to AlphaGo, with hidden reasoning tokens that users pay for but never see
* The evolution from traditional Large Language Models to more sophisticated reasoning systems
* The concept of "fractal intelligence" in AI, where models sometimes work brilliantly but fail unpredictably
* Why O1's improved performance comes with substantial computational costs
* The ongoing debate between single-model approaches (OpenAI) and hybrid systems (Google)
* The critical distinction between AI as an intelligence amplifier and AI as an autonomous decision-maker
SPONSOR MESSAGES:
***
CentML offers competitive pricing for GenAI model deployment, with flexible options to suit a wide range of models, from small to large-scale deployments.
https://centml.ai/pricing/
Tufa AI Labs is a brand-new research lab in Zurich, started by Benjamin Crouzier and focussed on o-series-style reasoning and AGI. Are you interested in working on reasoning, or in getting involved in their events?
Go to https://tufalabs.ai/
***
TOC:
1. **O1 Architecture and Reasoning Foundations**
[00:00:00] 1.1 Fractal Intelligence and Reasoning Model Limitations
[00:04:28] 1.2 LLM Evolution: From Simple Prompting to Advanced Reasoning
[00:14:28] 1.3 O1's Architecture and AlphaGo-like Reasoning Approach
[00:23:18] 1.4 Empirical Evaluation of O1's Planning Capabilities
2. **Monte Carlo Methods and Model Deep-Dive**
[00:29:30] 2.1 Monte Carlo Methods and MARCO-O1 Implementation
[00:31:30] 2.2 Reasoning vs. Retrieval in LLM Systems
[00:40:40] 2.3 Fractal Intelligence Capabilities and Limitations
[00:45:59] 2.4 Mechanistic Interpretability of Model Behavior
[00:51:41] 2.5 O1 Response Patterns and Performance Analysis
3. **System Design and Real-World Applications**
[00:59:30] 3.1 Evolution from LLMs to Language Reasoning Models
[01:06:48] 3.2 Cost-Efficiency Analysis: LLMs vs. O1
[01:11:28] 3.3 Autonomous vs. Human-in-the-Loop Systems
[01:16:01] 3.4 Program Generation and Fine-Tuning Approaches
[01:26:08] 3.5 Hybrid Architecture Implementation Strategies
Transcript: https://www.dropbox.com/scl/fi/d0ef4ovnfxi0lknirkvft/Subbarao.pdf?rlkey=l3rp29gs4hkut7he8u04mm1df&dl=0
REFS:
[00:02:00] Monty Python and the Holy Grail (1975)
Witch trial scene: a parody of flawed logical reasoning.
https://www.youtube.com/watch?v=zrzMhU_4m-g
[00:04:00] Cade Metz (2024)
Microsoft–OpenAI partnership evolution and control dynamics.
https://www.nytimes.com/2024/10/17/technology/microsoft-openai-partnership-deal.html
[00:07:25] Kojima et al. (2022)
Zero-shot chain-of-thought prompting ('Let's think step by step').
https://arxiv.org/pdf/2205.11916
[00:12:50] DeepMind Research Team (2023)
Playing multiple board games with external and internal planning.
https://deepmind.google/research/publications/139455/
[00:15:10] Silver et al. (2016)
AlphaGo: Monte Carlo Tree Search combined with deep policy and value networks.
https://www.nature.com/articles/nature16961
[00:16:30] Valmeekam, K., ... Kambhampati, S. (2024)
"Planning in Strawberry Fields": evaluates O1's planning and scheduling capabilities.
https://arxiv.org/pdf/2410.02162
[00:29:30] Alibaba AIDC-AI Team (2024)
MARCO-O1: Chain-of-Thought + MCTS for improved reasoning.
https://arxiv.org/html/2411.14405
[00:31:30] Kambhampati, S. (2024)
Explores LLM "reasoning vs retrieval" debate.
https://arxiv.org/html/2403.04121v2
[00:37:35] Wei, J. et al. (2022)
Chain-of-thought prompting (introduces last-letter concatenation).
https://arxiv.org/pdf/2201.11903
[00:42:35] Barbero, F. et al. (2024)
Transformer attention and "information over-squashing."
https://arxiv.org/html/2406.04267v2
[00:46:05] Ruis, L. et al. (2024)
Influence functions used to study procedural knowledge in LLMs.
https://arxiv.org/html/2411.12580v1
(truncated - continued in shownotes/transcript doc)