Machine Learning Street Talk (MLST)

Subbarao Kambhampati - Do o1 models search?

Update: 2025-01-23

Description

Join Prof. Subbarao Kambhampati and host Tim Scarfe for a deep dive into OpenAI's O1 model and the future of AI reasoning systems.




* How O1 likely uses reinforcement learning similar to AlphaGo, with hidden reasoning tokens that users pay for but never see


* The evolution from traditional Large Language Models to more sophisticated reasoning systems


* The concept of "fractal intelligence" in AI - where models work brilliantly sometimes but fail unpredictably


* Why O1's improved performance comes with substantial computational costs


* The ongoing debate between single-model approaches (OpenAI) vs hybrid systems (Google)


* The critical distinction between AI as an intelligence amplifier vs autonomous decision-maker




SPONSOR MESSAGES:


***


CentML offers competitive pricing for GenAI model deployment, with flexible options for everything from small models to large-scale deployments.


https://centml.ai/pricing/




Tufa AI Labs is a brand-new research lab in Zurich started by Benjamin Crouzier, focused on o-series-style reasoning and AGI. Are you interested in working on reasoning, or in getting involved in their events?




Go to https://tufalabs.ai/


***




TOC:


1. **O1 Architecture and Reasoning Foundations**


[00:00:00] 1.1 Fractal Intelligence and Reasoning Model Limitations

[00:04:28] 1.2 LLM Evolution: From Simple Prompting to Advanced Reasoning

[00:14:28] 1.3 O1's Architecture and AlphaGo-like Reasoning Approach

[00:23:18] 1.4 Empirical Evaluation of O1's Planning Capabilities




2. **Monte Carlo Methods and Model Deep-Dive**


[00:29:30] 2.1 Monte Carlo Methods and Marco-o1 Implementation

[00:31:30] 2.2 Reasoning vs. Retrieval in LLM Systems

[00:40:40] 2.3 Fractal Intelligence Capabilities and Limitations

[00:45:59] 2.4 Mechanistic Interpretability of Model Behavior

[00:51:41] 2.5 O1 Response Patterns and Performance Analysis




3. **System Design and Real-World Applications**


[00:59:30] 3.1 Evolution from LLMs to Language Reasoning Models

[01:06:48] 3.2 Cost-Efficiency Analysis: LLMs vs O1

[01:11:28] 3.3 Autonomous vs Human-in-the-Loop Systems

[01:16:01] 3.4 Program Generation and Fine-Tuning Approaches

[01:26:08] 3.5 Hybrid Architecture Implementation Strategies




Transcript: https://www.dropbox.com/scl/fi/d0ef4ovnfxi0lknirkvft/Subbarao.pdf?rlkey=l3rp29gs4hkut7he8u04mm1df&dl=0




REFS:


[00:02:00] Monty Python (1975)


Witch trial scene: flawed logical reasoning.


https://www.youtube.com/watch?v=zrzMhU_4m-g




[00:04:00] Cade Metz (2024)


Microsoft–OpenAI partnership evolution and control dynamics.


https://www.nytimes.com/2024/10/17/technology/microsoft-openai-partnership-deal.html




[00:07:25] Kojima et al. (2022)


Zero-shot chain-of-thought prompting ('Let's think step by step').


https://arxiv.org/pdf/2205.11916




[00:12:50] DeepMind Research Team (2023)


Multi-bot game solving with external and internal planning.


https://deepmind.google/research/publications/139455/




[00:15:10] Silver et al. (2016)


AlphaGo's Monte Carlo Tree Search and Q-learning.


https://www.nature.com/articles/nature16961




[00:16:30] Kambhampati, S. et al. (2024)


Evaluates O1's planning in "Strawberry Fields" benchmarks.


https://arxiv.org/pdf/2410.02162




[00:29:30] Alibaba AIDC-AI Team (2024)


Marco-o1: Chain-of-Thought + MCTS for improved reasoning.


https://arxiv.org/html/2411.14405




[00:31:30] Kambhampati, S. (2024)


Explores LLM "reasoning vs retrieval" debate.


https://arxiv.org/html/2403.04121v2




[00:37:35] Wei, J. et al. (2022)


Chain-of-thought prompting (introduces last-letter concatenation).


https://arxiv.org/pdf/2201.11903




[00:42:35] Barbero, F. et al. (2024)


Transformer attention and "information over-squashing."


https://arxiv.org/html/2406.04267v2




[00:46:05] Ruis, L. et al. (2024)


Influence functions to understand procedural knowledge in LLMs.


https://arxiv.org/html/2411.12580v1




(truncated - continued in shownotes/transcript doc)
