
Subbarao Kambhampati - Do o1 models search?

2025/1/23

Machine Learning Street Talk (MLST)

People
Subbarao Kambhampati
Tim Scarfe
Topics
Subbarao Kambhampati: I think the O1 model likely uses an AlphaGo-style reinforcement learning approach involving a large LLM and a small LLM. The small LLM generates prompt augmentations, and the large LLM produces answers conditioned on those augmented prompts. The process resembles AlphaGo's Monte Carlo Tree Search: by learning Q-values over pseudo-actions, the system improves its reasoning ability. O1's training comprises ordinary LLM training plus a very expensive post-training phase in which the model learns to generate the best prompt augmentations. At inference time, O1 generates large numbers of reasoning tokens that users pay for, which makes it costly. Although O1 performs well on planning benchmarks, it still has limitations: it cannot solve large problem instances and it can make mistakes. Overall, O1 is an LLM-based approximate reasoner that combines reinforcement learning with Monte Carlo Tree Search-style methods. It improves LLM reasoning to a degree, but its high cost and residual errors remain concerns.

Tim Scarfe: I'm interested in O1's reasoning mechanism and how it differs from traditional LLMs, in particular whether O1 genuinely reasons or merely retrieves. I'm also interested in O1's cost-effectiveness and its limitations in practical applications.
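
The mechanism Kambhampati describes can be sketched in miniature. The sketch below reduces tree search to a one-step bandit: a Q-value is learned for each candidate prompt augmentation (a "pseudo-action"), with `answer_quality` standing in for scoring the large model's answer (e.g. via a verifier). The augmentation strings, reward values, and function names are all illustrative assumptions, not O1's actual implementation.

```python
import random
from collections import defaultdict

random.seed(0)  # make the illustrative run reproducible

# Hypothetical pseudo-actions: augmentations a small model might propose.
AUGMENTATIONS = [
    "Let's think step by step.",
    "First restate the problem, then solve it.",
    "List the constraints before answering.",
]

def answer_quality(prompt, augmentation):
    """Stand-in for scoring the large model's answer to the augmented
    prompt. Here we fake it with fixed preferences plus noise."""
    base = {
        "Let's think step by step.": 0.7,
        "First restate the problem, then solve it.": 0.5,
        "List the constraints before answering.": 0.4,
    }[augmentation]
    return base + random.uniform(-0.1, 0.1)

def learn_q_values(prompt, episodes=500, epsilon=0.2):
    """Epsilon-greedy Q-learning over which augmentation to apply:
    a depth-one simplification of MCTS-style search."""
    q = defaultdict(float)
    counts = defaultdict(int)
    for _ in range(episodes):
        if random.random() < epsilon:
            a = random.choice(AUGMENTATIONS)            # explore
        else:
            a = max(AUGMENTATIONS, key=lambda x: q[x])  # exploit
        reward = answer_quality(prompt, a)
        counts[a] += 1
        q[a] += (reward - q[a]) / counts[a]             # running-mean update
    return q

q = learn_q_values("What is 17 * 24?")
best = max(q, key=q.get)
print(best)
```

In this toy setting the learned Q-values converge to the (fake) mean answer quality of each augmentation, so the highest-reward augmentation wins; the real system would learn such preferences at O1's expensive post-training stage and spend them as hidden reasoning tokens at inference time.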

Chapters
This chapter examines the concept of "fractal intelligence" in LLMs, whereby their performance is unpredictable. It discusses the limitations of current reasoning models and surveys approaches to enhancing their reasoning capabilities, including inference-time scaling and prompt-augmentation techniques such as chain of thought.
  • LLMs exhibit "fractal intelligence," meaning their performance is unpredictable.
  • Inference time scaling and prompt augmentation are explored as methods to improve reasoning.
  • Chain of thought prompting, with its variations, shows promise but faces limitations.
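
The two techniques named above can be illustrated with a short sketch: `zero_shot_cot` appends the Kojima et al. (2022) trigger phrase so the model emits intermediate reasoning, and `best_of_n` shows one simple form of inference-time scaling, sampling several candidates and keeping the best. Here `sample` and `score` are hypothetical stand-ins for an LLM call and a verifier, not any particular API.

```python
def zero_shot_cot(question: str) -> str:
    """Zero-shot chain-of-thought: append the trigger phrase from
    Kojima et al. (2022) so the model reasons before answering."""
    return f"Q: {question}\nA: Let's think step by step."

def best_of_n(question, sample, score, n=8):
    """Inference-time scaling sketch: draw n candidate completions and
    keep the one the scorer prefers. `sample` (an LLM call) and `score`
    (a verifier) are assumed to be supplied by the caller."""
    candidates = [sample(zero_shot_cot(question)) for _ in range(n)]
    return max(candidates, key=score)

# Illustrative run with stand-ins: "sampling" from a fixed list,
# scoring candidates by their numeric value.
fake_samples = iter([3, 9, 5, 1])
print(best_of_n("demo", lambda p: next(fake_samples), lambda c: c, n=4))  # prints 9
```

Spending more compute at inference time this way is the trade-off the episode returns to: better answers, but at a token cost the user pays for.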

Shownotes

Join Prof. Subbarao Kambhampati and host Tim Scarfe for a deep dive into OpenAI's O1 model and the future of AI reasoning systems.

  • How O1 likely uses reinforcement learning similar to AlphaGo, with hidden reasoning tokens that users pay for but never see

  • The evolution from traditional Large Language Models to more sophisticated reasoning systems

  • The concept of "fractal intelligence" in AI - where models work brilliantly sometimes but fail unpredictably

  • Why O1's improved performance comes with substantial computational costs

  • The ongoing debate between single-model approaches (OpenAI) vs hybrid systems (Google)

  • The critical distinction between AI as an intelligence amplifier vs autonomous decision-maker

SPONSOR MESSAGES:


CentML offers competitive pricing for GenAI model deployment, with flexible options to suit a wide range of models, from small to large-scale deployments.

https://centml.ai/pricing/

Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier, focused on o-series-style reasoning and AGI. Are you interested in working on reasoning, or getting involved in their events?

Go to https://tufalabs.ai/


TOC:

  1. O1 Architecture and Reasoning Foundations

[00:00:00] 1.1 Fractal Intelligence and Reasoning Model Limitations

[00:04:28] 1.2 LLM Evolution: From Simple Prompting to Advanced Reasoning

[00:14:28] 1.3 O1's Architecture and AlphaGo-like Reasoning Approach

[00:23:18] 1.4 Empirical Evaluation of O1's Planning Capabilities

  2. Monte Carlo Methods and Model Deep-Dive

[00:29:30] 2.1 Monte Carlo Methods and MARCO-O1 Implementation

[00:31:30] 2.2 Reasoning vs. Retrieval in LLM Systems

[00:40:40] 2.3 Fractal Intelligence Capabilities and Limitations

[00:45:59] 2.4 Mechanistic Interpretability of Model Behavior

[00:51:41] 2.5 O1 Response Patterns and Performance Analysis

  3. System Design and Real-World Applications

[00:59:30] 3.1 Evolution from LLMs to Language Reasoning Models

[01:06:48] 3.2 Cost-Efficiency Analysis: LLMs vs O1

[01:11:28] 3.3 Autonomous vs Human-in-the-Loop Systems

[01:16:01] 3.4 Program Generation and Fine-Tuning Approaches

[01:26:08] 3.5 Hybrid Architecture Implementation Strategies

Transcript: https://www.dropbox.com/scl/fi/d0ef4ovnfxi0lknirkvft/Subbarao.pdf?rlkey=l3rp29gs4hkut7he8u04mm1df&dl=0

REFS:

[00:02:00] Monty Python (1975)

Witch trial scene: flawed logical reasoning.

https://www.youtube.com/watch?v=zrzMhU_4m-g

[00:04:00] Cade Metz (2024)

Microsoft–OpenAI partnership evolution and control dynamics.

https://www.nytimes.com/2024/10/17/technology/microsoft-openai-partnership-deal.html

[00:07:25] Kojima et al. (2022)

Zero-shot chain-of-thought prompting ('Let's think step by step').

https://arxiv.org/pdf/2205.11916

[00:12:50] DeepMind Research Team (2023)

Multi-bot game solving with external and internal planning.

https://deepmind.google/research/publications/139455/

[00:15:10] Silver et al. (2016)

AlphaGo's Monte Carlo Tree Search and Q-learning.

https://www.nature.com/articles/nature16961

[00:16:30] Kambhampati, S. et al. (2024)

Evaluates O1's planning in "Strawberry Fields" benchmarks.

https://arxiv.org/pdf/2410.02162

[00:29:30] Alibaba AIDC-AI Team (2024)

MARCO-O1: Chain-of-Thought + MCTS for improved reasoning.

https://arxiv.org/html/2411.14405

[00:31:30] Kambhampati, S. (2024)

Explores LLM "reasoning vs retrieval" debate.

https://arxiv.org/html/2403.04121v2

[00:37:35] Wei, J. et al. (2022)

Chain-of-thought prompting (introduces last-letter concatenation).

https://arxiv.org/pdf/2201.11903

[00:42:35] Barbero, F. et al. (2024)

Transformer attention and "information over-squashing."

https://arxiv.org/html/2406.04267v2

[00:46:05] Ruis, L. et al. (2024)

Influence functions to understand procedural knowledge in LLMs.

https://arxiv.org/html/2411.12580v1

(truncated - continued in shownotes/transcript doc)