
How Do AI Models Actually Think? - Laura Ruis

2025/1/20

Machine Learning Street Talk (MLST)

People
Laura Ruis
Topics
I studied how large language models perform on reasoning tasks and found that their performance gains do not come only from scale (the ability to memorize more similar content); more importantly, the models learn something qualitatively more interesting that correlates with the amount of data and parameters.

My research used influence functions to analyze how pretraining data affects model reasoning. The results show that influence scores for fact-retrieval tasks are concentrated in a few documents, whereas influence scores for reasoning tasks are spread across many, indicating that reasoning draws on a much broader set of data. Moreover, for reasoning tasks, documents of the same type influence the answers to different questions in similar ways. This supports a procedural-knowledge view: the model is not simply retrieving information but synthesizing many kinds of knowledge to solve problems.

Code has a notable influence on LLM reasoning, possibly because code contains a great deal of descriptive information about procedures and steps. The influence of code can be both positive and negative, and the exact mechanism is still unclear. My results suggest that LLMs can learn step-by-step reasoning procedures from code, which opens up new ideas for data synthesis.

LLM reasoning is not uniform: there are multiple modes, sometimes retrieval-based and sometimes reasoning-based. A model's reasoning is constrained by its own characteristics and by its input data, but that does not mean it lacks reasoning ability entirely. Mathematical reasoning may transfer to other kinds of reasoning, but math is only one facet of reasoning; other kinds, such as inductive reasoning, are harder to evaluate.
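As a minimal illustration of the concentrated-vs-dispersed distinction (not the actual EK-FAC influence-function pipeline used in the research), assuming per-document influence scores for a query have already been computed, one can measure how concentrated they are by asking what fraction of the total influence mass the top-k documents carry. The function name and the synthetic scores below are hypothetical, for illustration only:

```python
import numpy as np

def topk_influence_share(scores, k=50):
    """Fraction of total absolute influence mass carried by the k most
    influential pretraining documents. Values near 1 mean a few documents
    dominate (retrieval-like); small values mean influence is spread over
    many documents (procedural / reasoning-like)."""
    mag = np.sort(np.abs(np.asarray(scores, dtype=float)))[::-1]
    return mag[:k].sum() / mag.sum()

# Toy illustration with synthetic influence scores (not real model data):
rng = np.random.default_rng(0)

# "Factual" query: a handful of documents carry almost all the influence.
factual = np.concatenate([rng.normal(0, 10.0, 20),
                          rng.normal(0, 0.001, 10_000)])

# "Reasoning" query: influence is spread thinly over many documents.
reasoning = rng.normal(0, 0.5, 10_020)

print(topk_influence_share(factual))    # high: influence is concentrated
print(topk_influence_share(reasoning))  # low: influence is dispersed
```

This is only a sketch of the summary statistic; the hard part in practice is computing the per-document influence scores themselves, which is what EK-FAC approximations make tractable at LLM scale.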


Shownotes

Laura Ruis, a PhD student at University College London and researcher at Cohere, explains her groundbreaking research into how large language models (LLMs) perform reasoning tasks, the fundamental mechanisms underlying LLM reasoning capabilities, and whether these models primarily rely on retrieval or develop procedural knowledge.

SPONSOR MESSAGES:


CentML offers competitive pricing for GenAI model deployment, with flexible options to suit a wide range of models, from small to large-scale deployments.

https://centml.ai/pricing/

Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier focused on o-series style reasoning and AGI. Are you interested in working on reasoning, or getting involved in their events?

Go to https://tufalabs.ai/


TOC

  1. LLM Foundations and Learning

    1.1 Scale and Learning in Language Models [00:00:00]

    1.2 Procedural Knowledge vs Fact Retrieval [00:03:40]

    1.3 Influence Functions and Model Analysis [00:07:40]

    1.4 Role of Code in LLM Reasoning [00:11:10]

    1.5 Semantic Understanding and Physical Grounding [00:19:30]

  2. Reasoning Architectures and Measurement

    2.1 Measuring Understanding and Reasoning in Language Models [00:23:10]

    2.2 Formal vs Approximate Reasoning and Model Creativity [00:26:40]

    2.3 Symbolic vs Subsymbolic Computation Debate [00:34:10]

    2.4 Neural Network Architectures and Tensor Product Representations [00:40:50]

  3. AI Agency and Risk Assessment

    3.1 Agency and Goal-Directed Behavior in Language Models [00:45:10]

    3.2 Defining and Measuring Agency in AI Systems [00:49:50]

    3.3 Core Knowledge Systems and Agency Detection [00:54:40]

    3.4 Language Models as Agent Models and Simulator Theory [01:03:20]

    3.5 AI Safety and Societal Control Mechanisms [01:07:10]

    3.6 Evolution of AI Capabilities and Emergent Risks [01:14:20]

REFS:

[00:01:10] Procedural Knowledge in Pretraining & LLM Reasoning

Ruis et al., 2024

https://arxiv.org/abs/2411.12580

[00:03:50] EK-FAC Influence Functions in Large LMs

Grosse et al., 2023

https://arxiv.org/abs/2308.03296

[00:13:05] Surfaces and Essences: Analogy as the Core of Cognition

Hofstadter & Sander

https://www.amazon.com/Surfaces-Essences-Analogy-Fuel-Thinking/dp/0465018475

[00:13:45] Wittgenstein on Language Games

https://plato.stanford.edu/entries/wittgenstein/

[00:14:30] Montague Semantics for Natural Language

https://plato.stanford.edu/entries/montague-semantics/

[00:19:35] The Chinese Room Argument

David Cole

https://plato.stanford.edu/entries/chinese-room/

[00:19:55] ARC: Abstraction and Reasoning Corpus

François Chollet

https://arxiv.org/abs/1911.01547

[00:24:20] Systematic Generalization in Neural Nets

Lake & Baroni, 2023

https://www.nature.com/articles/s41586-023-06668-3

[00:27:40] Open-Endedness & Creativity in AI

Tim Rocktäschel

https://arxiv.org/html/2406.04268v1

[00:30:50] Fodor & Pylyshyn on Connectionism

https://www.sciencedirect.com/science/article/abs/pii/0010027788900315

[00:31:30] Tensor Product Representations

Smolensky, 1990

https://www.sciencedirect.com/science/article/abs/pii/000437029090007M

[00:35:50] DreamCoder: Wake-Sleep Program Synthesis

Kevin Ellis et al.

https://courses.cs.washington.edu/courses/cse599j1/22sp/papers/dreamcoder.pdf

[00:36:30] Compositional Generalization Benchmarks

Ruis, Lake et al., 2022

https://arxiv.org/pdf/2202.10745

[00:40:30] RNNs & Tensor Products

McCoy et al., 2018

https://arxiv.org/abs/1812.08718

[00:46:10] Formal Causal Definition of Agency

Kenton et al.

https://arxiv.org/pdf/2208.08345v2

[00:48:40] Agency in Language Models

Sumers et al.

https://arxiv.org/abs/2309.02427

[00:55:20] Heider & Simmel’s Moving Shapes Experiment

https://www.nature.com/articles/s41598-024-65532-0

[01:00:40] Language Models as Agent Models

Jacob Andreas, 2022

https://arxiv.org/abs/2212.01681

[01:13:35] Pragmatic Understanding in LLMs

Ruis et al.

https://arxiv.org/abs/2210.14986