
Sakana AI - Chris Lu, Robert Tjarko Lange, Cong Lu

2025/3/1

Machine Learning Street Talk (MLST)

People
Chris Lu
Cong Lu
Robert Tjarko Lange
Topics
Chris Lu: I focus on meta-learning and multi-agent systems. My research shows that language models can discover and design better training algorithms. In the DiscoPOP paper, for example, we showed how language models can be used to discover and design better optimization algorithms that improve sample efficiency and speed and better align language models with human preferences. Rather than relying entirely on hand-designed algorithms, we can use language models to search more broadly and find better ones. In DiscoPOP, which uses large language models to discover optimization algorithms for training language models, we found that language models can discover loss functions with non-convex properties, which may be particularly effective for handling noisy data. We also explored incorporating fairness and other more abstract concepts into optimization objectives to improve robustness and interpretability. Going forward, I hope to combine human oversight with AI automation in algorithm discovery: humans set priorities and choose strategies, while the AI carries out large-scale trial and error and exploration.

Robert Tjarko Lange: I focus on evolutionary algorithms and large language models; my research sits at the intersection of evolutionary computation and foundation models, and I work on using large language models as evolution strategies for optimization. In the EvoLLM project, we explored using large language models as evolution strategies for black-box optimization. We found that, with suitable prompting strategies and in-context information, language models can effectively recognize and exploit previous evaluations, leading to smarter exploration and exploitation. We also found that smaller models sometimes outperform larger ones, possibly because of the larger models' mixture-of-experts architectures. On the meta-learning side, we explored fine-tuning on teacher algorithm trajectories to improve performance, studied how tokenization biases affect results, and tried integer discretization to address them. I see large language models as a general-purpose representation for data from many modalities: given a suitable abstract representation, they can recognize patterns and optimize effectively. Going forward, I want to explore how evolutionary algorithms and large language models can improve the creativity and generalization of AI systems.

Cong Lu: I mainly work on open-ended learning, and I contributed to The AI Scientist and Intelligent Go-Explore. In The AI Scientist, we tried to automate scientific discovery with large language models: idea generation, code writing, experiment execution, and paper writing are all done by the AI. Although the generated papers still have flaws, we demonstrated that AI-driven scientific discovery is possible. In Intelligent Go-Explore, we use large language models to identify interesting states in an environment, improving exploration efficiency in reinforcement learning. We found that large language models capture human intuitions about what is "interesting" remarkably well and can apply them during exploration. Going forward, I hope to apply large language models to more open-ended, more challenging scientific problems and to study how to address bias in AI systems. I am also excited about building fully AI-generated conferences and paper-review systems, which could accelerate scientific discovery and innovation.
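
Cong's description of Intelligent Go-Explore maps onto a short loop: keep an archive of states reached so far, repeatedly ask a foundation model which archived state looks most "interesting" to return to, expand that state, and archive anything new. Below is a minimal Python sketch of that loop; the helper names (llm_pick_interesting, env_step) are hypothetical stand-ins for the model call and the environment, not the authors' implementation.

```python
import random

def llm_pick_interesting(states: list[str]) -> int:
    """Hypothetical stand-in for a foundation-model call that is shown
    the archive and asked which state is most promising to explore from.
    A random choice substitutes for the LLM's judgment here."""
    return random.randrange(len(states))

def env_step(state: str, action: int) -> str:
    """Hypothetical placeholder for an environment transition."""
    return f"{state}->a{action}"

archive = ["initial_state"]               # archive of discovered states
for _ in range(10):                       # exploration budget
    idx = llm_pick_interesting(archive)   # LLM judges "interestingness"
    frontier = archive[idx]
    for action in range(3):               # expand the chosen state
        new_state = env_step(frontier, action)
        if new_state not in archive:      # keep only novel states
            archive.append(new_state)
```

The same skeleton covers classic Go-Explore; the only change is who scores the archive, a hand-crafted novelty heuristic there versus a prompted foundation model here.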

Shownotes

We speak with Sakana AI, who are building nature-inspired methods that could fundamentally transform how we develop AI systems.

The guests include Chris Lu, a researcher who recently completed his DPhil at Oxford University under Prof. Jakob Foerster's supervision, where he focused on meta-learning and multi-agent systems. Chris is the first author of the DiscoPOP paper, which demonstrates how language models can discover and design better training algorithms. Also joining is Robert Tjarko Lange, a founding member of Sakana AI who specializes in evolutionary algorithms and large language models. Robert leads research at the intersection of evolutionary computation and foundation models, and is completing his PhD at TU Berlin on evolutionary meta-learning. The discussion also features Cong Lu, currently a Research Scientist at Google DeepMind's Open-Endedness team, who previously helped develop The AI Scientist and Intelligent Go-Explore.

SPONSOR MESSAGES:


CentML offers competitive pricing for GenAI model deployment, with flexible options to suit a wide range of models, from small to large-scale deployments. Check out their super fast DeepSeek R1 hosting!

https://centml.ai/pricing/

Tufa AI Labs is a brand-new research lab in Zurich started by Benjamin Crouzier, focused on o-series-style reasoning and AGI. They are hiring a Chief Engineer and ML engineers, and run events in Zurich.

Go to https://tufalabs.ai/


  • DiscoPOP - A framework where language models discover their own optimization algorithms

  • EvoLLM - Using language models as evolution strategies for optimization (a minimal sketch of this loop follows the list below)

  • The AI Scientist - A fully automated system that conducts scientific research end-to-end

  • Neural Attention Memory Models (NAMMs) - Evolved memory systems that make transformers both faster and more accurate
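
As a companion to the EvoLLM item above, here is a minimal ask-evaluate-tell loop in that spirit: past (candidate, fitness) pairs form the context, and the model proposes the next candidate. The ask_llm stub is a hypothetical stand-in for the prompted LLM (the real method serializes the history into the prompt, e.g. as discretized integers, and parses the model's suggestion), and the sphere objective is just a toy.

```python
import random

def sphere(x: list[float]) -> float:
    """Toy black-box objective to minimize: the squared norm."""
    return sum(v * v for v in x)

def ask_llm(history: list[tuple[list[float], float]]) -> list[float]:
    """Hypothetical stand-in for the prompted LLM: the real method puts
    the (candidate, fitness) history in context and parses a suggestion;
    here we mimic it by perturbing the best candidate so far."""
    best = min(history, key=lambda h: h[1])[0]
    return [v + random.gauss(0.0, 0.1) for v in best]

x0 = [random.uniform(-1.0, 1.0) for _ in range(3)]
history = [(x0, sphere(x0))]              # seed the evaluation history
for _ in range(20):                       # ask-evaluate-tell loop
    candidate = ask_llm(history)          # "ask": propose from context
    history.append((candidate, sphere(candidate)))  # "evaluate" and "tell"
best_x, best_f = min(history, key=lambda h: h[1])
print(best_x, best_f)                     # best solution and fitness found
```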

TRANSCRIPT + REFS:

https://www.dropbox.com/scl/fi/gflcyvnujp8cl7zlv3v9d/Sakana.pdf?rlkey=woaoo82943170jd4yyi2he71c&dl=0

Robert Tjarko Lange

https://roberttlange.com/

Chris Lu

https://chrislu.page/

Cong Lu

https://www.conglu.co.uk/

Sakana

https://sakana.ai/blog/

TOC:

  1. LLMs for Algorithm Generation and Optimization

    [00:00:00] 1.1 LLMs generating algorithms for training other LLMs

    [00:04:00] 1.2 Evolutionary black-box optimization using neural network loss parameterization

    [00:11:50] 1.3 DiscoPOP: Non-convex loss function for noisy data

    [00:20:45] 1.4 External entropy injection for preventing model collapse

    [00:26:25] 1.5 LLMs for black-box optimization using abstract numerical sequences

  2. Model Learning and Generalization

    [00:31:05] 2.1 Fine-tuning on teacher algorithm trajectories

    [00:31:30] 2.2 Transformers learning gradient descent

    [00:33:00] 2.3 LLM tokenization biases towards specific numbers

    [00:34:50] 2.4 LLMs as evolution strategies for black box optimization

    [00:38:05] 2.5 DiscoPOP: LLMs discovering novel optimization algorithms

  3. AI Agents and System Architectures

    [00:51:30] 3.1 ARC challenge: Induction vs. transformer approaches

    [00:54:35] 3.2 LangChain / modular agent components

    [00:57:50] 3.3 Debate improves LLM truthfulness

    [01:00:55] 3.4 Time limits controlling AI agent systems

    [01:03:00] 3.5 Gemini: Million-token context enables flatter hierarchies

    [01:04:05] 3.6 Agents follow own interest gradients

    [01:09:50] 3.7 Go-Explore algorithm: archive-based exploration

    [01:11:05] 3.8 Foundation models for interesting state discovery

    [01:13:00] 3.9 LLMs leverage prior game knowledge

  4. AI for Scientific Discovery and Human Alignment

    [01:17:45] 4.1 Encoding alignment & aesthetics via reward functions

    [01:20:00] 4.2 AI Scientist: Automated open-ended scientific discovery

    [01:24:15] 4.3 DiscoPOP: LLMs for preference optimization algorithms

    [01:28:30] 4.4 Balancing AI knowledge with human understanding

    [01:33:55] 4.5 AI-driven conferences and paper review