
Ep 58: Google Researchers Noam Shazeer and Jack Rae on Scaling Test-time Compute, Reactions to Ilya & AGI

2025/3/17

Unsupervised Learning

People
Jack Rae
Noam Shazeer
Topics
Jack Rae: We initially focused on reasoning tasks (math and code), but surprisingly, after training the model to match the Gemini Flash style, we found that the "thinking" mechanism also improved performance on creative tasks, such as writing essays. This suggests test-time compute may apply more broadly than we expected. We are exploring how to improve evals, because tasks we once considered hard are now easy for models to solve; evals need to be continually updated to better measure models' real capabilities. I think the most important AI milestone is AI assisting its own research and development, forming a positive feedback loop, which includes AI-assisted coding, data flywheel effects, and the broader AI R&D surge. For hard-to-verify domains, applying models requires better verification methods or more human feedback loops; models can follow more abstract instructions, and training models to provide reward signals on qualitative work is a way to tackle hard-to-verify problems. The Gemini app now integrates stronger models and connects to various tools, and the user experience is better; even with some added latency, users are willing to pay that cost for high-quality answers. Gemini models are strong on image input, and the combination of image input plus "thinking" works especially well. To deploy agents more broadly, we need to solve the complexity and reliability of reasoning, and that requires general solutions, because how users will apply them is unpredictable. Test-time compute is not the only path to AGI; more research is needed on agents interacting with complex environments. We should train models to think deeply and thereby become more data-efficient, the way a mathematician can extract a great deal of knowledge from a single math textbook. Mathematics will shift from benchmarks to solving real problems, and that will become an important measure of AI progress. I am impressed that open-source models have kept pace with frontier models. Over the past year, AI has progressed much faster than expected, which has changed how I think about information diffusion and technology adoption. Compared with large-scale pre-training, the hardware infrastructure for test-time-compute models can likely be more flexible and distributed. We are working to ensure AI is safe and beneficial, and paying attention to AGI's impact on the economy and jobs, as well as how to release more powerful models safely. In the agent space, building the right application experience is crucial for models to be useful.

Noam Shazeer: Although I was initially skeptical of focusing on specific domains like math, good benchmarks are essential for distinguishing genuinely harder problems and for preventing models from lowering perplexity simply by adding parameters and memorizing. I don't think test-time compute alone gets us all the way to AGI; inference for large language models is very cheap, so there is still a lot of compute left to exploit. Some scientific discoveries come from synthesizing already-known information, which pushes back on the claim that current models are incapable of original thought. AI research culture should encourage sharing, open collaboration, and generous attribution of contributions. Bottom-up allocation of compute favors innovation because it lets new projects get more resources, while top-down allocation favors collaboration on large projects. Investing in large language models was the right decision, even though not everyone agreed at the time; LLMs were underestimated, and their potential goes far beyond a handful of large commercial applications. Code was also underestimated, because it can self-accelerate AI progress. AI is changing education: children can get personalized information through AI, which will raise the knowledge level of the next generation. I think the ARC AGI eval is overrated, because it may not drive real progress toward AGI.

Chapters
This chapter explores Gemini 2.0's capabilities, focusing on its performance in reasoning tasks like math and code. The researchers discuss unexpected improvements in creative tasks and the importance of robust benchmarks.
  • Initial focus on reasoning tasks (math, code) surprisingly generalized to creative tasks.
  • Skepticism towards math focus initially, but it proved crucial for distinguishing genuine problem-solving from memorization.
  • Need for better, more meaningful evals that go beyond current benchmarks.

Shownotes

On the latest episode of Unsupervised Learning, Jacob is joined by two of the most influential minds in AI today. 

🔹 Noam Shazeer, co-inventor of the Transformer

🔹 Jack Rae, Research Director at DeepMind and one of the leads behind Gemini’s Flash Thinking

We got to ask them the top-of-mind questions in AI today: where we are, where we're headed, and what it means for businesses and the world. Some key takeaways:

 

[0:00] Intro

[1:30] Exploring Gemini 2.0

[4:04] Challenges with Evals and Benchmarks

[6:14] Reinforcement Loops and AI Productivity

[8:15] Agentic Coding and AI in Development

[13:02] Multimodal Capabilities and Applications

[15:21] Future of AI: Complexity and Reliability

[19:02] Test Time Compute and Data Efficiency

[31:20] AI Research Culture and Breakthroughs

[38:59] Reflecting on Large Language Models

[39:37] The Rise of Vision-Based Models

[41:01] Native Image Generation and General Purpose Models

[41:35] AI in Healthcare and Specialized Models

[43:32] Shifting Timelines and Rapid Adoption

[46:48] Open Source Models and Competitive Edge

[49:30] AI's Impact on Education and Personal Lives

[55:10] AGI Risks and Safety Considerations

[57:33] Future of AI Companions

[1:02:17] Quickfire

 

With your co-hosts: 

@jacobeffron 

  • Partner at Redpoint, Former PM Flatiron Health

 

@patrickachase 

  • Partner at Redpoint, Former ML Engineer LinkedIn

 

@ericabrescia 

  • Former COO GitHub, Founder Bitnami (acq’d by VMware)

 

@jordan_segall 

  • Partner at Redpoint