We're sunsetting PodQuest on 2025-07-28. Thank you for your support!
Export Podcast Subscriptions
cover of episode Yoshua Bengio - Designing out Agency for Safe AI

Yoshua Bengio - Designing out Agency for Safe AI

2025/1/15
logo of podcast Machine Learning Street Talk (MLST)

Machine Learning Street Talk (MLST)

AI Deep Dive AI Chapters Transcript
People
Y
Yoshua Bengio
Topics
Yoshua Bengio: 我对AI安全非常担忧,特别是那些具有能动性的AI。我认为,许多导致灾难性后果的场景都源于AI的能动性,因为我们无法完美控制AI的目标。奖励篡改就是一个例子,AI可能会为了最大化奖励而操纵其自身程序,甚至控制人类以防止被关闭。因此,我认为构建不具有能动性的AI至关重要。 我不认为我们需要通过解决能动性问题来构建AGI。我们可以构建非常有用的非能动性机器,从而大大降低风险,同时仍然获得许多好处,并且不会完全关闭能动性的大门。我们可以构建像科学家一样的机器,专注于理解世界,而不是迎合我们的需求。我们可以利用这些非能动性AI来推进科学、医学和气候变化等领域的研究,而不会冒失去控制的风险。 我认为,工具性目标是几乎任何其他目标的副产品,例如自我保护和寻求知识。这些目标可能导致AI追求权力,最终失去人类的控制。因此,我们需要理解并区分知识和目标之间的正交性,这样我们就可以构建既智能又具有良好目标的AI。 当前的AI对齐工作不足,我们需要更多元的研究方向,包括评估、缓解和重新设计AI构建方式。我们需要透明度,迫使公司公开其风险评估和缓解计划,以避免诉讼并保护公众利益。我们需要国际合作,因为单一国家或公司拥有过多的权力是危险的。我们需要多边努力,并开发验证技术来确保各国不会秘密地将AGI用于有害目的。 我不确定我们距离AGI还有多远,但我们需要为各种情况做好准备,包括最坏的情况。我们需要在保持领先地位的同时,确保AI安全,这需要来自多个民主国家的资源和人才。我们需要一个类似于CERN的公共、非营利性组织来进行AGI研究,并以安全为首要原则。

Deep Dive

Chapters
Yoshua Bengio discusses potential catastrophic outcomes from powerful AI, focusing on scenarios of human misuse and loss of control due to malicious AI goals. He emphasizes the need to understand these risks to mitigate them.
  • Catastrophic outcomes from AI misuse and loss of control are discussed.
  • The focus is on understanding risks to enable mitigation.

Shownotes Transcript

Professor Yoshua Bengio is a pioneer in deep learning and Turing Award winner. Bengio talks about AI safety, why goal-seeking “agentic” AIs might be dangerous, and his vision for building powerful AI tools without giving them agency. Topics include reward tampering risks, instrumental convergence, global AI governance, and how non-agent AIs could revolutionize science and medicine while reducing existential threats. Perfect for anyone curious about advanced AI risks and how to manage them responsibly.

SPONSOR MESSAGES:


CentML offers competitive pricing for GenAI model deployment, with flexible options to suit a wide range of models, from small to large-scale deployments.

https://centml.ai/pricing/

Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier focussed on o-series style reasoning and AGI. Are you interested in working on reasoning, or getting involved in their events?

They are hosting an event in Zurich on January 9th with the ARChitects, join if you can.

Goto https://tufalabs.ai/


Interviewer: Tim Scarfe

Yoshua Bengio:

https://x.com/Yoshua_Bengio

https://scholar.google.com/citations?user=kukA0LcAAAAJ&hl=en

https://yoshuabengio.org/

https://en.wikipedia.org/wiki/Yoshua_Bengio

TOC:

  1. AI Safety Fundamentals

[00:00:00] 1.1 AI Safety Risks and International Cooperation

[00:03:20] 1.2 Fundamental Principles vs Scaling in AI Development

[00:11:25] 1.3 System 1/2 Thinking and AI Reasoning Capabilities

[00:15:15] 1.4 Reward Tampering and AI Agency Risks

[00:25:17] 1.5 Alignment Challenges and Instrumental Convergence

  1. AI Architecture and Safety Design

[00:33:10] 2.1 Instrumental Goals and AI Safety Fundamentals

[00:35:02] 2.2 Separating Intelligence from Goals in AI Systems

[00:40:40] 2.3 Non-Agent AI as Scientific Tools

[00:44:25] 2.4 Oracle AI Systems and Mathematical Safety Frameworks

  1. Global Governance and Security

[00:49:50] 3.1 International AI Competition and Hardware Governance

[00:51:58] 3.2 Military and Security Implications of AI Development

[00:56:07] 3.3 Personal Evolution of AI Safety Perspectives

[01:00:25] 3.4 AI Development Scaling and Global Governance Challenges

[01:12:10] 3.5 AI Regulation and Corporate Oversight

  1. Technical Innovations

[01:23:00] 4.1 Evolution of Neural Architectures: From RNNs to Transformers

[01:26:02] 4.2 GFlowNets and Symbolic Computation

[01:30:47] 4.3 Neural Dynamics and Consciousness

[01:34:38] 4.4 AI Creativity and Scientific Discovery

SHOWNOTES (Transcript, references, best clips etc):

https://www.dropbox.com/scl/fi/ajucigli8n90fbxv9h94x/BENGIO_SHOW.pdf?rlkey=38hi2m19sylnr8orb76b85wkw&dl=0

CORE REFS (full list in shownotes and pinned comment):

[00:00:15] Bengio et al.: "AI Risk" Statement

https://www.safe.ai/work/statement-on-ai-risk

[00:23:10] Bengio on reward tampering & AI safety (Harvard Data Science Review)

https://hdsr.mitpress.mit.edu/pub/w974bwb0

[00:40:45] Munk Debate on AI existential risk, featuring Bengio

https://munkdebates.com/debates/artificial-intelligence

[00:44:30] "Can a Bayesian Oracle Prevent Harm from an Agent?" (Bengio et al.) on oracle-to-agent safety

https://arxiv.org/abs/2408.05284

[00:51:20] Bengio (2024) memo on hardware-based AI governance verification

https://yoshuabengio.org/wp-content/uploads/2024/08/FlexHEG-Memo_August-2024.pdf

[01:12:55] Bengio’s involvement in EU AI Act code of practice

https://digital-strategy.ec.europa.eu/en/news/meet-chairs-leading-development-first-general-purpose-ai-code-practice

[01:27:05] Complexity-based compositionality theory (Elmoznino, Jiralerspong, Bengio, Lajoie)

https://arxiv.org/abs/2410.14817

[01:29:00] GFlowNet Foundations (Bengio et al.) for probabilistic inference

https://arxiv.org/pdf/2111.09266

[01:32:10] Discrete attractor states in neural systems (Nam, Elmoznino, Bengio, Lajoie)

https://arxiv.org/pdf/2302.06403