
“The Sweet Lesson: AI Safety Should Scale With Compute” by Jesse Hoogland


LessWrong (30+ Karma)


A corollary of Sutton's Bitter Lesson is that solutions to AI safety should scale with compute. Let me list a few examples of research directions that aim at this kind of solution:

  • Deliberative Alignment: Combine chain-of-thought with Constitutional AI, so that safety improves with inference-time compute (see Guan et al. 2025, Figure 13).
  • AI Control: Design control protocols that pit a red team against a blue team so that running the game for longer results in more reliable estimates of the probability of successful scheming during deployment (e.g., weight exfiltration).
  • Debate: Design a debate protocol so that running a longer, deeper debate between AI assistants makes us more confident that we're encouraging truthfulness or other desirable qualities (see Irving et al. 2018, Table 1).
  • Bengio's Scientist AI: Develop safety guardrails that obtain more reliable estimates of the probability of catastrophic risk with increasing inference time:[1]

[I]n the short [...]
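The control-protocol direction above can be illustrated with a toy Monte Carlo sketch. Everything here is a hypothetical stand-in (the simulated "game" is just a biased coin flip, and `p_scheme`, `estimate_scheming_probability`, etc. are invented names, not anything from the post or the AI Control literature); the point is only the scaling behavior: the standard error of the estimated scheming probability shrinks as 1/sqrt(n), so spending more compute on more game rounds buys a more reliable safety estimate.

```python
import math
import random


def simulate_control_game(p_scheme, rng):
    # Toy stand-in for one round of a red-team vs. blue-team control game:
    # returns True if the red team's scheming attempt (e.g., weight
    # exfiltration) slips past the blue team's protocol in this round.
    return rng.random() < p_scheme


def estimate_scheming_probability(n_rounds, p_scheme=0.05, seed=0):
    # Monte Carlo estimate of the probability of successful scheming.
    # More rounds (more compute) -> a tighter estimate.
    rng = random.Random(seed)
    successes = sum(simulate_control_game(p_scheme, rng) for _ in range(n_rounds))
    p_hat = successes / n_rounds
    # Binomial standard error shrinks as 1 / sqrt(n_rounds).
    std_err = math.sqrt(p_hat * (1 - p_hat) / n_rounds)
    return p_hat, std_err


if __name__ == "__main__":
    for n in (400, 40_000):
        p_hat, se = estimate_scheming_probability(n)
        print(f"n={n:6d}  p_hat={p_hat:.4f}  std_err={se:.4f}")
```

Running the game for 100x longer here cuts the standard error by roughly 10x, which is the "safety scales with compute" property in miniature.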

The original text contained 2 footnotes which were omitted from this narration.


First published: May 5th, 2025

Source: https://www.lesswrong.com/posts/6hy7tsB2pkpRHqazG/the-sweet-lesson-ai-safety-should-scale-with-compute

    ---
    

Narrated by TYPE III AUDIO.