“How will we update about scheming?” by ryan_greenblatt

2025/1/6

LessWrong (30+ Karma)

Chapters

What are the main takeaways from the research on AI scheming?

How likely is it that we will get clear evidence of AI scheming?

What if we don't get clear evidence of AI scheming before powerful AI?

How will future model architecture and training methods affect the likelihood of scheming?

What properties of AI systems and training processes are relevant to scheming?

What is opaque goal-directed reasoning and how does it relate to AI scheming?

Where do AI capabilities come from, and how do they influence scheming?

What is the overall distribution of scheming likelihood based on AI system properties?

What direct observations can we make about AI scheming?

How can we use model organisms to understand AI scheming?

How can we catch and mitigate problematic AI behavior?

What are the implications of training processes with varying situational awareness?

How does training AIs to seem highly corrigible and myopic impact scheming?

What is the likelihood of scheming under various scenarios, excluding mitigations?

What are the optimistic and pessimistic scenarios for AI properties?

What is the conclusion of the discussion on AI scheming?

What are the caveats and definitions to consider?

How do intelligent learning algorithms contribute to AI capabilities?

Duration: 01:18:49