“Recommendations for Technical AI Safety Research Directions” by Sam Marks
37:00
2025-01-10
LessWrong (30+ Karma)
Chapters
How do we evaluate AI capabilities?
What methods can we use to evaluate AI alignment?
How can we understand a model’s cognition?
How does a model’s persona influence its behavior and generalization?
What is chain-of-thought faithfulness and why is it important?
How can we ensure AI control?
What is behavioral monitoring and why is it crucial?
How does activation monitoring help in AI safety?
What is anomaly detection and how does it work?
How can we achieve scalable oversight in AI systems?
What is recursive oversight and why is it important?
How can we improve weak-to-strong and easy-to-hard generalization?
What is honesty in the context of AI safety?
How can we ensure adversarial robustness in AI systems?
What are realistic and differential benchmarks for jailbreaks?
How can we implement adaptive defenses in AI?
What are some miscellaneous topics in AI safety research?