“Reward hacking is becoming more sophisticated and deliberate in frontier LLMs” by Kei
26:15
2025/4/25
LessWrong (30+ Karma)
Chapters
Recent Examples of Reward Hacking
Cheating to Win at Chess?
Faking LLM Fine-Tuning?
Why Are We Seeing This Now?
Behavioral Changes Due to Increased RL Training
Why More Researchers Should Focus on Reward Hacking?
The Importance of Solving Reward Hacking for AI Alignment
Frontier Companies and Robust Solutions
Reasons Against Working on Reward Hacking
Interesting Research Directions
Evaluating Current Reward Hacking
Mitigations and Future Steps
Acknowledgements
Appendix: More Reward Hacks