“FLAKE-Bench: Outsourcing Awkwardness in the Age of AI” by annas, Twm Stone

2025/4/1
LessWrong (30+ Karma)

**Introduction**

A key part of modern social dynamics is flaking at short notice. However, anxiety about coming up with believable and socially acceptable reasons to do so can instead lead to ‘ghosting’, awkwardness, or implausible excuses, risking emotional harm and resentment in the other party. The ability to delegate this task to a Large Language Model (LLM) could substantially reduce friction and enhance the flexibility of users' social lives while greatly minimising the aforementioned creative burden and moral qualms.

We introduce FLAKE-Bench, an evaluation of models’ capacity to effectively, kindly, and humanely extract themselves from a diverse set of social, professional and romantic scenarios. We report the efficacy of 10 frontier or recently-frontier LLMs in bailing on prior commitments, because nothing says “I value our friendship” like having AI generate your cancellation texts. We open-source FLAKE-Bench on GitHub to support future research, and the full paper is available [...]


Outline:

(01:33) Methodology

(02:15) Key Results

(03:07) The Grandmother Mortality Singularity

(03:35) Conclusions


First published: April 1st, 2025

Source: https://www.lesswrong.com/posts/niJCS6sSAF2i4sDCY/flake-bench-outsourcing-awkwardness-in-the-age-of-ai


Narrated by TYPE III AUDIO.


Images from the article: Prior state-of-the-art methods for excusing oneself (Munroe, 2025), which demonstrate the potential for LLMs to significantly advance the field. Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.