Auditing LLMs and Twitter

2025/1/29

Data Skeptic

People

Asaf

Erwan Le Merrer

Gilles Trédan

Topics

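Asaf: This episode discusses how graph-theoretic methods apply to detecting hallucinations in large language models (LLMs) and to detecting shadow banning on Twitter. We first look at the hallucinations LLMs produce when asked to generate the karate club graph and analyze their causes. LLMs were not designed for graphs, so it is surprising that they can handle them at all.

Erwan Le Merrer: We think the trouble LLMs have with graphs is more a memorization and statistics problem than a graph theory problem. If an LLM had been exposed to this dataset more often, it would probably reproduce it better.

Gilles Trédan: We studied shadow banning on Twitter and found that Twitter's claim that shadow banning does not exist is inconsistent with the data. We used graph-theoretic methods to detect shadow banning and found that it is not random but concentrated in certain communities. We used an epidemic model to simulate the spread of shadow banning, and the model explains the observed phenomena well.

Erwan Le Merrer: My academic collaboration with Gilles began during our PhDs, and we have studied applications of graph theory to distributed systems ever since. In a distributed system, nodes and edges represent the participants and their patterns of collaboration. Scalability is a challenge when a distributed system has millions of nodes. We have also applied graph theory to several areas, including distributed systems, algorithm execution, recommendation algorithms, and LLM auditing. We ask LLMs to generate famous graphs, such as the karate club graph, to study their hallucination behavior and to try to learn something about the models' internals. The prompt we designed is simple: it asks the model to output the specified graph as a Python edge list. We measure how much an LLM hallucinates in the generated graph using graph edit distance and the degree sequence. Even when the generated graph is not a perfect copy, we check whether it preserves structural features of the original graph, such as its community structure. Compared with traditional LLM evaluations based on binary questions, prompting for graphs yields more information. We use Graph Atlas Distance (GAD) to measure how much an LLM hallucinates, and this metric correlates well with the results on a hallucination leaderboard. Graph Atlas Distance is a computationally intensive and coarse graph distance measure. Different LLMs make different mistakes when generating graphs. We detect shadow banning with several methods, covering search bans, reply bans, and ghost bans. We use an epidemic model to model the locality of shadow banning.

Gilles Trédan: Twitter attributed shadow banning to bugs, but we think this masks deeper causes. By analyzing the proportion of shadow-banned users among the neighbors of Twitter users, we found significant differences, which indicates that shadow banning is not random. We use an epidemic model to simulate the spread of shadow banning through the Twitter network, and the model explains the observed phenomena well. We exploit properties of Twitter user IDs to sample users at random and avoid introducing bias.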
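To make the prompting step described above concrete, here is a minimal sketch of how a model's reply could be turned into an edge list. The prompt wording, the sample reply, and the function name parse_edge_list are all illustrative assumptions; the episode only says that the prompt asks for the graph as a Python edge list, not how the reply is processed.

```python
import ast
import re

# Hypothetical prompt wording; the guests only say the prompt is simple
# and asks for the graph as a Python edge list.
PROMPT_TEMPLATE = "Provide me the {graph_name} graph as a Python edge list."


def parse_edge_list(response_text):
    """Extract a list of (u, v) pairs from free-form model output.

    Returns an empty list when no well-formed Python list is found.
    """
    # Grab everything from the first '[' to the last ']'.
    match = re.search(r"\[.*\]", response_text, flags=re.DOTALL)
    if match is None:
        return []
    try:
        # literal_eval only accepts Python literals, so no code is executed.
        parsed = ast.literal_eval(match.group(0))
    except (ValueError, SyntaxError):
        return []
    edges = []
    for item in parsed:
        # Keep only two-element entries; silently skip anything else.
        if isinstance(item, (tuple, list)) and len(item) == 2:
            edges.append((item[0], item[1]))
    return edges


# Made-up reply for illustration; these are NOT the karate club edges.
sample_reply = "Here is the graph:\nedges = [(1, 2), (1, 3), (2, 3)]\nHope this helps."
print(PROMPT_TEMPLATE.format(graph_name="Zachary's karate club"))
print(parse_edge_list(sample_reply))
```

Using ast.literal_eval rather than eval is a deliberate choice in this sketch, since model output should be treated as untrusted input.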

Chapters

This chapter explores how accurately LLMs recreate Zachary's Karate Club graph. The researchers tested multiple LLMs and measured how well each reproduced the graph's structure and community properties. They introduce new metrics for the quality of an LLM's output and compare the results to existing hallucination benchmarks.

LLMs were prompted to generate the Karate Club graph as an edge list.
The accuracy of LLM outputs was measured using graph edit distance and a novel metric called Graph Atlas Distance (GAD).
Results showed varying degrees of hallucination across different LLMs, with some showing remarkable accuracy while others generated significantly distorted graphs.
The study highlights the potential of using graph-based approaches to understand the internal workings of LLMs and to evaluate their performance in handling complex data structures.
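As a rough illustration of the comparison summarized above, the sketch below scores a generated edge list against a reference edge list. It makes two simplifying assumptions that the episode does not confirm: node labels are taken at face value, so the edit distance reduces to counting edges that appear in only one of the two graphs, and the degree sequences are compared with a plain sum of absolute differences. The toy graphs are made up and are not the karate club data. Graph Atlas Distance is not reproduced here.

```python
from collections import Counter


def normalize(edges):
    """Undirected edge set: drops self-loops, duplicates, and edge direction."""
    return {frozenset((u, v)) for u, v in edges if u != v}


def labeled_edit_distance(ref_edges, gen_edges):
    """Number of edge insertions plus deletions needed when node labels are trusted."""
    return len(normalize(ref_edges) ^ normalize(gen_edges))


def degree_sequence(edges):
    """Node degrees sorted from highest to lowest."""
    degrees = Counter()
    for edge in normalize(edges):
        for node in edge:
            degrees[node] += 1
    return sorted(degrees.values(), reverse=True)


def degree_sequence_gap(ref_edges, gen_edges):
    """Sum of absolute differences between the two zero-padded degree sequences."""
    a, b = degree_sequence(ref_edges), degree_sequence(gen_edges)
    size = max(len(a), len(b))
    a = a + [0] * (size - len(a))
    b = b + [0] * (size - len(b))
    return sum(abs(x - y) for x, y in zip(a, b))


# Toy data (hypothetical): a small reference graph and an imperfect "generated" copy.
reference = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)]
generated = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]

print(labeled_edit_distance(reference, generated))
print(degree_sequence(reference), degree_sequence(generated))
print(degree_sequence_gap(reference, generated))
```

A general graph edit distance also searches over node relabelings and is far more expensive to compute; the label-trusting version here is only a cheap proxy for it.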

Shownotes

Our guests, Erwan Le Merrer and Gilles Trédan, are long-time collaborators in graph theory and distributed systems. They share their expertise on applying graph-based approaches to understanding both large language model (LLM) hallucinations and shadow banning on social media platforms.

In this episode, listeners will learn how graph structures and metrics can reveal patterns in algorithmic behavior and platform moderation practices.

Key insights include the use of graph theory to evaluate LLM outputs, uncovering patterns in hallucinated graphs that might hint at the underlying structure and training data of the models, and applying epidemic models to analyze the uneven spread of shadow banning on Twitter.
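The epidemic-model idea can be sketched on a toy network. The code below computes the share of a user's neighbors that are shadow banned, then runs one round of a generic susceptible-infected (SI) process in which a user with at least one banned neighbor becomes banned with probability beta per banned neighbor. The graph, the parameter beta, and the SI variant are assumptions chosen for illustration; the episode does not specify the exact model the guests fitted.

```python
import random


def banned_neighbor_fraction(adjacency, banned, user):
    """Share of a user's neighbors that are currently shadow banned."""
    neighbors = adjacency.get(user, ())
    if not neighbors:
        return 0.0
    return sum(1 for n in neighbors if n in banned) / len(neighbors)


def si_step(adjacency, banned, beta, rng):
    """One synchronous SI round; returns the new banned set (never shrinks)."""
    newly_banned = set()
    for user, neighbors in adjacency.items():
        if user in banned:
            continue
        for n in neighbors:
            # Each banned neighbor gets one independent chance to "infect" the user.
            if n in banned and rng.random() < beta:
                newly_banned.add(user)
                break
    return banned | newly_banned


# Toy network (hypothetical): two triangles joined by the edge c-x.
adjacency = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "x"],
    "x": ["c", "y", "z"],
    "y": ["x", "z"],
    "z": ["x", "y"],
}
banned = {"a", "b"}

for user in ("c", "x"):
    print(user, banned_neighbor_fraction(adjacency, banned, user))
print(sorted(si_step(adjacency, banned, beta=0.5, rng=random.Random(0))))
```

In this toy setup, user c sits next to two banned accounts while x sits next to none, which mirrors the kind of uneven, community-level concentration the guests describe.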

Want to listen ad-free? Try our Graphs Course? Join Data Skeptic+ for $5 / month or $50 / year

https://plus.dataskeptic.com

Episode length: 40:26
