
Robustness, Detectability, and Data Privacy in AI // Vinu Sankar Sadasivan // #289

2025/2/7

MLOps.community

People
Vinu Sankar Sadasivan
Topics
Vinu Sankar Sadasivan: I don't think the tools currently claiming to detect AI-generated text are reliable. Watermarking is one text-detection method, but it is not the only one. Older text watermarks embedded spelling errors or spacing patterns; AI watermarking has since moved well beyond that. Watermarking degrades in the presence of an adversary, and my paper studies four classes of detectors, watermarking among them. As language models grow larger, detection gets harder, because they can easily mimic human writing styles. Watermarking is just one of the tools our paper analyzes, and if an attacker really wants to break it, it breaks easily. Our theory shows that no foolproof technique currently exists: watermarking can serve as one layer of security, but it is easy to remove. Prompting alone rarely makes AI-generated text look more human, because models have been fine-tuned to embed the AI signature more robustly. I used paraphrasers in the paper because they match our theory and because they attack watermarking directly. With watermarking, you cannot simply declare every passage in the flagged set to be watermarked text, since that would cause human-written text to be flagged as well. Using these detection systems means trading off Type I errors against Type II errors. I do think the major AI labs have recently taken steps that make their models easier to detect.
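The Type I/Type II trade-off Vinu describes can be sketched with a toy "green-list" watermark detector. This is an illustrative sketch only, not the detectors from his paper: the key, the green ratio, and the z-score threshold are all assumptions made up for the example.

```python
import hashlib
import math

def green_fraction(tokens, key="demo-key", green_ratio=0.5):
    """Fraction of tokens landing in the keyed pseudorandom 'green list'.

    A watermarking sampler would have favored green tokens at generation
    time, so watermarked text shows a fraction well above green_ratio,
    while ordinary human text hovers near green_ratio.
    """
    green = sum(
        1 for tok in tokens
        if hashlib.sha256((key + tok).encode()).digest()[0] / 255.0 < green_ratio
    )
    return green / len(tokens)

def z_score(frac, n, green_ratio=0.5):
    """One-sided z-statistic against the null of unwatermarked text."""
    return (frac - green_ratio) * math.sqrt(n) / math.sqrt(green_ratio * (1 - green_ratio))

def is_watermarked(tokens, threshold=2.0):
    """Flag text whose green fraction is improbably high.

    Raising `threshold` lowers Type I errors (humans flagged as AI) but
    raises Type II errors (watermarked text slipping through); lowering
    it does the reverse. No threshold eliminates both at once, and a
    paraphraser that reshuffles tokens drags the z-score back toward 0.
    """
    return z_score(green_fraction(tokens), len(tokens)) > threshold
```

Moving `threshold` is exactly the trade-off in the episode: there is no setting that catches all watermarked text without also accusing some human writers.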


Chapters
AI text detection is not foolproof. Watermarking is one technique, but it's not effective against determined attackers. The paper explores various detection methods and demonstrates their vulnerabilities.
  • AI text detection methods are not completely reliable.
  • Watermarking is a technique that can be broken by attackers.
  • There is a fundamental trade-off between detecting AI-generated text and avoiding false positives.

Shownotes

Vinu Sankar Sadasivan is a CS PhD ... Currently, I am working as a full-time Student Researcher at Google DeepMind on jailbreaking multimodal AI models.

Robustness, Detectability, and Data Privacy in AI // MLOps Podcast #289 with Vinu Sankar Sadasivan, Student Researcher at Google DeepMind.

// Abstract Recent rapid advancements in Artificial Intelligence (AI) have made it widely applicable across various domains, from autonomous systems to multimodal content generation. However, these models remain susceptible to significant security and safety vulnerabilities. Such weaknesses can enable attackers to jailbreak systems, allowing them to perform harmful tasks or leak sensitive information. As AI becomes increasingly integrated into critical applications like autonomous robotics and healthcare, the importance of ensuring AI safety is growing. Understanding the vulnerabilities in today’s AI systems is crucial to addressing these concerns.

// Bio Vinu Sankar Sadasivan is a final-year Computer Science PhD candidate at The University of Maryland, College Park, advised by Prof. Soheil Feizi. His research focuses on Security and Privacy in AI, with a particular emphasis on AI robustness, detectability, and user privacy. Currently, Vinu is a full-time Student Researcher at Google DeepMind, working on jailbreaking multimodal AI models. Previously, Vinu was a Research Scientist intern at Meta FAIR in Paris, where he worked on AI watermarking.

Vinu is a recipient of the 2023 Kulkarni Fellowship and has earned several distinctions, including the prestigious Director's Silver Medal. He completed a Bachelor's degree in Computer Science & Engineering at IIT Gandhinagar in 2020. Prior to his PhD, Vinu gained research experience as a Junior Research Fellow in the Data Science Lab at IIT Gandhinagar and through internships at Caltech, Microsoft Research India, and IISc.

// MLOps Swag/Merch https://shop.mlops.community/

// Related Links Website: https://vinusankars.github.io/

--------------- ✌️Connect With Us ✌️ -------------
Join our Slack community: https://go.mlops.community/slack
Follow us on Twitter: @mlopscommunity
Sign up for the next meetup: https://go.mlops.community/register
Catch all episodes, blogs, newsletters, and more: https://mlops.community/

Connect with Demetrios on LinkedIn: https://www.linkedin.com/in/dpbrinkm/
Connect with Vinu on LinkedIn: https://www.linkedin.com/in/vinusankars/