Everything Hard About Building AI Agents Today

2025/6/13
MLOps.community

People
Demetrios
Shreya
Willem
Topics
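Demetrios: Diagnosing AI system failures in production is very difficult and requires root cause analysis. We focus on building an AI agent that diagnoses alerts by analyzing production systems and the observability stack, then distills the results into a root cause or a set of findings.

Willem: Production environments lack reliable ground truth, and validation is a problem as well. What we have learned is to take the human out of the loop as much as possible and to build a fast learning loop that feeds production failures back into the evaluation system. I try to minimize dependence on the agent's individual steps so that low-level errors do not affect the overall result; even when the agent's trajectory is wrong, it sometimes stumbles on useful information.

Shreya: Building data processing pipelines faces the same challenges as building AI agents. Data processing is an interesting testbed for understanding LLMs and how people interact with these systems, because you cannot know all of the data in advance. Validation is very hard in data processing: you have to verify not only that a transformation is correct, but also that the data exists and that nothing was missed. I am teaching a course on AI evals that covers how to build trustworthy AI applications and evaluations. Users prefer to think of a workflow as a dataflow pipeline rather than as a translation from natural language into a pipeline. AI has a long tail of failure modes, and these need to be synthesized into concrete instructions. LLMs handle detailed prompts well but vague prompts poorly, so they need examples and detailed instructions to reach the desired result.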

Shownotes

Willem Pienaar and Shreya Shankar discuss the challenge of evaluating agents in production where "ground truth" is ambiguous and subjective user feedback isn't enough to improve performance.

The discussion breaks down the three "gulfs" of human-AI interaction—Specification, Generalization, and Comprehension—and their impact on agent success.

Willem and Shreya cover the necessity of moving the human "out of the loop" for feedback, creating faster learning cycles through implicit signals rather than direct, manual review.

The conversation details practical evaluation techniques, including analyzing task failures with heat maps and the trade-offs of using simulated environments for testing.
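
As a rough illustration of the failure heat map idea, the sketch below tallies failures by task and failure mode into a grid that could then be handed to any plotting tool. The record fields (`task`, `failure_mode`) and the sample values are hypothetical and are not taken from the episode.

```python
from collections import Counter


def failure_grid(records):
    """Count failures per (task, failure_mode) pair.

    Returns the sorted task names (rows), the sorted failure modes
    (columns), and a 2D list of counts in that row/column order.
    """
    counts = Counter((r["task"], r["failure_mode"]) for r in records)
    tasks = sorted({task for task, _ in counts})
    modes = sorted({mode for _, mode in counts})
    # Counter returns 0 for pairs that never occurred.
    matrix = [[counts[(task, mode)] for mode in modes] for task in tasks]
    return tasks, modes, matrix


# Hypothetical failure records, for illustration only.
sample = [
    {"task": "diagnose_alert", "failure_mode": "wrong_root_cause"},
    {"task": "diagnose_alert", "failure_mode": "wrong_root_cause"},
    {"task": "diagnose_alert", "failure_mode": "missing_data"},
    {"task": "summarize_logs", "failure_mode": "missing_data"},
]

tasks, modes, matrix = failure_grid(sample)
for task, row in zip(tasks, matrix):
    print(task, dict(zip(modes, row)))
```

Cells with high counts point at task and failure-mode combinations that recur, which is where a new evaluation case is most likely to pay off.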

Willem and Shreya address the reality of a "performance ceiling" for AI and the importance of categorizing problems your agent can, can learn to, or will likely never be able to solve.

// Bio

Shreya Shankar

PhD student in data management for machine learning.

Willem Pienaar

Willem Pienaar, CTO of Cleric, is a builder with a focus on LLM agents, MLOps, and open source tooling. He is the creator of Feast, an open source feature store, and contributed to the creation of both the feature store and MLOps categories.

Before starting Cleric, Willem led the open source engineering team at Tecton and established the ML platform team at Gojek, where he built high scale ML systems for the Southeast Asian decacorn.

// Related Links

https://www.google.com/about/careers/applications/?utm_campaign=profilepage&utm_medium=profilepage&utm_source=linkedin&src=Online/LinkedIn/linkedin_page

https://cleric.ai/

Catch all episodes, blogs, newsletters, and more: https://go.mlops.community/TYExplore

MLOps Swag/Merch: https://shop.mlops.community/

Connect with Demetrios on LinkedIn: /dpbrinkm

Connect with Shreya on LinkedIn: /shrshnk

Connect with Willem on LinkedIn: /willempienaar

Timestamps:

[00:00] Trust Issues in AI Data

[04:49] Cloud Clarity Meets Retrieval

[09:37] Why Fast AI Is Hard

[11:10] Fixing AI Communication Gaps

[14:53] Smarter Feedback for Prompts

[19:23] Creativity Through Data Exploration

[23:46] Helping Engineers Solve Faster

[26:03] The Three Gaps in AI

[28:08] Alerts Without the Noise

[33:22] Custom vs General AI

[34:14] Sharpening Agent Skills

[40:01] Catching Repeat Failures

[43:38] Rise of Self-Healing Software

[44:12] The Chaos of Monitoring AI