
Tricks to Fine Tuning // Prithviraj Ammanabrolu // #318

2025/5/26

MLOps.community

People
Raj
Topics
Raj: Tao is a method for fine-tuning models for specific domains without labels. Using reinforcement learning and synthetic data, it lets a model evaluate and improve itself: the model generates responses, a reward model scores them, and reinforcement learning then adjusts the model's weights so that high-scoring outputs become more likely. This avoids the dependence on large amounts of human-annotated data and lets the model learn from its own mistakes, so it adapts to specific tasks more effectively.

Raj: The key to Test-time Adaptive Optimization (TAO) is that it spends the extra inference-time compute during training, so inference latency stays the same at deployment. Once a customer provides task prompts, the system runs that inference process at training time and adjusts thresholds dynamically. Generating diverse responses is also essential, to avoid redundant computation and repeated information. Iteratively retraining and redeploying, while continually bringing in new signals, keeps the reward model from overfitting and keeps the tuned model effective.

Demetrios: As I understand it, Tao's value is that it removes the dependence on labeled data and can produce a better model from the prompts users provide. The approach uses a reward model to evaluate and improve the model's generated responses, enabling efficient fine-tuning. And because the extra inference-time compute is spent at training time, Tao keeps inference latency the same at deployment, which makes for a better user experience.
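To make the loop Raj describes concrete, here is a minimal sketch in Python: sample several responses per unlabeled prompt, score them with a reward model, and nudge the policy toward the high-scoring ones with a simple REINFORCE-style reward-weighted update. The model name, hyperparameters, and the `score_with_reward_model` placeholder are illustrative assumptions, not Databricks' actual TAO implementation.

```python
# Sketch of a reward-weighted fine-tuning step on unlabeled prompts (illustrative, not TAO's real recipe).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

policy_name = "databricks/dolly-v2-3b"  # illustrative choice; any causal LM works
tok = AutoTokenizer.from_pretrained(policy_name)
policy = AutoModelForCausalLM.from_pretrained(policy_name)
optimizer = torch.optim.AdamW(policy.parameters(), lr=1e-6)

def score_with_reward_model(prompt: str, response: str) -> float:
    """Placeholder for a learned reward model that rates how well a response solves the task."""
    raise NotImplementedError

def training_step(prompt: str, num_samples: int = 4):
    inputs = tok(prompt, return_tensors="pt")
    prompt_len = inputs["input_ids"].shape[1]

    # The extra compute happens here, at training time: sample several diverse candidates.
    outputs = policy.generate(
        **inputs,
        do_sample=True,
        temperature=1.0,
        max_new_tokens=128,
        num_return_sequences=num_samples,
        pad_token_id=tok.eos_token_id,
        return_dict_in_generate=True,
    )
    responses = tok.batch_decode(outputs.sequences[:, prompt_len:], skip_special_tokens=True)

    # Score each candidate with the reward model and center the scores as a simple baseline.
    rewards = torch.tensor([score_with_reward_model(prompt, r) for r in responses])
    advantages = rewards - rewards.mean()

    # Reward-weighted likelihood: raise the probability of high-scoring responses.
    loss = 0.0
    for seq, adv in zip(outputs.sequences, advantages):
        labels = seq.clone()
        labels[:prompt_len] = -100  # don't train on the prompt tokens
        nll = policy(input_ids=seq.unsqueeze(0), labels=labels.unsqueeze(0)).loss
        loss = loss + adv * nll
    loss = loss / num_samples

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```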

Chapters
Tao is a method for fine-tuning models without labeled data, using reinforcement learning and synthetic data. It addresses the challenge of high annotation costs and allows for customized models without labels.
  • Tao fine-tunes models without labeled data
  • Uses reinforcement learning and synthetic data
  • Addresses high annotation costs
  • Enables customized models for specific domains
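Because all of the extra sampling and scoring happens during training, serving the tuned model looks like serving any other model: a single forward generation, so deployment latency is unchanged. A short sketch, assuming the tuned weights were saved to a hypothetical local path `tuned-model`:

```python
# Serving the tuned model: one ordinary generate call, no extra inference-time compute.
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("tuned-model")   # hypothetical path to the fine-tuned checkpoint
model = AutoModelForCausalLM.from_pretrained("tuned-model")

prompt = "Summarize this support ticket: ..."
inputs = tok(prompt, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```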

Shownotes

Tricks to Fine Tuning // MLOps Podcast #318 with Prithviraj Ammanabrolu, Research Scientist at Databricks.

Join the Community: https://go.mlops.community/YTJoinIn

Get the newsletter: https://go.mlops.community/YTNewsletter

// Abstract

Prithviraj Ammanabrolu drops by to break down Tao fine-tuning—a clever way to train models without labeled data. Using reinforcement learning and synthetic data, Tao teaches models to evaluate and improve themselves. Raj explains how this works, where it shines (think small models punching above their weight), and why it could be a game-changer for efficient deployment.

// Bio

Raj is an Assistant Professor of Computer Science at the University of California, San Diego, leading the PEARLS Lab in the Department of Computer Science and Engineering (CSE). He is also a Research Scientist at Mosaic AI, Databricks, where his team is actively recruiting research scientists and engineers with expertise in reinforcement learning and distributed systems.

Previously, he was part of the Mosaic team at the Allen Institute for AI. He earned his PhD in Computer Science from the School of Interactive Computing at Georgia Tech, advised by Professor Mark Riedl in the Entertainment Intelligence Lab.

// Related Links

Website: https://www.databricks.com/


Catch all episodes, blogs, newsletters, and more: https://go.mlops.community/TYExplore

Join our Slack community [https://go.mlops.community/slack]

Follow us on X/Twitter [@mlopscommunity](https://x.com/mlopscommunity) or [LinkedIn](https://go.mlops.community/linkedin)

Sign up for the next meetup: [https://go.mlops.community/register]

MLOps Swag/Merch: [https://shop.mlops.community/]

Connect with Demetrios on LinkedIn: /dpbrinkm

Connect with Raj on LinkedIn: /rajammanabrolu



Timestamps:

[00:00] Raj's preferred coffee

[00:36] Takeaways

[01:02] Tao Naming Decision

[04:19] No Labels Machine Learning

[08:09] Tao and TAO breakdown

[13:20] Reward Model Fine-Tuning

[18:15] Training vs Inference Compute

[22:32] Retraining and Model Drift

[29:06] Prompt Tuning vs Fine-Tuning

[34:32] Small Model Optimization Strategies

[37:10] Small Model Potential

[43:08] Fine-tuning Model Differences

[46:02] Mistral Model Freedom

[53:46] Wrap up