How to Systematically Test and Evaluate Your LLMs Apps // Gideon Mendels // #269

2024/10/18

MLOps.community

What's Gideon's Preferred Coffee?

Key Takeaways from the Episode

A Huge Shout-Out to Comet ML

Support the MLOps Podcast

Understanding Evaluation Metrics in AI

Practical LLM Evaluation

Testing Methodologies for LLMs

Using LLMs as Judges

Opik Track Function Overview

Tracking User Response Value

Integrating AI Metrics

Experiment Tracking with LLMs

Micro Macro Collaboration in AI

RAG Pipeline Reproducibility Snapshot

Collaborative Experiment Tracking

Feature Flags in CI/CD

Labeling Challenges and Solutions

LLM Output Quality Alerts

Anomaly Detection in Model Outputs

Episode Wrap-Up

Duration: 01:01:42