Cleanlab: Labeled Datasets that Correct Themselves Automatically // Curtis Northcutt // MLOps Coffee Sessions #105

2022/7/1
MLOps.community

MLOps Coffee Sessions #105 with Curtis Northcutt, CEO & Co-Founder of Cleanlab: "Cleanlab: Labeled Datasets that Correct Themselves Automatically," co-hosted by Vishnu Rachakonda.

// Abstract
Pioneered at MIT by 3 Ph.D. Co-Founders, Cleanlab is an open-source/SaaS company building the premier data-centric AI tools and workflows for (1) automatically correcting messy data and labels, (2) auto-tracking dataset quality over time, (3) automatically finding classes to merge and delete, (4) AutoML for data tasks, (5) obtaining and ranking high-quality annotations, and (6) training ML models with messy data.

Most of the prescriptive tasks (finding issues) can be done in one line of code with their open-source product: https://github.com/cleanlab/cleanlab
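To give a feel for what that one line does under the hood, here is a minimal pure-Python sketch of the thresholding idea behind confident learning. It is a simplified illustration, not Cleanlab's actual implementation: the function name `find_label_issues_sketch` and the toy data are hypothetical, and the real library (which, in its 2.x releases, exposes this as `cleanlab.filter.find_label_issues(labels, pred_probs)`) does considerably more, such as ranking and pruning the flagged examples.

```python
def find_label_issues_sketch(labels, pred_probs):
    """Return indices of examples whose given label looks wrong.

    labels:     list of integer class labels (possibly noisy)
    pred_probs: list of per-class predicted probabilities, ideally
                produced out-of-sample (e.g. via cross-validation)
    """
    num_classes = len(pred_probs[0])

    # Per-class threshold: the model's average confidence in class j
    # over the examples that were actually labeled j.
    sums = [0.0] * num_classes
    counts = [0] * num_classes
    for label, probs in zip(labels, pred_probs):
        sums[label] += probs[label]
        counts[label] += 1
    thresholds = [
        sums[j] / counts[j] if counts[j] else 1.0 for j in range(num_classes)
    ]

    issues = []
    for i, (label, probs) in enumerate(zip(labels, pred_probs)):
        # Classes the model is "confident" about for this example.
        confident = [j for j in range(num_classes) if probs[j] >= thresholds[j]]
        if not confident:
            continue  # the model is not confident about any class
        best = max(confident, key=lambda j: probs[j])
        if best != label:
            issues.append(i)  # confident class disagrees with given label
    return issues


# Toy data (hypothetical): example 2 is labeled 0 but strongly predicted as 1.
labels = [0, 0, 0, 1, 1, 1]
pred_probs = [
    [0.90, 0.10],
    [0.80, 0.20],
    [0.10, 0.90],
    [0.15, 0.85],
    [0.10, 0.90],
    [0.30, 0.70],
]

print(find_label_issues_sketch(labels, pred_probs))
```

Using per-class thresholds instead of a single global cutoff means a class the model is generally less sure about is not unfairly flagged.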

// Bio
Curtis Northcutt is the CEO and Co-Founder of Cleanlab, which is focused on making AI work reliably for people and their messy, real-world data by automatically fixing issues in any ML dataset. Curtis completed his Ph.D. in Computer Science at MIT, receiving the MIT Thesis Award, NSF Fellowship, and the Goldwater Scholarship. Prior to Cleanlab, Curtis worked at AI research groups including Google, Oculus, Amazon, Facebook, Microsoft, and NASA.

// MLOps Jobs board: https://mlops.pallet.xyz/jobs

MLOps Swag/Merch: https://mlops-community.myshopify.com/

// Related Links
https://github.com/cleanlab/cleanlab
https://cleanlab.ai/blog/cleanlab-history/
https://labelerrors.com/
https://l7.curtisnorthcutt.com/
https://nips.cc/Conferences/2021/ScheduleMultitrack?event=47102
https://www.youtube.com/watch?v=ieUOv1sQPlw
https://cleanlab.typeform.com/to/NLnU1XZF
Cameo cheating detection system: https://arxiv.org/ftp/arxiv/papers/1508/1508.05699.pdf
The Cathedral & the Bazaar book: https://www.amazon.com/Cathedral-Bazaar-Musings-Accidental-Revolutionary/dp/0596001088

--------------- ✌️Connect With Us ✌️ -------------
Join our Slack community: https://go.mlops.community/slack
Follow us on Twitter: @mlopscommunity
Sign up for the next meetup: https://go.mlops.community/register
Catch all episodes, blogs, newsletters, and more: https://mlops.community/

Connect with Demetrios on LinkedIn: https://www.linkedin.com/in/dpbrinkm/
Connect with Vishnu on LinkedIn: https://www.linkedin.com/in/vrachakonda/
Connect with Curtis on LinkedIn: https://www.linkedin.com/in/cgnorthcutt/

Timestamps:
[00:00] Introduction to Curtis Northcutt
[00:30] Difference between MLOps and Data-Centric AI
[04:04] Realizing the problem of data quality in ML manifesting
[05:11] Computer vision problems
[06:54] War story that got Curtis into Data-Centric AI
[13:50] Overview of Curtis' vision
[14:45] PU Learning
[21:25] Consistency Rate and Flipping Rate
[25:25] One line of code
[29:48] Models make mistakes
[33:09] Cleanlab play with the environment
[36:30] How ML Engineers should approach data quality problem
[42:42] Quantum computing
[46:39] Result of confident learning
[52:31] Utility for small data sets
[53:53] Cleanlab's huge success stories
[56:13] Rapid fire questions
[58:58] Cloudy and mystified space
[1:03:46] Cleanlab is hiring!
[1:05:06] Wrap up