
33: Katharine Jarmul - Testing in Data Science

2017/11/30
Test & Code

A discussion with Katharine Jarmul, aka kjam, about some of the challenges of data science with respect to testing.

Some of the topics we discuss:

  • experimentation vs testing

  • testing pipelines and pipeline changes

  • automating data validation

  • property based testing

  • schema validation and detecting schema changes

  • using unit test techniques to test data pipeline stages

  • testing nodes and transitions in DAGs

  • testing expected and unexpected data

  • missing data and non-signals

  • corrupting a dataset with noise

  • fuzz testing for both data pipelines and web APIs

  • datafuzz

  • hypothesis

  • testing internal interfaces

  • documenting and sharing domain expertise to build a good sense of what is reasonable

  • intermediary data and stages

  • neural networks

  • speaking at conferences

Special Guest: Katharine Jarmul.
