The Data Skeptic Podcast features interviews and discussion of topics related to data science, stati
Today's episode is a reading of Isaac Asimov's Franchise. As mentioned on the show, this is just a
Classically, entropy is a measure of disorder in a system. From a statistical perspective, it is mor
Cloud services are now ubiquitous in data science and more broadly in technology as well. This week,
Today's episode is all about Causal Impact, a technique for estimating the impact of a particular ev
The Bootstrap is a method of resampling a dataset to possibly refine its accuracy and produce usefu
The Gini Coefficient (as it relates to decision trees) is one approach to determining the optimal de
Financial analysis techniques for studying numeric, well-structured data are very mature. While usin
AdaBoost is a canonical example of the class of AnyBoost algorithms that create ensembles of weak le
Platform as a service is a growing trend in data science where services like fraud analysis and face
For machine learning models created with the random forest algorithm, there is no obvious diagnostic
As cities provide bike sharing services, they must also plan for how to redistribute bicycles as the
Random forest is a popular ensemble learning algorithm which leverages bagging both for sampling and
Jo Hardin joins us this week to discuss the ASA's Election Prediction Contest. This is a competition
The F1 score is a model diagnostic that combines precision and recall to provide a single evaluati
Urban congestion affects every person living in a city of any reasonable size. Lewis Lehe joins us i
Heteroskedasticity is a term used to describe a relationship between two variables which has unequal
Our guest today is Michael Cuthbert, an associate professor of music at MIT and principal investigat
Paxos is a protocol for arriving at a consensus in a distributed computing system which accounts for un
Machine learning models are often criticized for being black boxes. If a human cannot determine why
Analysis of variance is a method used to evaluate differences between two or more groups. It wo