The pioneers of proof

2025/4/19
More or Less: Behind the Stats

People
Tim Harford
Topics
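Tim Harford: More or Less, the programme I present, is dedicated to fact-checking data, but proof itself is an elusive concept. Sometimes there isn't enough data, sometimes it seems to be a matter of opinion, and sometimes it has to be tested in practice. Adam Kucharski's new book, Proof: The Uncertain Science of Certainty, tells the stories of the pioneers of proof, from the Allies' estimate of German tank numbers during the Second World War to Janet Lane-Claypon's research on risk factors for breast cancer, showing how powerful statistical methods can be in solving complex problems. Every day we use a toolkit of known proofs to answer listeners' questions, but who created these tools, and how did they go about proving the unknown? The book explores these questions and introduces some of the most important pioneers of proof.

Adam Kucharski: My research shows that even when data are fragmentary, the right statistical methods can produce reliable conclusions. For example, during the Second World War the Allies analysed the serial numbers on parts of captured German tanks and successfully estimated German tank production, much as we used limited data to estimate the number of infections during the pandemic. The key is to understand the mechanism that generates the data and combine it with statistical principles to make inferences.

Janet Lane-Claypon: (Janet Lane-Claypon does not speak in the programme, but her research is one of its main focuses.) I studied the effects of breastfeeding versus cow's-milk feeding on infant health and pioneered the retrospective cohort study. By analysing existing data, I compared how different feeding methods affected infants' growth, taking into account confounding factors such as family income. I also studied risk factors for breast cancer using a case-control study and found an association between the number of children a woman had and her risk of breast cancer. My methods are still widely used in medical and epidemiological research.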

Deep Dive

Chapters
This chapter discusses how statisticians used the serial numbers on parts from a small number of captured German tanks to estimate the total number of tanks produced, which was significantly lower than initial intelligence estimates. The method proved surprisingly accurate compared with the actual number of tanks.
  • Allied statisticians used serial numbers on tank components to estimate production.
  • The estimate of 270 tanks per month was remarkably close to the actual figure of 276.
  • This method highlighted the value of even limited data in making accurate estimations.

Transcript

This BBC podcast is supported by ads outside the UK.

Hello, and thank you for downloading the More or Less podcast.

We're the program that delights in data, marvels at maths and swoons over statistics. And as ever, I'm Tim Harford. Here at More or Less we are constantly fact-checking wild claims. But proof can be a strangely difficult beast to capture. Sometimes you don't have enough data, sometimes it seems to be a matter of opinion, and sometimes, just sometimes, the proof of the pudding is in the eating.

Every day we use a toolkit of known proofs to try to answer our listeners' questions. But who do we have to thank for this toolkit? And how did they set about proving the unknown? Luckily for me, Adam Kucharski has just written a book about this very topic called Proof: The Uncertain Science of Certainty. Adam is a mathematician and professor of epidemiology at the London School of Hygiene and Tropical Medicine.

I sat down with him to hear more about some of the proof pioneers included in his book. We start in the lead-up to D-Day in 1944. The Allies were trying to predict what the Germans might have waiting for them in occupied France, and in particular a fearsome new tank called the Panther Mark V.

Just how many of these tanks had the German economy been able to produce? The Allies had only managed to get hold of two captured tanks: the British and the Americans had one, and the Russians had another. Intelligence reports estimated that some 1,500 of these tanks were being built every month.

But the statisticians weren't so sure. They started to take the tanks to pieces, and they noticed that the little wheels that keep the tracks in place have rubber tyres on them. There are 24 on each side, and looking at these tyres they realised that each one has a serial number. So they started to ask: can this tell us how many moulds the manufacturers have? That tells us something about their manufacturing capacity, and we can use that to estimate production.

For one manufacturer they had about 20 serial numbers, and the highest was 77. So I think everyone can intuitively feel there probably aren't thousands of these serial numbers. Yeah.

Using this method, essentially estimating from the fact that we've observed 20 and the biggest is 77, they worked out that the manufacturer probably had about 80 moulds. If you've only seen one, then maybe there are 150, but we're really not sure. Exactly. As you see more and more and they're all 77 or below, it becomes less plausible that there are suddenly lots of high numbers that you just by chance haven't observed.

Using this method, and their understanding of the manufacturing process, they estimated that the Germans were probably making about 270 a month in early 1944.

The statisticians assumed that the serial numbers started at 1 and counted up from there. They had 20 serial numbers, none higher than 77. They combined this with information about production rates from British tank manufacturers, used a bit of statistical intuition, and came up with an estimate of 270 tanks a month. That was way below the intelligence estimate of 1,500 a month. Had they predicted too low?

As it happened, on D-Day a large chunk of the tanks they faced, about 40%, were Panthers. And later, when they found out the actual manufacturing numbers, the real number was 276. Wow. So they estimated 270, and the real number was 276. They basically nailed it, just from looking at tyres. As we all know, D-Day was a success and a pivotal moment in the war.
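The "about 80 moulds" reasoning can be made concrete with the standard frequentist estimator for the German tank problem, N ≈ m + m/k − 1, where m is the largest serial seen and k is the number of serials seen. This is a minimal sketch, assuming serial numbers run 1, 2, ..., N with no gaps and that the observed serials are a uniform random sample drawn without replacement; the episode does not say the wartime statisticians used this exact formula, and the function names are hypothetical. The second function computes the probability that all k observed serials would be at or below the maximum actually seen if the true total were N, which is why very large totals become implausible as the sample grows.

```python
from math import comb


def estimate_total(max_serial: int, sample_size: int) -> float:
    """Minimum-variance unbiased estimate of N for the German tank problem.

    Assumes serials are 1..N and that the sample_size observed serials were
    drawn uniformly at random without replacement.
    """
    if sample_size < 1 or max_serial < sample_size:
        raise ValueError("need at least one observation and max_serial >= sample_size")
    m, k = max_serial, sample_size
    return m + m / k - 1


def prob_max_at_most(max_serial: int, sample_size: int, total: int) -> float:
    """P(every observed serial <= max_serial) if there are really `total` serials.

    Counts the k-subsets of 1..max_serial among all k-subsets of 1..total.
    """
    if total < max_serial:
        raise ValueError("total cannot be smaller than a serial already observed")
    return comb(max_serial, sample_size) / comb(total, sample_size)


if __name__ == "__main__":
    # Figures quoted in the episode: about 20 serials, highest 77.
    print(f"{estimate_total(77, 20):.2f}")  # 79.85, i.e. "about 80 moulds"
    # Hypothetical: if the only serial seen had been 77.
    print(f"{estimate_total(77, 1):.2f}")   # 153.00
    # How plausible is "all 20 serials <= 77" for larger true totals?
    for n in (80, 100, 150, 1000):
        print(n, prob_max_at_most(77, 20, n))
```

With a single observation the estimate roughly doubles the serial, in line with the "maybe there are 150" remark, while with 20 observations it sits just above the maximum. The episode notes the statisticians then combined this kind of estimate with knowledge of the manufacturing process to reach 270 tanks a month; that step is not modelled here.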

Some 76 years later, Adam found himself predicting numbers in a high-pressure situation using very similar methods. There were a few situations when we were working on COVID, as epidemiologists trying to understand data, where we might have had fragments of data and only some observations on cases or tests, for example.

And looking at that, it struck me: this is just a version of the German tank problem. We have this unknown total and we have some fragmented observations that are drawn somewhat at random. You've got a few cases, you've got a few tests, you've got some positive tests and negative tests. You're trying to work out how many people are infected or how accurate the test is. And actually, for a very quick, rough ballpark, a really simple calculation can get you there. Then at least it tells you the scale of what you might be dealing with and what sort of follow-up you want to do. News you can use.

Our next proof pioneer is a scientist responsible for two of my absolutely favourite things: cohort studies and case-control studies. Born in rural Lincolnshire on 3 February 1877, Janet Lane-Claypon grew up in a wealthy family. Originally homeschooled, she was always noted as being very bright.

In 1899, she moved to London and gained a first-class degree from University College London. She then got her PhD in physiology and a medical doctorate in 1910. Indeed, very, very bright. She became particularly interested in child health, and one of the questions she looked at was the nutritional benefit of breast milk versus boiled cow's milk, the usual alternative at the time.

But she couldn't find the right kind of data set in England at the time. So she went to Berlin and, across a series of clinics in working-class areas, used data that had already been collected to piece together different groups of children who had been fed breast milk or cow's milk. She followed them over time in the data sets and looked at what happened to them. This is a method we still use today in a lot of health studies. It's known as a retrospective cohort, because you're looking backwards: you identify these groups and then reconstruct what happened to them subsequently. So it was a really powerful idea for getting what she needed to explore this problem.

Obviously, the ideal situation would be a large-scale, randomised, controlled trial. However, the first properly controlled medical trial wasn't conducted until the late 1940s.

In any case, one of the benefits of a retrospective cohort study was that she could get answers very quickly: instead of observing what happens in the present, you're looking back at what has already happened. One of the things she was particularly interested in was how much the children grew over time, as a measure of how much nutrition they were getting. And from an initial look at the data, it seemed that children who were fed breast milk were growing more than those who were fed boiled cow's milk.

But she realised there were some limitations here. One was that maybe there's another factor influencing both what the children are given to eat and their health and wellbeing. So she thought maybe family income: although the families came from similar areas, there might be differences in wealth that influence both the probability of being given breast milk rather than cow's milk and overall health over time. So she adjusted for this. She said, OK, let's account for that difference between the groups so it's a fair comparison. And the difference in growth was still there, even after accounting for differences in income. Nowadays, we call this a confounder in statistics: a factor that influences both the thing you're exposed to, in this case diet, and your outcome, in this case growth.
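Adjusting for a confounder can be illustrated with stratification: compare the feeding groups within each income band, then combine the within-band differences, weighting each band by its size. This is a minimal sketch of one modern technique, not a reconstruction of how Lane-Claypon made her adjustment (the episode doesn't say). The records, income bands and growth figures are invented toy numbers, and the function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy records: (income_band, breastfed, weight_gain_grams).
# Invented for illustration only; these are NOT Lane-Claypon's data.
RECORDS = [
    ("lower", True, 520), ("lower", True, 500),
    ("lower", False, 460), ("lower", False, 440), ("lower", False, 450),
    ("higher", True, 600), ("higher", True, 620), ("higher", True, 610),
    ("higher", False, 570),
]


def crude_difference(records):
    """Mean growth of breastfed minus cow's-milk-fed children, ignoring income."""
    breastfed = [gain for _, fed, gain in records if fed]
    cow_milk = [gain for _, fed, gain in records if not fed]
    return mean(breastfed) - mean(cow_milk)


def stratified_difference(records):
    """Compare feeding groups within each income band, then average the
    within-band differences weighted by the number of children in each band."""
    bands = defaultdict(lambda: {True: [], False: []})
    for band, fed, gain in records:
        bands[band][fed].append(gain)
    weighted_sum, total = 0.0, 0
    for groups in bands.values():
        if not groups[True] or not groups[False]:
            continue  # a band with only one feeding type allows no comparison
        n = len(groups[True]) + len(groups[False])
        weighted_sum += n * (mean(groups[True]) - mean(groups[False]))
        total += n
    if total == 0:
        raise ValueError("no income band contains both feeding groups")
    return weighted_sum / total


if __name__ == "__main__":
    print(f"crude difference: {crude_difference(RECORDS):.1f} g")            # 90.0 g
    print(f"income-adjusted difference: {stratified_difference(RECORDS):.1f} g")  # 51.1 g
```

In this toy data, better-off families breastfeed more often and their children grow more whichever milk they get, so the crude comparison overstates the effect of feeding; comparing like with like inside each income band shrinks it without removing it.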

After her research into mother's milk versus moo cow milk, she was commissioned by the early medical council to look at the risk factors associated with breast cancer. Again, because breast cancer develops over a very long period of time, new cases can be quite rare events in a population.

And she wanted an answer faster. So in London and Glasgow she looked at about 500 people who had developed cancer, and then at 500 so-called controls who had attended hospital for other reasons: very similar individuals by age and other characteristics, but without cancer. Then she looked at what might be in the history of these individuals that could tell them apart. One of the things that jumped out was differences in the number of children they'd had. Yeah.

In particular, the women who'd had cancer generally had fewer children. And again, confounders are a potential issue here. So she accounted for things like the women's age and how long they'd been married, but still found this signal in the data. We now call this a case-control study. Some of the things she discovered about risk factors for breast cancer are still cited today, and epidemiologists still use case-control studies to understand differences in risk.
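A case-control comparison is usually summarised with an odds ratio: the odds of exposure (here, having fewer children) among cases divided by the same odds among controls. To adjust for age, one standard modern tool is the Mantel-Haenszel pooled odds ratio, which combines separate 2x2 tables for each age band. This is a hypothetical illustration: the counts, the age bands and the cut-off for "fewer children" are invented (only the 500 cases and 500 controls echo the episode), and the episode does not say which calculation Lane-Claypon used.

```python
# Hypothetical 2x2 tables per age band: (a, b, c, d) where
#   a = cases with fewer children (exposed),    b = cases with more children,
#   c = controls with fewer children (exposed), d = controls with more children.
# Invented counts (500 cases, 500 controls in total); NOT Lane-Claypon's data.
STRATA = {
    "under 50": (60, 90, 40, 110),
    "50 and over": (120, 230, 80, 270),
}


def odds_ratio(a, b, c, d):
    """Odds of exposure among cases divided by odds of exposure among controls."""
    return (a * d) / (b * c)


def crude_odds_ratio(strata):
    """Collapse all age bands into a single 2x2 table, ignoring age."""
    a, b, c, d = [sum(table[i] for table in strata.values()) for i in range(4)]
    return odds_ratio(a, b, c, d)


def mantel_haenszel_odds_ratio(strata):
    """Age-adjusted odds ratio pooled across age bands (Mantel-Haenszel)."""
    numerator = denominator = 0.0
    for a, b, c, d in strata.values():
        n = a + b + c + d
        numerator += a * d / n
        denominator += b * c / n
    return numerator / denominator


if __name__ == "__main__":
    for band, table in STRATA.items():
        print(band, round(odds_ratio(*table), 3))
    print(f"crude OR: {crude_odds_ratio(STRATA):.3f}")                     # 1.781
    print(f"age-adjusted OR: {mantel_haenszel_odds_ratio(STRATA):.3f}")   # 1.784
```

An odds ratio above 1 means the exposure is more common among cases than controls. In these made-up tables the age adjustment barely moves the estimate, the pattern you would expect when a signal survives adjustment for a potential confounder; with real data the crude and adjusted figures can differ substantially.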

Thanks to Adam Kucharski, author of Proof, and to all the proof pioneers of the past who've made our lives that little bit more predictable, in the best of ways. That is all we have time for this week, but please do keep your questions and comments coming in to moreorless@bbc.co.uk. We will be back next week, and until then, goodbye.
