Welcome to The Knowledge Project. I'm your host, Shane Parrish, editor and chief curator of the Farnam Street blog, a website with over 70,000 readers dedicated to mastering the best of what other people have already figured out. The Knowledge Project allows me to interview amazing people from around the world to deconstruct why they're good at what they do. It's more conversation than prescription.
On this episode, I'm happy to have Philip Tetlock, professor at the University of Pennsylvania. He's the co-leader of the Good Judgment Project, which is a multi-year forecasting study. And he's also the author of the recently released Superforecasting: The Art and Science of Prediction. How we can get better at prediction is the subject of this interview. We're going to dive into what makes some people better and what we can learn to improve our ability to guess the future. I hope you enjoy the conversation as much as I did.
I want to talk about your new book, Superforecasting: The Art and Science of Prediction, that you wrote with Dan Gardner.
who, like me, I think is still based in Ottawa. In the book, you say that we're all forecasters. Can you elaborate on that a little? Well, it's hard to make any decision in life, whether it's a consumer decision about whether to buy a car or a house or whether to marry a particular spouse, potential spouse, or a candidate to vote for in an election. It's very hard to make any decision without forming at least
implicit expectations about what the consequences of that decision will be. So whenever you're making a decision, there are implied probabilities built into that. So the question becomes, are you better off with implicit probabilities that you don't recognize as probabilities,
or explicit ones. I think one of the major takeaways from the forecasting tournaments we've been running is that when people make explicit judgments and they're fully self-conscious about what they're doing, they can learn to do it better. You're talking about the Good Judgment Project. Can you maybe introduce us to that a little?
Sure. Well, the Good Judgment Project is a research program that my wife, Barbara Mellers, and I started several years ago. It was supported by a research and development branch of the U.S. intelligence community known as IARPA, the Intelligence Advanced Research Projects Activity, which models itself after DARPA in the Defense Department. And their mandate is to support research that has the potential to revolutionize intelligence analysis.
So working from that mandate, they decided in 2010 to support a series of forecasting tournaments in which major universities would compete, researchers at major universities would compete to generate accurate probability estimates of possible futures of national security relevance.
We were one of the five teams selected for the competition in 2010. The tournaments ran from 2011 to 2015. They ended in June of this year. The Good Judgment Project, I am proud to say, was the winner of those forecasting tournaments. I can explain more about what winning a forecasting tournament means later if you want. Congratulations. Yeah, definitely. Is there a difference between forecasting and predicting?
I don't see one. I think if you go to a thesaurus, we're going to find they're virtual synonyms. Some people may try to draw distinctions of one sort or another, but I see them essentially as distinctions without a difference. And so were you using a representative subset of the Good Judgment Project, or were you using superforecasters from the project, or how were you competing in that?
Well, different universities and different teams of researchers took different approaches to generating accurate probability estimates. We recruited thousands of forecasters and
we explored a number of different techniques for eliciting the best possible probability estimates from those forecasters. We were continually running experiments. And one of the experiments we conducted was to identify the top performers each year, the top 2% of performers each year, cream them off into elite teams, super teams of superforecasters,
and give them as much support as we could, as much intellectual support as we could, for their task, and see what would happen. And they really went to town. They did a phenomenally good job. They blew the ceiling off all of the performance expectations that had been set for what was possible. And frankly, they certainly exceeded my expectations as well.
So some of us are good and some of us are bad, and some of us seem like way off the chart at making predictions. Why are some people so good? That is indeed the $64,000 question. Why are some people so good? So the skeptics argue that if you toss enough coins enough times, some of them are bound to come up heads. So the superforecasters are just super lucky. So let's treat that as kind of the default skeptical hypothesis. There's nothing special about superforecasters.
If we ran a tournament in which the task was to predict whether a fair coin would land heads or tails, some people would do better than others just by chance in a given year. We could anoint those people as super coin toss predictors and we could say, "Well, how are they going to do next year?" What we would find is perfect regression toward the mean. The best prediction is that the super coin toss predictors in year one
will be essentially around the average in year two. And the worst predictors will regress upward toward the mean, of course. So that's what a pure chance environment would look like. What we find in the IARPA tournament is that there certainly is an element of chance in predicting geopolitical and geoeconomic outcomes, but the skill-luck ratio seems to be about 70-30.
So you're not observing a great deal of regression toward the mean among superforecasters, but there inevitably is some regression toward the mean among the top performers.
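To make the pure-chance baseline concrete, here is a minimal simulation sketch (my illustration, not an analysis from the Good Judgment Project): rank coin-toss "forecasters" after one year, anoint the top 2%, and watch their edge vanish the next year.

```python
import random

random.seed(0)
N_FORECASTERS, N_QUESTIONS = 10_000, 100

def yearly_accuracy(n_questions):
    # Each call on a fair coin is right with probability 0.5, pure luck.
    return sum(random.random() < 0.5 for _ in range(n_questions)) / n_questions

year1 = [yearly_accuracy(N_QUESTIONS) for _ in range(N_FORECASTERS)]
year2 = [yearly_accuracy(N_QUESTIONS) for _ in range(N_FORECASTERS)]

# Anoint the top 2% from year one as "super coin-toss predictors"...
cutoff = sorted(year1, reverse=True)[int(0.02 * N_FORECASTERS)]
supers = [i for i, acc in enumerate(year1) if acc >= cutoff]

# ...then check them in year two: full regression to the ~50% mean.
print(f"year-1 accuracy of the anointed: {sum(year1[i] for i in supers) / len(supers):.3f}")
print(f"year-2 accuracy of the anointed: {sum(year2[i] for i in supers) / len(supers):.3f}")
```

In the IARPA data, by contrast, the top performers only partly regress, which is what a roughly 70-30 skill-to-luck mix looks like.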
And so what makes those people so good? Well, now that we've eliminated or at least rendered implausible the super lucky hypothesis, the question becomes what are the attributes these super forecasters have? You might think of them as being stable psychological attributes. Do they score higher on measures of fluid intelligence or crystallized intelligence or active open-mindedness? Do they have certain attitudinal profiles, certain behavioral profiles?
And the answer is all of the above. Superforecasters differ from ordinary mortals in a host of ways. They're not radically different from ordinary mortals, but they are systematically different. They tend to score higher on measures of fluid intelligence. They tend to score higher on measures of active open-mindedness. But if I had to identify one factor that I think best distinguishes superforecasters from other forecasters who are equally intelligent and equally open-minded,
it is that superforecasters believe that probability estimation of real-world events is a skill that can be cultivated and is worth cultivating. And they're willing to make that commitment, that effort. So when people ask me how could the superforecasters have outperformed, say, intelligence analysts who do this
full-time and have access to classified information, I think the short answer is it's not because they're smarter and it's not because they're even more open-minded, although they are pretty open-minded. It's because they are willing to make this commitment, this act of faith that there is a skill that is worth cultivating.
In the book, we quote Aaron Brown, who's the chief risk officer at AQR and also a great poker player. His view is you could distinguish great players from talented amateurs on the basis that great players are good at distinguishing 60-40 bets from 40-60 bets.
And then he paused and said, no, maybe more like 55-45 versus 45-55. The greatest players tend to be extremely granular in their assessments of uncertainty. One of the big questions that I think IARPA wanted us to answer, and that I think we have answered in the affirmative, is,
does granularity in assessments of uncertainty pay off not just in poker but when you're making messy real-world judgments, like whether Greece is going to leave the Eurozone, or what kind of mischief Putin might be up to in Ukraine next, or what's going to happen with Sino-Japanese relations in the East China Sea, or whether there's going to be another outbreak of bird flu in a
given region. These are extremely idiosyncratic, one-shot historical events. It's not like poker, where you're sampling from a well-defined sampling universe with repeated play and quick feedback. So a lot of people, very smart people, have been skeptical for many decades that it's even possible to make probability estimates for these kinds of
intelligence analytic problems. I think what the IARPA tournament has proven, beyond reasonable doubt in my opinion, is that there is room for improvement. It's possible to make these probability estimates. It's possible to get better at it. It's possible to identify the kinds of people who learn to do it better. It's possible to develop training modules to help people do it better. The gains in accuracy are appreciable.
So what happened when you took average people and you started giving them, I think I remember this, that you started giving them a course in probability?
For average forecasters who are randomly assigned to an experimental condition in which they get Kahneman-style de-biasing exercises, the improvement is in the vicinity of 10%. That's a big effect when you consider that we're talking about improvement across an entire year of forecasting and this training exercise takes about 50 minutes. What did that consist of, this 50-minute training exercise?
Some basic ideas about heuristics and biases and how to check biases. For example, one of the classic Kahneman arguments is that people don't give enough weight to statistical or base rate information in assessing the probabilities of events. They're too quick to take the inside view. So if you're attending a wedding
and you see the happy couple and you're impressed by how much in love they are and the enthusiasm of the moment, and someone asks you how likely they are to get divorced, you're not likely to consult national divorce statistics for that SES subgroup. You're likely to say, hmm, they look really happy and compatible, I'm going to attach a very high probability to their not getting divorced. And the net result of making predictions in that way is that you're going to be somewhat less accurate than you would have been if you had
at least started your estimation process by saying what are the base rates of divorce and now I'm going to adjust that based on whatever idiosyncratic factors are present in this particular relationship. So starting with the outside view and working your way inside?
Exactly. Start with the outside and work inside. That's one of our mantras. But isn't Kahneman famous for saying that he's studied biases his whole life and he feels like he's no better at avoiding them? So how does this 50-minute training exercise come in and help people? Well,
You know, Danny Kahneman was a colleague of ours at Berkeley. My wife and I, we know him well. And we know that he is more pessimistic about the prospects for de-biasing than we are. He did give us advice on how to design the de-biasing modules. I think he probably is more of a pessimist still than we are, but I think he is persuaded that these improvements are real. They certainly seem to be.
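As a rough illustration of the outside-view-first habit that training teaches, here is a minimal sketch; the base rate and the adjustments are invented numbers for the wedding example, not figures from the actual training modules.

```python
def outside_then_inside(base_rate, adjustments):
    """Start from the statistical base rate (outside view), then nudge it
    for case-specific factors (inside view), keeping it a valid probability."""
    p = base_rate
    for reason, delta in adjustments:
        p = min(max(p + delta, 0.0), 1.0)
        print(f"  after '{reason}': {p:.2f}")
    return p

# Hypothetical: probability this couple eventually divorces.
base = 0.40  # assumed base rate for the relevant subgroup (illustrative only)
estimate = outside_then_inside(base, [
    ("both partners are over 30", -0.05),
    ("they seem unusually compatible", -0.05),
])
print(f"final estimate: {estimate:.2f}")
```

The structure, not the numbers, is the point: the base rate anchors the estimate, and each inside-view tweak is visible and challengeable.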
So one of the keys to keeping track of forecasting and your ability to predict is kind of keeping score.
And do you think it takes a certain type of person to want to keep score? I mean, most of us are happy to kind of weasel out of it, or to use vague wording or jargon when we're making decisions, so that even if we're wrong, we can kind of say, well, that's not what I meant. Absolutely. It does take a particular type of person. And there are many factors that come into play. I think it certainly helps to be open-minded.
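Keeping score, in the tournaments and in the book, means scoring probability forecasts against what actually happened, typically with Brier scores. A minimal sketch of the idea, using the simple one-sided convention where 0 is perfect and hedging everything at 0.5 scores 0.25 (the book's version sums over both outcomes and runs from 0 to 2):

```python
def brier_score(forecasts, outcomes):
    """Mean squared error of probability forecasts against binary outcomes
    (1 = the event happened, 0 = it didn't). Lower is better; 0 is perfect."""
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)

# Hypothetical forecaster: confident and mostly right across three questions.
print(brier_score([0.9, 0.2, 0.7], [1, 0, 1]))  # ~0.047
# The perpetual fence-sitter who says 0.5 to everything.
print(brier_score([0.5, 0.5, 0.5], [1, 0, 1]))  # 0.25
```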
But beyond the willingness to keep score, there are other things that come into play that are a little more, say, sociological. I've been doing forecasting tournaments for over 30 years now. I started when I was about 30, in 1984, and I'm 61 years old now.
If I were an intelligence analyst, a 61-year-old intelligence analyst, I would be a very senior analyst. Let's just say for sake of argument that I am a senior analyst in the U.S. intelligence community. I'm on the National Intelligence Council, say, just for sake of argument, and I
I'm the go-to guy on China. So when Xi Jinping comes into town, people say to me, you know, what's going on? I have inputs into the presidential daily briefing and help with national intelligence estimates. And I'm at the top of the status pecking order within the IC on China. And someone comes along like IARPA, this upstart research and development branch for the Office of Director of National Intelligence. And they say, hey, you know what we're going to do? We want to run forecasting tournaments now. And
Everyone's going to compete on a level playing field. And 25-year-old China analysts are going to compete against 61-year-old analysts like Tetlock. And we're going to see who does better. Are the 61-year-old analysts going to welcome this development?
No. To ask is to answer. Even open-minded 61-year-olds are not going to be very enthusiastic about this. They're going to argue that these tournaments just don't really capture what makes my judgment special. And that is indeed a lot of the resistance we run into for forecasting tournaments. I mean, in the book, you may remember we talk about the parable of two forecasters at the beginning, Tom Friedman and Bill Flack. Mm-hmm.
Almost everybody who reads newspapers knows who Tom Friedman is, the famous New York Times columnist, Middle East expert, often in the White House or Davos or God knows where. And Bill Flack, nobody has the faintest idea who he is, because he's an anonymous retired irrigation specialist in Nebraska who happens to be a superforecaster. And we know a tremendous amount about Bill Flack's forecasting track record. We know almost nothing about Tom Friedman's
forecasting track record. And that's in substantial part because Tom Friedman's forecasts, and he does make forecasts, are embedded in vague verbiage. He says that this could happen or this might happen. And when you say something could or might happen, that could mean anything from 0.1 to about 0.9 in probability terms. And if it happens, I can say, well, I told you it could. And if it doesn't happen, I can say, look, I merely said it could.
Right, you can't get pinned down. You're covered very nicely. Do you think that's one of the problems with organizations? I mean, it seems like we're not getting better as organizations at making decisions, in part because our ability to keep score is hampered by these psychological effects: if I keep score, I might be wrong, so my incentive is not to; and if I use precise wording, I might be wrong, so my incentive is not to.
Yes. Yeah, I think there's a whole mix, a real mixture, a powerful mixture of psychological and political forces that interact to create a lot of resistance to forecasting tournaments. So even though I think we have shown that forecasting tournaments can appreciably improve probability estimates, there are a lot of reasons why organizations don't adopt them. One is that the people at the top of the status hierarchy are not very enthusiastic. Bob, who's in the CEO suite, isn't all that enthusiastic about forecasting tournaments
when it's being discovered that Bob in the mailroom is just as good as he is at anticipating trends relevant to the company's future. So you have the status hierarchy problem. People at the top don't want to be second-guessed. They don't want their judgment process to be demystified. A large part of status in contemporary organizations is the idea that there's something special about your judgment.
So even open-minded, high-status people are going to be reluctant to do this, because it's going to look like a career-damaging move. So there's certainly that. And there are a lot of other factors in play. I mean, there's, again, this Kahneman argument that people don't pay attention to the outside view. In the book, we talk about a mistake made by a famous New York Times journalist, David Leonhardt. You may know him.
He runs the Upshot column in the New York Times. He's a quant-savvy journalist. He made a mistake in 2012 that we talked about that illustrates just how tenacious the misconceptions can be. He was commenting on the Supreme Court decision to uphold Obamacare in 2012.
It was a narrow decision, 5-4. He noted that the prediction markets had futures contracts on this Supreme Court decision, and they were pricing in about a 75% probability of the law being overturned. Okay, so were they way off? He said they got it wrong. He just said, flat out, got it wrong.
That doesn't account for the complexity, right? That itself is wrong. It certainly isn't good news for a prediction market that it was on the wrong side of maybe by that margin, but prediction markets have generated hundreds of forecasts over many years, and they've proven to be pretty darn well calibrated, which is another way of saying that when they say there's a 75% probability of something happening, it happens about 75% of the time and doesn't happen about 25% of the time.
So even with a perfectly calibrated prediction market, when it says 75%, the thing doesn't happen 25% of the time, and smart observers, observers as smart as David Leonhardt, are going to be tempted to conclude that you're wrong and to dismiss you. So this creates a huge political incentive to stick with vague verbiage. If they had simply said it could be overturned, you know, they would be well positioned to explain it either way,
but because the prediction market was generating these precise probability estimates and because people don't
take the outside view and say, well, we can't just look at that particular forecast, we have to put it in the context of all the other forecasts the system is generating, take the outside view toward the system. People have a very hard time doing that. And David Leonhardt knows that this is true. He's even written later in the Upshot about this very fallacy. So if someone as smart as that, who doesn't have a grudge against prediction markets, can make a mistake like that,
You can see why politically savvy intelligence analysts might be reluctant in a blame game culture like DC to do it. Right.
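Calibration, as described above, only shows up across many forecasts. Here is a minimal sketch with made-up data: bucket the forecasts by stated probability and compare each bucket's stated probability to the frequency with which the events actually happened.

```python
from collections import defaultdict

def calibration_table(forecasts, outcomes, width=0.1):
    """Group probability forecasts into buckets and report the observed
    frequency of the event in each bucket."""
    buckets = defaultdict(list)
    for p, happened in zip(forecasts, outcomes):
        buckets[round(p // width) * width].append(happened)
    for low in sorted(buckets):
        hits = buckets[low]
        print(f"forecasts near {low:.1f}-{low + width:.1f}: "
              f"event happened {sum(hits) / len(hits):.2f} of the time ({len(hits)} forecasts)")

# Hypothetical track record: 75% calls that come true about 75% of the time
# are well calibrated, even though each individual miss looks "wrong" in isolation.
calibration_table([0.75] * 8 + [0.25] * 4, [1, 1, 1, 0, 1, 1, 1, 0, 0, 1, 0, 0])
```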
I think one of the most interesting parts of the book for me was when you started talking about the Fermi-style thinking. Can you introduce us to that? Well, Enrico Fermi was an Italian-American physicist who developed the first nuclear reactor at the University of Chicago. He was involved in the development of the atomic bomb in World War II. And he was known for his rather flamboyant thinking style. He was continually coming up with
with innovative ways of estimating the seemingly unestimatable. One of the famous examples of a Fermi problem was, it sounds really weird, it was to estimate the number of piano tuners in Chicago. Other examples might be estimating how much the Empire State Building weighs or estimating the likelihood of extraterrestrial civilizations elsewhere in the Milky Way.
Sounds a lot like the brain teasers that Google used to ask in hiring, right? Exactly. Now, I don't know whether the legal department still allows Google to continue using those for screening potential personnel, but they are interesting tests of how people approach problems. And what was so interesting about the way that Fermi approached it?
He really believed in flushing out your ignorance and decomposing the problem into as many tractable components as possible. So you would start with: how many stars are there in the Milky Way? Roughly about 100 billion. Then you'd say, well, how many of those stars have planets orbiting around them? You might look at the most recent data from Kepler, which has done some reconnaissance in our local area, out to about 60 light years around,
and say, well, it looks like a fair number, a pretty high percentage of stars do seem to have planets going around them. Let's say it could be as much as half or maybe slightly less. I don't really know the answer to that question, but you make initial guesses. You flush out your ignorance. And then other people can come back and they can see that Tetlock said about half, and they say, oh, Tetlock doesn't understand what Kepler is doing.
It should have been 70 percent, or not even 30 percent. But the point isn't that Tetlock is getting it right; it's that we're flushing out Tetlock's zone of ignorance and making it clear, and it's all open and transparent. And then the process would continue: how many planets are in the habitable zone? You can derive some further guesstimates from Kepler; a fairly small fraction of planets seem to qualify
for that. But that still might leave you with, say, as many as 500 million to a billion planets that are potentially in habitable zones. And then you'd have to make some estimate of how likely life is to jumpstart if you have a planet in a habitable zone, and how likely intelligent life is to emerge once you have life. And there are different evolutionary theorists who have different models with at least somewhat different implications as answers to those questions.
And what you would wind up with would be ranges of probabilities. Now, for this particular problem, the range of possible probabilities is going to be very large. We know it's not impossible that there's another advanced extraterrestrial civilization in the Milky Way. We also know it's not a sure thing. It's probably...
My best estimate, if I were to combine all the different steps that we just started to work through, it would be probably more than 1% or 2%, but I don't think it would be as high as 90%. It would probably be between 2% and 50%.
Now, that's a guesstimate. There's nothing special about that number. But what Tetlock has done now, and by Tetlock I mean me, the person talking here, what the Fermi-izer, the person using the Fermi method, has done is flush out all the different points of ignorance along the reasoning continuum. And you, the observer, can say, oh look, Tetlock made a really stupid estimate here, and we have to adjust that, and we have to...
but it's a basis for proceeding. What initially looked like a hopelessly intractable problem at least becomes a little more tractable. And that's what superforecasters are pretty good at doing: breaking down seemingly intractable problems into semi-tractable components and then just pushing. They're not afraid of looking stupid and making estimates that observers can see and look at and say, oh my God, why did you say something that stupid about the capital project?
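A minimal sketch of the Fermi-style decomposition Tetlock just walked through, done as a crude Monte Carlo over wide ranges; every range below is an invented assumption, deliberately flushed out into the open so someone else can challenge it.

```python
import random

random.seed(1)

def guess(low, high):
    # Draw one value from an explicitly uncertain range (an assumption, not data).
    return random.uniform(low, high)

def civilizations_once():
    stars             = guess(1e11, 4e11)   # stars in the Milky Way (assumed range)
    frac_with_planets = guess(0.3, 0.9)     # fraction with planets (assumed)
    frac_habitable    = guess(0.005, 0.05)  # fraction of planets in habitable zones (assumed)
    frac_life         = guess(0.001, 0.5)   # habitable planets where life starts (assumed)
    frac_civilization = guess(0.0001, 0.1)  # life that becomes a civilization (assumed)
    return stars * frac_with_planets * frac_habitable * frac_life * frac_civilization

runs = sorted(civilizations_once() for _ in range(10_000))
print(f"10th-90th percentile: {runs[1_000]:,.0f} to {runs[9_000]:,.0f} civilizations")
```

The output is not the point; the point is that each assumed range is visible and adjustable, which is the transparency Tetlock is describing.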
That's an incredible point where you're taking this big intractable kind of problem that's very hard to pin down and you determine, you have some organized process for determining the subcomponents involved to get you there. And then you go through and estimate. So part of that would be highlighting your thinking, right? Yes, sir.
And then part of that would be like, I really don't know anything about this question. So can I break that down further into subcomponents or am I extrapolating too much? No, that's exactly the spirit of the enterprise. So why is that style of thinking? Why does it lend itself, do you think, to...
better forecasting? Is it just the nature of the changing the framing of the problem itself? Or do you think it's more the curiosity of the people who are willing to break it down and go through? It sounds like a lot of work. It sounds very demanding and mentally taxing to do that versus just throw out an estimate with your, you know, your immediate response.
You're exactly right. It is demanding and I think it works best if it's done in a team environment in which members of the team have mutual respect for each other, but they're also willing to push each other hard. So if you were an organization and you wanted to set up a team environment, like a forecasting team within a large company, say IBM, how would you go about doing that with your knowledge?
That's a great question. And I'm a little bit wary about saying that organizations should try to construct super teams the way the Good Judgment Project did, because team construction has a lot of implications for other parts of the organization that can be tricky. I mean, imagine that if you just did what we did in
the IARPA tournament to win it, and you just identified the very best people, brought them together, nurtured them, helped them, and pushed them hard. That would be a very elitist and somewhat divisive thing to do in many organizations.
Yeah. And it could cause a lot of political friction. Now, we didn't care a lot about that, because we were in a forecasting tournament. We didn't really have an organization in the traditional sense of the term. We wanted a performance engine. Right. We wanted to harness human ingenuity, individually and collectively, as rigorously as possible to generate the most accurate possible probability estimates for things that the intelligence community cared about. That was it. It was a pure accuracy game. And we...
We weren't that interested in the long-term viability of the organization. We were interested in just pure accuracy. So I would be a little cautious about saying it's really easy, that all you do is recruit these superforecasters, put them into these teams, give them some training in how to do precision questioning, give them some training in how to do constructive confrontation, enforce these anti-groupthink norms, and give them some training and guidance in probabilistic
reasoning, and you encourage a certain self-critical structure and culture inside the teams, and boom, amazingly accurate forecasts emerge. It works pretty well in the forecasting tournament environment, but whether it would work well in an actual organization is another question; I think senior executives would want to think carefully about each step along the way there.
What would you say to people inside an organization? How can they use your research to make better decisions inside their company? I think it's something you want to consider seriously. When people make forecasts inside most organizations today, accuracy is only one of the goals that they're pursuing.
They're also interested in making forecasts that are going to be difficult to falsify. So they can't be embarrassed. So a lot of the forecasting inside organizations doesn't involve numbers. It involves a lot of vague verbiage. They're also interested in making forecasts that don't annoy other people in the organization.
They don't want to tip the political apple cart over. So they're compromising accuracy in a whole host of ways that help promote their careers inside the organization, help to maintain political stability in the organization, but they aren't all that centrally focused on accuracy. Forecasting tournaments are really weird because they focus 100% on accuracy. That's all that matters.
So I guess the thing you'd want to consider as an executive would be, do I want to reserve part of my organization's analytical processing capacity for a pure accuracy game? Do I want to incentivize some small group of the people in my organization to play pure accuracy games in forecasting tournaments and those probability estimates would then filter up to senior executives to guide decision making?
I think it's really an interesting experiment to consider doing. I think the intelligence community has been moving somewhat in that direction. I think it's a good idea. I think it would probably be a good idea for many other entities as well, at least to consider. It's in the spirit of the whole IARPA enterprise, which is to run experiments. What I would propose is that senior execs consider running experiments in which they see what they discover when they incentivize people to play pure accuracy games.
And do you think what transfers from your research into the decision making process in a corporation, not necessarily about forecasting, but about how we go about organizing, unpacking, synthesizing multiple views? How does that transfer, do you think, into a learnable skill that people can have inside of an organization?
There are many ways that could happen. We put a lot of emphasis in the Good Judgment Project on synthesizing diverse views into aggregate forecasts. And I think one of our major performance engines was the statistical aggregation algorithms that our statisticians developed for doing that. When IARPA started this whole exercise, they thought it would be really hard to do 20 or 30 or 40 percent better than the unweighted average of the control group forecasters.
And our super forecasters exceeded that performance benchmark quite substantially each year of the tournament. They did so well that IARPA essentially suspended the tournament after two years, and we were able to absorb the other teams into our team in substantial ways and compete against the intelligence community and against the prediction market baselines instead of the other universities. Now, how did all that happen?
come to pass? If I had to credit two big things as responsible for the victory of the Good Judgment Project, one of them would be the superforecasters, and the other would be, call them super algorithms, the great aggregation algorithms that our statisticians developed. Now, when I describe these algorithms, at some level you're not going to be too surprised at first,
but there is one aspect of them that does surprise most people. So, the first thing: I don't know if your listeners are familiar with James Surowiecki's book The Wisdom of Crowds.
It's been well known in the forecasting world that the average of a group of forecasters, the average forecast from those forecasters is going to be more accurate than most of the individuals from whom the average was derived. And this is the famous Galton story about the ox. You had hundreds of people trying to guess the weight of the ox and the average of all those guesses was only about one or two pounds off from the true weight of the ox.
and that means it was more accurate than all of the individuals from whom the average was derived. So averaging is a powerful way of synthesizing information from diverse perspectives. It's a remarkably crude approach to doing it, but it works pretty darn well, and that's why IARPA used it as its benchmark. Now, we were able to beat averaging by doing some simple things, like giving more weight to better forecasters.
As we got more and more data on who the good forecasters were, who the more intelligent forecasters were, who the more frequent belief updaters were, various attributes of forecasters, we were able to give more weight to certain forecasters and create weighted averages. Weighted averages beat the average. That's not too surprising, is it? It makes sense. It's not astonishing, though.
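A minimal sketch of those two aggregation steps, an unweighted pool versus a performance-weighted one; the weights below are invented for illustration and are not the Good Judgment Project's actual formula.

```python
def unweighted_average(forecasts):
    return sum(forecasts) / len(forecasts)

def weighted_average(forecasts, weights):
    # Give forecasters with better track records more say in the pooled probability.
    return sum(p * w for p, w in zip(forecasts, weights)) / sum(weights)

forecasts = [0.80, 0.65, 0.55, 0.40]   # four forecasters on one question
weights   = [3.0, 2.0, 1.0, 1.0]       # hypothetical skill weights from past accuracy

print(f"unweighted: {unweighted_average(forecasts):.2f}")        # 0.60
print(f"weighted:   {weighted_average(forecasts, weights):.2f}") # 0.66
```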
Now, here's the interesting thing that the algorithms did. They did something called extremizing. And to illustrate extremizing, I want to
take a little digression into a story that we do talk about in the book, about the decision President Obama made to go after Osama bin Laden. In the movie Zero Dark Thirty, there's a scene in which senior analysts are being polled on how likely they think it is that Osama bin Laden
is in that compound. Putting aside what Hollywood says about it, let's just do a little thought experiment. Imagine that you're the president of the United States, and you have these senior advisors around the table, and you ask them how likely it is that Osama is there. And each of the analysts around the table says, Mr. President, I think the answer is 0.7. 0.7, 0.7, everybody around the table says 0.7. What should the president conclude is the likelihood that Osama bin Laden is in that compound?
And the short answer to that is, well, if the advisors are all clones of each other and they're drawing on exactly the same information and processing it in exactly the same way, the answer is 0.7 because there's no information added, right?
But imagine that the analysts say 0.7 all around the table, but the analysts don't know each other and they haven't been sharing information. And each analyst bases his or her 0.7 judgment on information that only he or she has.
So you have extreme diversity of perspectives. One person has satellite information, another has encryption breaking stuff, another one has human intelligence and so forth. But they're siloized and they're coming together for the first time and each one has independently arrived at this 0.7 estimate from very different sources of information. You've got true diversity here.
And is the answer still 0.7? Should the president just shrug and say, well, I think the answer is 0.7? Or should the president say, gee, each of you has very different reasons for believing 0.7. This leads me to suppose that the answer is probably more extreme than 0.7, because if each of you knew the reasons the others had, you would probably become more extreme. And that's exactly what the best algorithm did. It extremized as a function of diversity.
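One common way to implement that kind of extremizing is to push the pooled probability away from 0.5 in log-odds space, more aggressively the more independent the forecasters are. A minimal sketch; the exponent values here are illustrative, not the tournament algorithm's actual tuning.

```python
def extremize(p, a):
    """Push a pooled probability away from 0.5 by raising it to a power in
    odds form. a = 1 leaves it unchanged; a > 1 extremizes."""
    return p ** a / (p ** a + (1 - p) ** a)

pooled = 0.7  # everyone around the table independently says 0.7
for a in (1.0, 2.0, 3.0):
    print(f"a = {a:.0f}: {extremize(pooled, a):.2f}")
# a = 1: 0.70, a = 2: 0.84, a = 3: 0.93. The more genuinely independent the
# sources, the larger the exponent the aggregator can justify.
```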
So 0.7 was turned into 0.85 or 0.9. That's fascinating. I mean, how did it go about doing that in terms of aggregating the data from the people or from the forecasters? That's right, from the forecasters. And what would happen if you had two forecasters who have great track records and then they're divergent on, they're really divergent on an opinion or a forecast?
Does that happen often? No, it doesn't happen very often actually, but if it did happen, it would be a real cautionary moment. If you had two superforecasters, one of whom was at 0.9, the other was at 0.1, my inclination would be not to stray too far from 0.5, knowing nothing else at the moment. Are there certain types of questions to avoid if your desire is to have an accurate prediction? Yes.
There are many questions in the IARPA tournament, there are many questions in life in which there's a massive amount of irreducible uncertainty. If you want to be a good forecaster, you don't spend very much time working on roulette wheel type problems. If you visit casinos, you'll find lots of people who think they can detect patterns in roulette wheel spins.
and they develop little algorithms even to help them. But what they're doing is they're essentially modeling randomness. So spending a lot of time modeling randomness is a good way not to become a superforecaster. What other types of questions would you say don't lend themselves to... Is it like a time duration? Is it... Oh, what other kinds of questions are roulette wheel-like?
Well, not roulette wheel, but what type of questions lend themselves to better predictions, right? Is it short time, very few? I mean, I don't want to say very few variables, but short time duration versus long time duration? Because you have to constantly update over a long period of time, right? I mean, that was one of the things that superforecasters did was they updated their... Yes, that's true.
Well, all other things being equal, it's usually easier to predict questions with shorter time ranges than longer time ranges, but that's not always true. I mean, some short-range questions are extremely unpredictable. It's very hard to say whether the stock market is going to go up or down tomorrow. So that's a short-range question. In some ways, it's easier to predict whether the stock market is going to be up or down 10 years from now relative to now than it is to predict tomorrow, right? That's a good point.
So there are categories of problems in which you get a reversal of that. But yes, I think by and large it's true. The analogy to vision would be that it's easier to read an eye chart if you're close to it than if you're far from it. Probabilistic foresight is better over shorter time ranges.
That's one of the things I talk about in the book, one of the reasons why my later work is different in emphasis from my earlier work, in which experts had a hard time beating the dart-throwing chimpanzee: in the earlier work they were making much longer-term predictions than they were in the IARPA work, where the predictions were rarely much more than a year out.
You mentioned open-mindedness at the beginning. How do we go about fostering open-mindedness? Are there ways that we can improve that in ourselves or other people? That's another thing we do try to emphasize in the training. Exhorting people simply to be open-minded is... Most people don't think they're closed-minded.
Most people think they're quite reasonable. And simply exhorting people to be open-minded, people shrug and say, well, yeah, I already am. I think you want to start in a more specific way. So you want to start with very specific problems in which you assess whether people change their minds in an appropriate way.
So there are normative models, like Bayes' theorem, that tell you how much you should change your mind in response to evidence that has a certain diagnostic value. And you can create simulated problems, maybe medical diagnosis problems, economic problems, military problems, with simulated data, and you can see whether people learn, with practice, to update their beliefs the way they should. Now, there's always a question of whether
those lessons are going to stick. And we found that they do stick a little bit because they can produce 10% improvement throughout the year. But it's one of the great challenges. I don't think we've solved the problem of how to make people more open-minded. I think we can make people better belief updaters on problems
where they don't have very strong ideological priors or preconceptions. But when people have really strong emotions and ideological convictions about presidential candidates or economic policy or whatnot, belief updating becomes quite problematic. Yeah, I mean, I can see why that would be a problem, right? It contradicts probably something that you hold very dear and true. Giving that up would take a lot of
mental labor. Yeah, we can make people a bit more open-minded, but making people perfect Bayesian belief updaters is something that no one has achieved yet, and I think it will be very difficult to achieve. I think we should keep working on it. I don't think we should give up.
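For reference, the normative yardstick Tetlock mentioned, Bayes' theorem, can be written as a two-line updater: prior odds times the likelihood ratio of the new evidence gives posterior odds. The numbers below are a made-up diagnostic example, not anything from the training modules.

```python
def bayes_update(prior, likelihood_ratio):
    """Update a probability given evidence whose likelihood ratio is known
    (how much more likely the evidence is if the hypothesis is true)."""
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)

# Hypothetical: start at 30%, then see evidence four times more likely
# under the hypothesis than under its negation.
print(f"{bayes_update(0.30, 4.0):.2f}")  # 0.63, how far an ideal updater would move
```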
Do you think the superforecasters were better at learning from the other superforecasters than, say, the average forecaster? Like if somebody had a better approach, would they copy it? Would they just drop their own internal approach? I think they listen to each other quite carefully in the superforecaster teams. Even when they disagree with each other, they disagree diplomatically, but they can disagree quite forcefully about what lessons they should draw from particular forecasting failures or even forecasting successes. It's fairly common even for regular forecasters to ask, "Well, what did we do wrong?" after a forecasting failure.
And supers do that too. But they also second guess their successes.
They say, "Well, were we lucky? We really nailed this question, but were we lucky? Could it have gone otherwise? Were we almost wrong?" That's an unusual question for people to ask themselves. People don't normally look a gift horse in the mouth. When they're right, they want to take credit for it. Superforecaster skepticism even extends to their
forecasting successes. I can't imagine that many of the average or below-average forecasters went through their successes and evaluated them from that angle. What would you say is the role of intuition in forecasting? Or would you say that it's minimized? Or would you say that it's...
This is one of the big debates in the field of judgment and decision-making. Malcolm Gladwell wrote a book called Blink, and some psychologists wrote a rejoinder book, much less widely read, called Think. There are different schools of thought about the value of intuition. Even Gladwell, of course, qualified it in his book. He did point to some great successes of intuition, but he also noted the situations in which intuition could lead you seriously astray.
I think the dominant emphasis in our work leans toward think over blink. I'm not ruling out the possibility that there are
super forecasters who do rely on intuition. But the problems that we're dealing with in real world are different from the sorts of problems where brilliant intuition has been demonstrated pretty rigorously. So it's not like chess where you're playing the same game with well-defined rules.
Right, the pattern recognition. Really smart people can do extremely rapid forms of combinatorics and pattern recognition, and it's quite astonishing what they can do. The real world isn't quite like chess, is it? And I think it requires more subtlety and more willingness to second-guess yourself, because, I think it was Mark Twain who said, history doesn't repeat itself, but it does rhyme.
And I think superforecasters sort of get that, that there are patterns in history, but they're quite subtle and they're quite conditional. And you can easily overlearn from history. That's a really good point. What book would you say has had the most impact on your life?
On my life. On your life. That would have to be a book I read very early on in my life. Oh, possibly, yeah. Yeah. I think, well, I don't know how far back we should go on this one. I mean, if I were to go back to graduate school, say, when I was making decisions about what I would do with my research career,
there was a book by Robert Jervis, who is still, I think, maybe an emeritus professor now at Columbia, but he's a very senior political scientist. He wrote a wonderful book in 1976, the year I started graduate school, called Perception and Misperception in International Politics.
It is a wonderful synthesis of psychology and political science. I think it is a synthesis of the sort that I've aspired to. I've tried to be Jervisian in my work in many ways. Now, Jervis is not a quantitative researcher. He's qualitative, whereas I'm more quantitative. So we differ in a number of ways. But I have a deep respect for how he was trying to synthesize the psychological and the political processes.
And I suppose if there's any theme that's running through my work, it's synthesizing the psychological and the political.
So the last question is, who would you like to see interviewed on the show and their thoughts articulated or explored with me? Well, I've always been a fan of Michael Lewis's work. I think he would be a fun person to talk to. And I think he may be working on a biography of Daniel Kahneman and Amos Tversky. I think that would be an interesting conversation.
Well, excellent. Thank you so much, Phil, for taking the time. I really appreciate it. It's been a great conversation. Oh, it's a pleasure. Hey, guys, this is Shane again. Just a few more things before we wrap up.
You can find show notes at farnamstreetblog.com slash podcast. That's F-A-R-N-A-M-S-T-R-E-E-T-B-L-O-G dot com slash podcast. You can also find information there on how to get a transcript. And if you'd like to receive a weekly email from me filled with all sorts of brain food, go to farnamstreetblog.com slash newsletter. This is all the good stuff I've found on the web that week that I've read and shared with close friends, books I'm reading, and so much more.
Thank you for listening.