FOXES, HEDGEHOGS AND ALGORITHMS

Paul Monk on Philip Tetlock’s study of experts and judgment

“The fox knows many things, but the hedgehog knows one big thing.”

- Isaiah Berlin

“Hedgehogs remind one…of Churchill’s definition of a fanatic: someone who cannot change his mind and will not change the subject.”

- Philip Tetlock

“Insisting on anonymity was the only far-sighted thing those characters did.”

- anonymous skeptic

We rely on experts - academics, intelligence analysts, stock market analysts, strategic planners in business and finance - to forecast the future for us. It has long since been established that stock market analysts in general do little better than random.[i] There is widespread skepticism about intelligence analysts, in the wake of the 9/11 debacle and the WMD fiasco in Iraq. Yet the belief in expertise as such is tenacious. Philip Tetlock has just produced a study which suggests we should start to regard expertise in political forecasting - whether by academics or intelligence analysts, independent pundits, journalists or institutional specialists - with the same skepticism with which the well-informed now regard stock market forecasting. Such experts are attempting, he argues, to do with confidence what they demonstrably cannot do.

Tetlock is a psychologist who, since completing his PhD at Yale University in 1979, has spent most of his time at the University of California, Berkeley and Ohio State University, exploring the capacity of experts to learn, which is to say to admit errors and alter their beliefs and assumptions in the face of evidence. His book, Expert Political Judgment, is the culmination of that generation of research, though he has been prolific in the interim. His ongoing research interests concern how experts think about possible pasts (historical counterfactuals) and probable futures (conditional forecasts); how they respond to confirmation and disconfirmation of their expectations; and how people in general cope with various types of accountability pressures and demands in their social world.

Tetlock gained tenure in 1984 and, for all those who deplore the downside of the tenure system in our universities, he has to be accounted one of its outstanding success stories. Tenure has given him the stability needed to engage in a long term project with remarkable tenacity and energy. He might, perhaps, have found the means to do it anyway, but twenty years of patient analysis is not something purely commercial interests will normally underwrite. This is especially so, given that it isn’t clear there is a commercial pay-off for the work he has done - just a deeper understanding of how many experts make a good deal of money through making dubious forecasts without any ultimate accountability.

In his Preface, he explains the motive for his long term study of expert political judgment. “I have long been puzzled,” he writes, “by why so many political disagreements - be they on national security or trade or welfare policy - are so intractable. I have long been annoyed by how rarely partisans admit error, even in the face of massive evidence that things did not work out as they once confidently predicted. And I have long wondered what we might learn, if we approached these disputes in a more aggressively scientific spirit.”[ii] It is the scientific spirit with which he tackled his project that is the single most notable thing about his book, but the findings of his inquiry are important and, for both reasons, everyone seriously concerned with forecasting, political risk, strategic analysis and public policy debate would do well to read the book and ponder its lessons.

What was his project and what were his findings? His project dates back to 1984, when, freshly tenured at Berkeley, he was involved in exploring the psychological and strategic dilemmas of the Cold War. “I was struck,” he writes, “by how frequently influential observers offered confident, but flatly contradictory, assessments that were impervious to the arguments that were advanced by the other side.”[iii] This is, clearly, an issue of abiding importance, one which afflicted the conscientious centuries before the Cold War. The European wars of religion, in the sixteenth and seventeenth centuries, ended with both Catholics and Protestants signing up to the maxim cuius regio, eius religio (whoever’s territory it is, his religion will rule there), signifying that intellectual exchange had been able to achieve no more than a truce.

Despite the end of the Cold War and Francis Fukuyama’s famous announcement of ‘the End of History’, we plainly still have an intractable problem in this regard. Just consider the brouhaha over the war in Iraq, to say nothing of the controversies over Islam. One possible ‘answer’ to this problem is to embrace ‘post-modernism’ or some other form of relativism and simply declare that all worldviews are equally valid and that ‘objectivity’ in matters of argument, whether about the past or the future, is a chimera: there are only different perspectives, different interests and irreducible antinomies in how they represent themselves. One of the many beauties of Tetlock’s book is that, while clearly an Enlightenment man and no relativist, he does not merely dismiss this ‘answer’, but goes out of his way to consider its merits.

In an effort to get beyond relativism, he set out to design a research project that would make it possible to explore rigorously why we keep “running into ideological impasses rooted in each side’s insistence on scoring its own performance”. To do that, he declares, “we need to start thinking more deeply about how we think. We need methods of calibrating expert performance that transcend partisan bickering and check our species’ deep-rooted penchant for self-justification.”[iv] Here was a project surely daunting in its complexity and daring in its reach. It is something to make even the ghost of the legendary Stanley Milgram - he of the controversial and disturbing obedience experiments of 1961 - stir and look on with interest.[v] But, whereas Milgram’s bold and unusual experiments failed to gain him the conventional professional recognition he hoped for[vi], Tetlock is a man at the pinnacle of his profession - and this book demonstrates why.

“The goal”, Tetlock tells us, “was to discover how far back we could push the ‘doubting Thomases’ of relativism by asking large numbers of experts large numbers of questions about large numbers of cases and by applying no-favoritism scoring rules to their answers. We knew we could never fully escape the interpretive controversies that flourish at the case study level. But we counted on the law of large numbers to cancel out the idiosyncratic, case-specific causes for forecasting glitches and to reveal the invariant properties of good judgment.”[vii] To that end, he and his research team identified a pool of 284 experts in world politics, from government service, think tanks, academia and international institutions, who had shown themselves “to be remarkably thoughtful and articulate observers of the world scene”[viii] and asked them to make forecasts about world affairs looking out from 1988 to 2003.

The forecasts were numerically weighted by the 284 (anonymous) forecasters, in terms of their own confidence in their predictions; carefully recorded by the research team; then monitored for accuracy, sometimes nearly two decades later. Great effort was expended to make the methodology both systematic and objective, and Tetlock’s meticulous explanations of how the data were gathered, interpreted, adjusted and tested are almost breathtaking in their dispassionate lucidity. This is important at two levels: first, because it demonstrates the commitment to the scientific spirit he invokes as the inspiration for the project; second, because it addresses the numerous defences the experts mounted when the results came in and were embarrassing to them - as they often were.
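For readers unfamiliar with how confidence-weighted forecasts can be scored on a common, non-partisan scale, the following sketch illustrates a Brier-style probability scoring rule of the general kind studies like this rely on. The forecasts and outcomes below are invented, and the code is only an illustration of the idea, not a reproduction of Tetlock’s actual procedure.

```python
# A minimal, invented illustration of Brier-style probability scoring: the mean
# squared gap between a forecaster's stated probabilities and what happened.
# 0.0 would be perfect foresight; a forecaster who says "fifty-fifty" about
# everything scores 0.25; higher is worse.

def brier_score(forecasts, outcomes):
    """Average squared difference between stated probability and outcome (1 or 0)."""
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)

# Hypothetical expert: very confident, frequently wrong.
expert_probs    = [0.9, 0.8, 0.9, 0.7, 0.95]
actual_outcomes = [1,   0,   0,   1,   0]     # 1 = the event occurred, 0 = it did not

# The "dart-throwing chimp" baseline: 0.5 for everything.
chimp_probs = [0.5] * len(actual_outcomes)

print("expert:", round(brier_score(expert_probs, actual_outcomes), 3))  # ~0.49, worse than the chimp
print("chimp: ", round(brier_score(chimp_probs, actual_outcomes), 3))   # 0.25
```

On invented numbers like these, the over-confident expert scores worse than the chimp; it is precisely this kind of common yardstick that allows Tetlock to place forecasters along a single performance continuum.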

Tetlock’s findings are disheartening for anyone who believes that expertise confers reliable forecasting powers. They are, however, highly enlightening for anyone seeking to understand how judgment works, where it goes astray and how tenacious experts can be in retrospectively defending their judgments - regardless of what their recorded opinions and the later evidence show to have been the case. Nor does Tetlock shrink from stating just how far off base the forecasters generally were. “The results”, he reports, “plunk human forecasters into an unflattering spot along the performance continuum, distressingly closer to the chimp [throwing darts at random at a board] than to the formal statistical models.”

The depth and longitudinal range of his study gave considerable weight to his findings and buttressed several trenchant and unsettling conclusions: “…it is impossible to find any domain in which humans clearly outperformed crude extrapolation algorithms, still less sophisticated statistical ones…across all judgments, experts on their home turf made neither better calibrated nor more discriminating forecasts than did dilettante trespassers”[ix] and “it made virtually no difference whether participants had doctorates, whether they were economists, political scientists, journalists or historians, whether they had policy experience or access to classified information, or whether they had logged many or few years of experience in their chosen line of work.”[x] And, not least among his findings, “Bad luck proved a vastly more popular explanation for forecasting failure than good luck proved for forecasting success.”[xi]
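It may help to see just how crude a “crude extrapolation algorithm” can be and still, on Tetlock’s evidence, be hard to beat. The sketch below, using invented figures, simply projects the most recent observation, or the recent average change, forward; Tetlock’s actual statistical baselines were specified with more care, so this is offered only as an illustration of the idea, not as his method.

```python
# Two invented examples of crude extrapolation baselines for a yearly series.

def predict_no_change(history):
    """Crudest baseline: next year looks exactly like this year."""
    return history[-1]

def predict_recent_trend(history):
    """Slightly less crude: extend the average year-on-year change."""
    steps = [b - a for a, b in zip(history[:-1], history[1:])]
    return history[-1] + sum(steps) / len(steps)

# Hypothetical series, e.g. a country's annual growth rate in percent.
growth = [3.1, 2.8, 2.9, 3.0, 2.7]

print("no-change forecast:", predict_no_change(growth))               # 2.7
print("trend forecast    :", round(predict_recent_trend(growth), 2))  # 2.6
```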

Not only were field of expertise and depth of experience not correlated with either accuracy of forecasting or well-calibrated self-confidence in judgment, but neither were ideological commitments or worldviews. As Tetlock puts it, “Who experts were - professional background, status and so on - made scarcely an iota of difference to accuracy. Nor did what experts thought - whether they were liberals or conservatives, realists or institutionalists, optimists or pessimists. But the search bore fruit. How experts thought - their style of reasoning - did matter.”[xii] And as regards styles of reasoning, he found it most useful to invoke the old analogy, coined by Isaiah Berlin, of foxes (as sceptical, circumspect thinkers) and hedgehogs (as true believers or ideologues with the ‘courage’ of their convictions) and then allow a continuum in between - of foxhog and hedgefox hybrids.
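The terms ‘calibrated’ and ‘discriminating’ used in the last two paragraphs have reasonably precise meanings in this literature: calibration asks whether events a forecaster calls, say, 70 per cent likely happen about 70 per cent of the time; discrimination (or resolution) asks whether the forecaster’s probabilities actually separate the events that happen from those that do not. The sketch below, with invented data, computes both in a standard way (the grouping used in the Murphy decomposition of the Brier score); it is a gloss on the terminology, not a reproduction of Tetlock’s own analysis.

```python
# Invented data; the decomposition itself is the standard Murphy decomposition.
from collections import defaultdict

def calibration_and_discrimination(forecasts, outcomes):
    """Group forecasts by stated probability, then compare each group's stated
    probability and observed frequency against the overall base rate."""
    groups = defaultdict(list)
    for p, o in zip(forecasts, outcomes):
        groups[p].append(o)

    n = len(forecasts)
    base_rate = sum(outcomes) / n

    # Calibration (a.k.a. reliability): lower is better.
    calibration = sum(
        len(obs) * (p - sum(obs) / len(obs)) ** 2 for p, obs in groups.items()
    ) / n

    # Discrimination (a.k.a. resolution): higher is better.
    discrimination = sum(
        len(obs) * (sum(obs) / len(obs) - base_rate) ** 2 for obs in groups.values()
    ) / n

    return calibration, discrimination

probs    = [0.9, 0.9, 0.9, 0.2, 0.2, 0.2, 0.5, 0.5]
happened = [1,   1,   0,   0,   0,   1,   1,   0]

cal, disc = calibration_and_discrimination(probs, happened)
print(f"calibration   : {cal:.3f}")   # ~0.027: stated probabilities roughly match observed frequencies
print(f"discrimination: {disc:.3f}")  # ~0.021: the forecasts only weakly sort events that happened from those that did not
```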

The truly sobering finding of his project is that, overall, none of the human experts did well at forecasting. “Foxes,” Tetlock found, “are not awe-inspiring forecasters: most of them should be happy to tie simple extrapolation models, and none of them can hold a candle to formal statistical models. But foxes do avoid many of the big mistakes that drive down the probability scores of hedgehogs to approximate parity with dart-throwing chimps. And this accomplishment [modest though it may be] is rooted in foxes’ more balanced style of thinking about the world - a style of thought that elevates no thought above criticism. By contrast, hedgehogs dig themselves into intellectual holes. The deeper they dig, the harder it is to climb out and see what is happening outside…Hedgehogs are thus at continual risk of becoming prisoners of their preconceptions...”[xiii]

Tetlock offers no formulaic answer to the challenges his findings confront us with. He does, however, draw particular attention to the overwhelming statistical finding of his study: that experts tended not to adjust their prior beliefs when the evidence came in, but to rationalize or outright deny their errors in forecasting. A variety of defences were used - challenging the robustness of the research project itself; appealing to an ‘exogenous shock’ as the reason for the future not panning out as they had predicted; invoking what Tetlock calls ‘the close-call counterfactual defence’, which is to say, claiming that they were ‘almost right’; using the ‘just off in the timing’ defence, or the ‘politics is hopelessly cloudlike’ defence; or the ‘I made the right mistake’ defence; or blithely claiming that the ‘low probability outcome just happened to happen’. Often, experts would simply deny that what had been recorded as their forecast had in fact been what they said.[xiv]

All of this, in a way, might be waved away as ‘human nature’, but the carefully assembled research data Tetlock presents us with should, surely, give us greater pause than that. These defences, after all, have important consequences in the world. If they characterize the cognitive behaviour of experts, how can experts themselves demand anything approaching rational belief adjustment among the hoi polloi? In any case, as Tetlock points out, the real gravamen[xv] of his findings is not merely these bad faith defences, but “the pervasiveness of double standards: the tendency to switch on the high intensity search light for flaws only in disagreeable results…It is telling that no-one spontaneously entertained the possibility that ‘I guess the methodological errors broke in my direction this time.’”[xvi]

Now, it is often suggested - and, I confess, I have been one of those who have so suggested - that scenario-based thinking can serve as a corrective to feckless forecasting. True to his scientific mission, Tetlock did not leave this stone unturned, and his findings are not very reassuring. “The need for such correctives should not be in question,” he observes, “but the scenario experiments show that scenario exercises are not cure-alls. Indeed, the experiments give us grounds for fearing that such exercises will often fail to open the minds of the inclined-to-be-closed-minded hedgehogs, but succeed in confusing the already-inclined-to-be-open-minded foxes.”[xvii] Considering many possibilities while misreading their relative probability, owing to the psychological impact of dramatic scenarios whose aggregate plausibility is less than it seems, can make scenario-based thinking counter-productive. We are better off, on balance, simply acknowledging our uncertainty and hedging against it.

All of this raises profound epistemological questions as to the degree of certainty we are able to attain and what constitutes credibility in a forecast or even a research finding. Tetlock, to his lasting credit, fully appreciates this. To play it out, he composed a kind of Socratic dialogue toward the end of his book, between four interlocutors: an unrelenting relativist, a hardline neo-positivist, a moderate neo-positivist and a reasonable relativist.[xviii] It is beautifully constructed and well worth reading in its own right, for the sheer intellectual pleasure of observing a first class mind exercising itself by cross-examining the very foundations of its beliefs about knowledge, truth and reality. One is reminded of Plato’s famously demanding dialogue Parmenides or, less strenuously, of David Hume’s lucid and entertaining Dialogues Concerning Natural Religion.

Tetlock himself, at the end of the day - or at the end of his book -  is, by his own account, a reasonable positivist. He believes that scientific methods give us our best chance of avoiding error and overcoming illusion and prejudice. He also allows that, on an everyday basis, we require something more ‘user friendly’ than a statistically driven scientific research program to monitor our thinking. Here, intriguingly, he refers us to Harold Bloom’s reflections on Shakespeare. “The dominant danger”, he concludes, “remains hubris, the mostly hedgehog vice of close-mindedness, of dismissing dissonant possibilities too quickly. But there is also the danger of cognitive chaos, the mostly fox-vice of excessive open-mindedness, of seeing too much merit in too many stories. Good judgment now becomes a metacognitive skill - akin to ‘the art of self-overhearing’.”[xix]

His footnote at this point refers the reader to “Harold Bloom Shakespeare: The Invention of the Human, Riverhead Books, New York, 1998”, with no specific page reference, which is uncharacteristically imprecise. The key passage is, in fact, that in which Bloom wrote of Hamlet overhearing himself speak and changing with every self-overhearing.[xx] Bloom, of course, regards Hamlet as the most supremely realized literary character in history and the very avatar[xxi] of the modern human being, if not the very paragon of animals. Yet he also, in his rather unscientific and flamboyant manner, celebrates Hamlet’s ‘nihilism’ and traces it, with Nietzsche, to Hamlet’s having thought not too much but too well. It is a little difficult to reconcile this with Tetlock’s Enlightenment project, in which our self-overhearing and consequent better thinking would lead to more rational and responsible behaviour. Doubtless, that is why he recommended “something akin to” the self-overhearing of the Prince of Denmark.

At the end of his book, Tetlock comes close to specifying more precisely what he means in this regard. “Good judgment, then, is a precarious balancing act…Executing this balancing act requires cognitive skills of a high order: the capacity to monitor our own thought processes and to strike a reflective equilibrium faithful to our conceptions of the norms of intellectual fair play. We need to cultivate the art of self-overhearing, to learn how to eavesdrop on the mental conversations we have with ourselves as we struggle to strike the right balance between preserving our existing worldview and rethinking core assumptions. This is no easy art to master. If we listen carefully to ourselves, we will often not like what we hear. And we will often be tempted to laugh off the exercise as introspective navel-gazing, as an infinite regress of homunculi spying on each other…all the way down.”

“No doubt such exercises can be taken to excess,” the psychologist concludes philosophically. “But if I had to bet on the best long term predictor of good judgment among the observers [studied in his project] it would be their commitment - their soul-searching Socratic commitment - to thinking about how they think.”[xxii] The problem here, however, is that this is a veritably monastic, or at least Pythagorean[xxiii], demand to make of any individual human being, given the tide of events, the pressures of the marketplace, the constitutive force of the passions, the insistent demands on us for group cohesion and loyalty, the urgencies of our mundane interests, the fears we hold of competitors and predators and of the looming unknown. That Tetlock, the psychologist, should hold to such a pure faith, after everything his study has revealed, is itself slightly unsettling.

He has a more robust suggestion, but one which, for different reasons, as he allows, is likely to meet much resistance in the real world. “From a broadly non-partisan perspective,” he reasons, “the situation cries out for remedy. And from the scientific vantage offered by this project, the natural remedy is to apply our performance metrics to actual controversies; to pressure participants in debates - be they passionate partisans or dispassionate analysts - to translate vague claims into testable predictions that can be scored for empirical accuracy and logical defensibility. Of course, the resistance would be fierce, especially from those with the most to lose - those with grand reputations and humble track records.”[xxiv] But, where the stakes are high and we cannot afford to rely on experts keeping their own score cards, perhaps it is time to create serious research and training programs that would generate metrics for market and intelligence analysts and hold them more accountable.

One of Tetlock’s consulting roles in recent years has been in critical analysis of political forecasting and risk assessment techniques for U.S. Government intelligence agencies. Such agencies are among those most commonly pilloried for their failures in forecasting, not least in the past few years, so Tetlock plainly has some good work to do. But as he himself remarks, with characteristic restraint, “The recommendations of this book are much in the spirit of Sherman Kent, after whom the CIA named its training school for intelligence analysts.” It is not, therefore, out of any malicious or self-satisfied glee at the errors of intelligence analysts that Tetlock urges new and ambitious programs in research and training; but out of a resilient belief that we can do better.

Alluding to Sherman Kent’s own reflections of many years ago, he concludes, “We can draw cumulative lessons from experience only if we are aware of gaps between what we expected and what happened, acknowledge the possibility that those gaps signal shortcomings in our understanding and test alternative interpretations of those gaps in even-handed fashion. This means doing what we did here: obtaining explicit probability estimates (not just vague verbiage), eliciting reputational bets that pit rival worldviews against each other, and assessing the consistency of the standards of evidence experts apply to evidence.”[xxv] Getting this done will require a specific and tenacious commitment to analyzing how analysis is done in intelligence work, at a time when resources are heavily committed to analysis itself and to field operations. A good place to start might be for intelligence analysts (and stock brokers and political pundits and academic experts and serious journalists) to read Tetlock closely and take his sober-mindedness, as well as his sobering findings, to heart.


 

[i] The classic study is Burton Malkiel A Random Walk Down Wall Street, Norton, New York, 1999 - first published in 1973. But see also John Allen Paulos A Mathematician Plays the Market, Basic Books, New York, 2003 (Penguin, 2004).

[ii] Philip E. Tetlock Expert Political Judgment: How Good Is It? How Can We Know? Princeton University Press, Princeton and Oxford, 2005, Preface, p. xi.

[iii] Ibid. p. xiii.

[iv] Ibid. p. 2.

[v] Thomas Blass The Man Who Shocked the World: The Life and Legacy of Stanley Milgram, Creator of the Obedience Experiments and the Father of Six Degrees, Basic Books, New York, 2004, esp. chapters 5-7 and 12.

[vi] This failure, according to Blass, had to do with “the impression he created among some psychologists of a dilettante, who flitted from one newsworthy phenomenon to the next, not staying with any long enough to probe it in adequate depth.” Pp. 259-260.

[vii] Tetlock op. cit. p. 8.

[viii] Ibid. p. 44.

[ix] Ibid. p. 54.

[x] Ibid. p. 68.

[xi] Ibid. p. 22.

[xii] Ibid. p. 20.

[xiii] Ibid. p. 118.

[xiv] Ibid. pp. 129-138.

[xv] “Gravamen: 1. grievance; memorial from Lower House of Convocation to Upper on disorders or grievances of Church. 2. Essence, worst part of, accusation. (Latin = inconvenience, from gravare to load, gravis heavy.)” Concise Oxford Dictionary: New Edition, Seventh Impression, 1978.

[xvi] Tetlock op. cit. pp. 160-161.

[xvii] Ibid. p. 199.

[xviii] Ibid. pp. 219-229.

[xix] Ibid. p. 23.

[xx] Harold Bloom Shakespeare: The Invention of the Human, Riverhead Books, New York, 1998, p. 423.

[xxi] Avatar: (Hindu myth) descent of deity to earth in incarnate form; incarnation, manifestation, phase. Concise Oxford Dictionary: New Edition, Seventh Impression, 1978.

[xxii] Tetlock op. cit. p. 215.

[xxiii] Arnold Hermann To Think Like God: Pythagoras and Parmenides - The Origins of Philosophy, Parmenides Publishing, Las Vegas, 2004, provides an intriguing reconstruction of the world of these great pre-Socratics and their ground breaking work in discerning the nature of proof, contradiction and truth.

[xxiv] Tetlock op. cit. p. 218.

[xxv] Ibid. p. 238n.