FOXES, HEDGEHOGS AND ALGORITHMS
Paul Monk
on Philip Tetlock’s study of experts and judgment
“The fox knows many things, but the hedgehog knows one big thing.”
- Isaiah Berlin
“Hedgehogs remind one…of Churchill’s definition of a fanatic: someone who cannot change
his mind and will not change the subject.”
- Philip Tetlock
“Insisting on anonymity was the only far-sighted thing those characters did.”
- anonymous skeptic
We rely on experts –
academics, intelligence analysts, stock market analysts, strategic planners
in business and finance – to forecast the future for us. It has long since
been established that stock market analysts in general do little better than
random.[i]
There is widespread skepticism about intelligence analysts, in the wake of
the 9/11 debacle and the WMD fiasco in Iraq. Yet the belief in expertise as
such is tenacious. Philip Tetlock has just produced a study which suggests
we should start to regard expertise in political forecasting – whether by
academics or intelligence analysts, independent pundits, journalists or
institutional specialists - with the same skepticism with which the well-informed now
regard stock market forecasting. They are attempting, he argues, to do with
confidence what they demonstrably cannot do.
Tetlock is a psychologist
who, since completing his PhD at Yale University in 1979, has spent most of
his time at the University of California, Berkeley and Ohio State
University, exploring the capacity of experts to learn, which is to say to
admit errors, and alter their beliefs and assumptions, in the face of
evidence. His book, Expert Political Judgment, is the climax of that
generation of research, though he has been prolific in the interim. His
ongoing research concerns how experts think about
possible pasts (historical counterfactuals) and probable futures
(conditional forecasts); how they respond to confirmation/disconfirmation of
expectations; and how people in general cope with various types of
accountability pressures and demands in their social world.
Tetlock gained tenure in
1984 and, for all those who deplore the downside of the tenure system in our
universities, he has to be accounted one of its outstanding success stories.
Tenure has given him the stability needed to engage in a long term project
with remarkable tenacity and energy. He might, perhaps, have found the means
to do it anyway, but twenty years of patient analysis is not something
purely commercial interests will normally underwrite. This is especially so,
given that it isn’t clear there is a commercial pay-off for the work he has
done – just a deeper understanding of how many experts make a good deal of
money through making dubious forecasts without any ultimate accountability.
In his Preface, he explains
the motive for his long term study of expert political judgment. “I have
long been puzzled,” he writes, “by why so many political disagreements – be
they on national security or trade or welfare policy – are so intractable. I
have long been annoyed by how rarely partisans admit error, even in the face
of massive evidence that things did not work out as they once confidently
predicted. And I have long wondered what we might learn, if we approached
these disputes in a more aggressively scientific spirit.”[ii]
It is the scientific spirit with which he tackled his project that is the
single most notable thing about his book, but the findings of his inquiry
are also important and, for both reasons, everyone seriously concerned with
forecasting, political risk, strategic analysis and public policy debate
would do well to read the book and ponder its lessons.
What was his project and
what were his findings? His project dates back to 1984, when, freshly
tenured at Berkeley, he was involved in exploring the psychological and
strategic dilemmas of the Cold War. “I was struck,” he writes, “by how
frequently influential observers offered confident, but flatly
contradictory, assessments that were impervious to the arguments that were
advanced by the other side.”[iii]
This is, clearly, an issue of abiding importance, which afflicted the
conscientious centuries before the Cold War. The European wars of religion,
in the sixteenth and seventeenth centuries, ended with both Catholics and
Protestants signing up to the maxim cuius regio, eius religio
(whoever’s territory it is, his religion will rule there), signifying that
intellectual exchange had been able to achieve no more than a truce.
Despite the end of the Cold
War and the famous announcement of ‘the End of History’ by Francis Fukuyama,
we plainly still have an intractable problem in this regard. Just consider
the brouhaha over the war in Iraq, to say nothing of the controversies over
Islam. One possible ‘answer’ to this problem is to embrace ‘post-modernism’
or some other form of relativism and simply declare that all worldviews are
equally valid and that ‘objectivity’ in matters of argument, whether about
the past or the future, is a chimera: there are only different perspectives,
different interests and irreducible antinomies in how they represent
themselves. One of the many beauties of Tetlock’s book is that, while
clearly an Enlightenment man and no relativist, he does not merely dismiss
this ‘answer’, but goes out of his way to consider its merits.
In an effort to get beyond
relativism, he set out to design a research project that would make it
possible to rigorously explore why we keep “running into ideological
impasses rooted in each other’s insistence on scoring its own performance”.
To do that, he declares, “we need to start thinking more deeply about how
we think. We need methods of calibrating expert performance that transcend
partisan bickering and check our species’ deep-rooted penchant for self-justification.”[iv]
Here was a project surely daunting in its complexity and daring in its
reach. It is something to make even the ghost of the legendary Stanley
Milgram – he of the controversial and disturbing obedience experiments of
1961 - stir and look on with interest.[v]
But, whereas Milgram’s bold and unusual experiments failed to gain him the
conventional professional recognition he hoped for[vi],
Tetlock is a man at the pinnacle of his profession – and this book
demonstrates why.
“The goal”, Tetlock tells
us, “was to discover how far back we could push the ‘doubting Thomases’ of
relativism by asking large numbers of experts large numbers of questions
about large numbers of cases and by applying no favoritism scoring rules to
their answers. We knew we could never fully escape the interpretive
controversies that flourish at the case study level. But we counted on the
law of large numbers to cancel out the idiosyncratic case specific causes
for forecasting glitches and to reveal the invariant properties of good
judgment.”[vii]
To that end, he and his research team identified a pool of 284 experts in
world politics, from government service, think tanks, academia and
international institutions, who had shown themselves “to be remarkably
thoughtful and articulate observers of the world scene”[viii]
and asked them to make forecasts about world affairs looking out from 1988
to 2003.
The forecasts were
numerically weighted by the 284 (anonymous) forecasters, in terms of their
own confidence in their predictions; carefully recorded by the research
team; then monitored for accuracy, sometimes nearly two decades later. Great
effort was expended to make the methodology both systematic and objective
and Tetlock’s meticulous explanation of how the data were gathered and
interpreted, adjusted and tested is almost breathtaking in its dispassionate
lucidity. This is important at two levels. First, because it demonstrates
the commitment to the scientific spirit he invokes as the inspiration for
the project; second, because it addresses the numerous defences the experts
mounted when the results came in and were embarrassing to them – as they
often were.
Tetlock’s findings are
disconsoling for anyone who believes that expertise confers reliable
forecasting powers. They are, however, highly enlightening for anyone
seeking to understand how judgment works, where it goes astray and how
tenacious experts can be in retrospectively defending their judgments -
regardless of what their recorded opinions and the later evidence show to
have been the case. Nor does Tetlock shrink from stating just how far off
base the forecasters generally were. “The results”, he reports, “plunk human
forecasters into an unflattering spot along the performance continuum,
distressingly closer to the chimp [throwing darts at random at a board] than
to the formal statistical models.”
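To make the comparison concrete, here is a minimal sketch of how a confidence-weighted forecast can be scored against what actually happened. It assumes a quadratic (Brier-style) probability score and a 'chimp' that assigns 50 per cent to every yes-or-no question; the function name, the three forecasters and all of the numbers are invented for illustration and are not drawn from Tetlock’s data or his exact scoring procedure.

```python
from typing import Sequence


def probability_score(probabilities: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared gap between stated probability and outcome (0 is perfect, 1 is worst)."""
    if not outcomes or len(probabilities) != len(outcomes):
        raise ValueError("need one probability per outcome, and at least one outcome")
    return sum((p - o) ** 2 for p, o in zip(probabilities, outcomes)) / len(outcomes)


# Hypothetical data: 1 means the predicted event happened, 0 means it did not.
outcomes = [1, 0, 0, 1, 0]
hedgehog = [0.9, 0.8, 0.1, 0.3, 0.7]  # bold, confident probabilities (invented)
fox = [0.6, 0.4, 0.3, 0.6, 0.4]       # cautious, hedged probabilities (invented)
chimp = [0.5] * len(outcomes)         # assumed baseline: 50% on every question

hedgehog_score = probability_score(hedgehog, outcomes)
fox_score = probability_score(fox, outcomes)
chimp_score = probability_score(chimp, outcomes)

print(f"hedgehog {hedgehog_score:.3f}  fox {fox_score:.3f}  chimp {chimp_score:.3f}")
```

With these invented numbers the confident forecaster scores about 0.33, worse than the chimp’s 0.25, while the hedged forecaster scores about 0.15. Lower is better, so two emphatic misses are enough to cancel out several bold hits.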
The depth and longitudinal
range of his study gave considerable weight to his findings and buttressed
several trenchant and unsettling conclusions: “…it is impossible to find any
domain in which humans clearly outperformed crude extrapolation algorithms,
still less sophisticated statistical ones…across all judgments, experts on
their home turf made neither better calibrated nor more discriminating
forecasts than did dilettante trespassers”[ix]
and “it made virtually no difference whether participants had doctorates,
whether they were economists, political scientists, journalists or
historians, whether they had policy experience or access to classified
information, or whether they had logged many or few years of experience in
their chosen line of work.”[x]
And, not least among his findings, “Bad luck proved a vastly more popular
explanation for forecasting failure than good luck proved for forecasting
success.”[xi]
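What 'better calibrated' means can also be sketched in a few lines. A forecaster is well calibrated when events given a probability of, say, 80 per cent occur about 80 per cent of the time. The index below is one common way of measuring the shortfall, assumed here for illustration; the bucketing to the nearest 10 per cent, the function name and the data are all hypothetical rather than taken from Tetlock’s study.

```python
from collections import defaultdict
from typing import Sequence


def calibration_index(probabilities: Sequence[float], outcomes: Sequence[int]) -> float:
    """Weighted mean squared gap between each stated probability and the
    frequency with which events given that probability actually occurred
    (0 means perfectly calibrated)."""
    if not outcomes or len(probabilities) != len(outcomes):
        raise ValueError("need one probability per outcome, and at least one outcome")
    buckets = defaultdict(list)
    for p, o in zip(probabilities, outcomes):
        buckets[round(p, 1)].append(o)  # group forecasts to the nearest 10%
    return sum(
        len(hits) * (p - sum(hits) / len(hits)) ** 2 for p, hits in buckets.items()
    ) / len(outcomes)


# Hypothetical record: ten yes-or-no questions, 1 = the event occurred.
outcomes = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]
measured = [0.8] * 5 + [0.2] * 5       # says 80% and is right 4 times in 5
overconfident = [1.0] * 5 + [0.0] * 5  # says 'certain' about the same questions

measured_index = calibration_index(measured, outcomes)
overconfident_index = calibration_index(overconfident, outcomes)
print(f"measured {measured_index:.2f}  overconfident {overconfident_index:.2f}")
```

Both invented forecasters lean the right way on nine questions out of ten, yet only the first is calibrated: the second claims certainty and is wrong one time in five at each extreme, which is the gap the index picks up.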
Not only were field of
expertise and depth of experience not correlated with either accuracy of
forecasting or well-calibrated self-confidence in judgment, but neither were
ideological commitments or worldviews. As Tetlock puts it, “Who
experts were – professional background, status and so on – made scarcely an
iota of difference to accuracy. Nor did what experts thought –
whether they were liberals or conservatives, realists or institutionalists,
optimists or pessimists. But the search bore fruit. How experts
thought – their style of reasoning – did matter.”[xii]
And as regards styles of reasoning, he found it most useful to invoke the
old analogy, coined by Isaiah Berlin, of foxes (as sceptical, circumspect
thinkers) and hedgehogs (as true believers or ideologues with the ‘courage’
of their convictions) and then allow a continuum in between – of foxhog and
hedgefox hybrids.
The truly sobering finding
of his project is that, overall, none of the human experts did well at
forecasting. “Foxes,” Tetlock found, “are not awe-inspiring forecasters:
most of them should be happy to tie simple extrapolation models, and none of
them can hold a candle to formal statistical models. But foxes do avoid many
of the big mistakes that drive down the probability scores of hedgehogs to
approximate parity with dart-throwing chimps. And this accomplishment
[modest though it may be] is rooted in foxes’ more balanced style of
thinking about the world – a style of thought that elevates no thought above
criticism. By contrast, hedgehogs dig themselves into intellectual holes.
The deeper they dig, the harder it is to climb out and see what is happening
outside…Hedgehogs are thus at continual risk of becoming prisoners of their
preconceptions...”[xiii]
Tetlock offers no formulaic
answer to the challenges his findings confront us with. He does, however,
draw particular attention to the overwhelming statistical finding of his
study: that experts tended not to adjust their prior beliefs when the
evidence came in, but to rationalize or outright deny their errors in
forecasting. A variety of defences were used – challenging the robustness of
the research project itself; appealing to an ‘exogenous shock’ as the reason
for the future not panning out as they had predicted; invoking what Tetlock
calls ‘the close-call counterfactual defence’, which is to say, claiming
that they were ‘almost right’; using the ‘just off in the timing’ defence,
or the ‘politics is hopelessly cloudlike’ defence; or the ‘I made the right
mistake’ defence; or blithely claiming that the ‘low probability outcome
just happened to happen’. Often, experts would simply deny that what had
been recorded as their forecast had in fact been what they said.[xiv]
All of this, in a way, might
be waved away as ‘human nature’, but the carefully assembled research data
Tetlock presents us with should, surely, give us greater pause than that.
These defences, after all, have important consequences in the world. If they
characterize the cognitive behaviour of experts, how can experts themselves
demand anything approaching rational belief adjustment among the hoi
polloi? In any case, as Tetlock points out, the real gravamen[xv]
of his findings is not merely these bad faith defences, but “the
pervasiveness of double standards: the tendency to switch on the high
intensity search light for flaws only in disagreeable results…It is telling
that no-one spontaneously entertained the possibility that ‘I guess the
methodological errors broke in my direction this time.’”[xvi]
Now, it is often suggested –
and, I confess, I have been one of those who have so suggested – that
scenario based thinking can serve as a corrective to feckless forecasting.
True to his scientific mission, Tetlock did not leave this stone unturned
and his findings are not very reassuring. “The need for such correctives
should not be in question,” he observes, “but the scenario experiments show
that scenario exercises are not cure-alls. Indeed, the experiments give us
grounds for fearing that such exercises will often fail to open the minds of
the inclined-to-be-closed-minded hedgehogs, but succeed in confusing the
already-inclined-to-be-open-minded foxes.”[xvii]
Considering many possibilities and misreading their relative probability,
due to the psychological impact of dramatic scenarios whose aggregate
plausibility is less than it seems, can make scenario based thinking
counter-productive. We are better off, on balance, simply acknowledging our
uncertainty and hedging against it.
All of this raises profound
epistemological questions as to the degree of certainty we are able to
attain and what constitutes credibility in a forecast or even a research
finding. Tetlock, to his lasting credit, fully appreciates this. To play it
out, he composed a kind of Socratic dialogue toward the end of his book,
between four interlocutors: an unrelenting relativist, a hardline
neo-positivist, a moderate neo-positivist and a reasonable relativist.[xviii]
It is beautifully constructed and well worth reading in its own right, for
the sheer intellectual pleasure of observing a first class mind exercising
itself by cross-examining the very foundations of its beliefs about
knowledge, truth and reality. One is reminded of Plato’s famously demanding
dialogue Parmenides or, less strenuously, of David Hume’s lucid and
entertaining Dialogues Concerning Natural Religion.
Tetlock himself, at the end
of the day – or at the end of his book - is, by his own account, a
reasonable positivist. He believes that scientific methods give us our best
chance of avoiding error and overcoming illusion and prejudice. He also
allows that, on an everyday basis, we require something more ‘user friendly’
than a statistically driven scientific research program to monitor our
thinking. Here, intriguingly, he refers us to Harold Bloom’s reflections on
Shakespeare. “The dominant danger”, he concludes, “remains hubris, the
mostly hedgehog vice of close-mindedness, of dismissing dissonant
possibilities too quickly. But there is also the danger of cognitive chaos,
the mostly fox-vice of excessive open-mindedness, of seeing too much merit
in too many stories. Good judgment now becomes a metacognitive skill – akin
to ‘the art of self-overhearing’.”[xix]
His footnote at this point
refers the reader to “Harold Bloom Shakespeare: The Invention of the
Human, Riverhead Books, New York, 1998”, with no specific page
reference, which is uncharacteristically imprecise. The key passage is, in
fact, that in which Bloom wrote of Hamlet overhearing himself speak and
changing with every self-overhearing.[xx]
Bloom, of course, regards Hamlet as the most supremely realized literary
character in history and the very avatar[xxi]
of the modern human being, if not the very paragon of animals. Yet he also,
in his rather unscientific and flamboyant manner, celebrates Hamlet’s
‘nihilism’ and traces it, with Nietzsche, to Hamlet’s having thought not too
much but too well. It is a little difficult to reconcile this with Tetlock’s
Enlightenment project, in which our self-overhearing and consequent better
thinking would lead to more rational and responsible behaviour. Doubtless,
that is why he recommended “something akin to” the self-overhearing of the
Prince of Denmark.
At the end of his book,
Tetlock comes close to specifying more precisely what he means in this
regard. “Good judgment, then, is a precarious balancing act…Executing this
balancing act requires cognitive skills of a high order: the capacity to
monitor our own thought processes and to strike a reflective equilibrium
faithful to our conceptions of the norms of intellectual fair play. We need
to cultivate the art of self-overhearing, to learn how to eavesdrop on the
mental conversations we have with ourselves as we struggle to strike the
right balance between preserving our existing worldview and rethinking core
assumptions. This is no easy art to master. If we listen carefully to
ourselves, we will often not like what we hear. And we will often be tempted
to laugh off the exercise as introspective navel-gazing, as an infinite
regress of homunculi spying on each other…all the way down.”
“No doubt such exercises can
be taken to excess,” the psychologist concludes philosophically. “But if I
had to bet on the best long term predictor of good judgment among the
observers [studied in his project] it would be their commitment – their
soul-searching Socratic commitment – to thinking about how they think.”[xxii]
The problem here, however, is that this is a veritably monastic, or at least
Pythagorean[xxiii],
demand to make of any individual human being, given the tide of events, the
pressures of the marketplace, the constitutive force of the passions, the
insistent demands on us for group cohesion and loyalty, the urgencies of our
mundane interests, the fears we hold of competitors and predators and of the
looming unknown. That Tetlock, the psychologist, should hold to such a pure
faith, after everything his study has revealed, is itself slightly
unsettling.
He has a more robust
suggestion, but one which, for different reasons, as he allows, is likely to
find much resistance in the real world. “From a broadly non-partisan
perspective,” he reasons, “the situation cries out for remedy. And from the
scientific vantage offered by this project, the natural remedy is to apply
our performance metrics to actual controversies; to pressure participants in
debates – be they passionate partisans or dispassionate analysts – to
translate vague claims into testable predictions that can be scored for
empirical accuracy and logical defensibility. Of course, the resistance
would be fierce, especially from those with the most to lose – those with
grand reputations and humble track records.”[xxiv]
But, where the stakes are high and we cannot afford to rely on experts
keeping their own score cards, perhaps it is time to create serious research
and training programs that would generate metrics for market and
intelligence analysts and hold them more accountable.
One of Tetlock’s consulting
roles in recent years has been in critical analysis of
political forecasting and risk assessment techniques for U.S. Government
intelligence agencies. Such agencies are among those most commonly pilloried
for their failures in forecasting, not least in the past few years, so
Tetlock plainly has some good work to do. But as he himself remarks, with
characteristic restraint, “The recommendations of this book are much in the
spirit of Sherman Kent, after whom the CIA named its training school for
intelligence analysts.” It is not, therefore, out of any malicious or
self-satisfied glee at the errors of intelligence analysts that Tetlock
urges new and ambitious programs in research and training; but out of a
resilient belief that we can do better.
Alluding
to Sherman Kent’s own reflections of many years ago, he concludes, “We can
draw cumulative lessons from experience only if we are aware of gaps between
what we expected and what happened, acknowledge the possibility that those
gaps signal shortcomings in our understanding and test alternative
interpretations of those gaps in even-handed fashion. This means doing what
we did here: obtaining explicit probability estimates (not just vague
verbiage), eliciting reputational bets that pit rival worldviews against
each other, and assessing the consistency of the standards of evidence
experts apply to evidence.”[xxv]
Getting this done will require a specific and tenacious commitment to analyzing
how analysis is done in intelligence work, at a time when resources are heavily
committed to analysis and field operations. A good
place to start might be for intelligence analysts (and stock brokers and
political pundits and academic experts and serious journalists) to read
Tetlock closely and take his sober-mindedness, as well as his sobering
findings, to heart.
[i] The classic study is Burton
Malkiel A Random Walk Down Wall Street, Norton, New York,
1999 – first published in 1973. But see, also, John Allen Paulos
A Mathematician Plays the Market, Basic Books, New York, 2003.
(Penguin 2004).
[ii] Philip E. Tetlock
Expert Political Judgment: How Good Is It? How Can We Know?
Princeton University Press, Princeton and Oxford, 2005, Preface, p.
xi.
[v] Thomas Blass The
Man Who Shocked the World: The Life and Legacy of Stanley Milgram,
Creator of the Obedience Experiments and the Father of Six Degrees,
Basic Books, New York, 2004, esp. chapters 5-7 and 12.
[vi] This failure,
according to Blass, had to do with “the impression he created among
some psychologists of a dilettante, who flitted from one newsworthy
phenomenon to the next, not staying with any long enough to probe it
in adequate depth.” Pp. 259-260.
[vii] Tetlock op. cit.
p. 8.
[xv] “Gravamen: 1. grievance;
memorial from Lower House of Convocation to Upper on disorders or
grievances of Church. 2. Essence, worst part of, accusation. (Latin
= inconvenience, from gravare to load, gravis heavy.)”
Concise Oxford Dictionary: New Edition. Seventh Impression,
1978.
[xvi] Tetlock op. cit. pp.
160-161.
[xx] Harold Bloom
Shakespeare: The Invention of the Human, Riverhead Books, New
York, 1998, p. 423.
[xxi] “Avatar: (Hindu
myth) descent of deity to earth in incarnate form; incarnation,
manifestation, phase.” Concise Oxford
Dictionary: New Edition. Seventh
Impression, 1978.
[xxii] Tetlock op. cit.
p. 215.
[xxiii] Arnold Hermann
To Think Like God: Pythagoras and Parmenides – The Origins of
Philosophy, Parmenides Publishing, Las Vegas, 2004, provides an
intriguing reconstruction of the world of these great pre-Socratics
and their ground breaking work in discerning the nature of proof,
contradiction and truth.
[xxiv] Tetlock op. cit.
p. 218.