Superforecaster

What separates the top 2% of forecasters from everyone else?

Not intelligence. Not access. Superforecasters decompose better, update faster, and track everything. Philip Tetlock's research proved it: ordinary people who follow the right process consistently outperform credentialed experts who don't.

The Ten Commandments

Tetlock's rules, compressed. Each maps to a cognitive trap it prevents.

| # | Commandment | Trap Prevented |
|---|---|---|
| 1 | Triage — Focus on questions where effort improves accuracy | Wasting calibration on the unknowable |
| 2 | Decompose — Break every question into sub-components | Gut-feel masquerading as analysis |
| 3 | Outside View — Start with the base rate before adjusting | Anchoring to the vivid, ignoring the statistical |
| 4 | Inside View — Then adjust for what makes this case unique | Ignoring specifics that matter |
| 5 | Synthesize — Combine outside and inside views deliberately | Defaulting to one lens |
| 6 | Update — Change probabilities when evidence changes | Belief persistence, ego protection |
| 7 | Balance — Not too much, not too little revision | Overreaction to noise or underreaction to signal |
| 8 | Hunt Errors — Actively seek what would prove you wrong | Confirmation bias |
| 9 | Team — Use disagreement as signal, not threat | Groupthink |
| 10 | Balance Again — Confidence and humility in equal measure | Overconfidence or paralysis |

The master principle: perpetual beta. Every belief is a hypothesis under test.

Decomposition

The superforecaster's primary weapon. Fermi estimation applied to the future.

The three-view protocol:

OUTSIDE VIEW (base rate)
"How often does this type of thing happen?"
Historical frequency, reference class, statistical default.

+

INSIDE VIEW (domain signals)
"What makes this case different?"
Current evidence, unique factors, acceleration/deceleration signals.

=

SYNTHESIS (calibrated probability)
"Given both views, what's my probability estimate?"
Not a gut feel. A number. With reasoning attached.

Example decomposition:

"Will AI agents replace 40% of knowledge worker tasks by end 2027?"

| View | Evidence | Adjustment |
|---|---|---|
| Outside | Historical automation waves displaced 20-30% of targeted tasks within 5 years of maturity. Base rate: 25% | Starting point: 25% |
| Inside | AI coding assistants already handle 30-50% of junior programming tasks (2025 data). Enterprise adoption at 65%+. Capability curve steeper than prior automation waves. | Adjust upward: +20 points |
| Synthesis | Outside view says 25%. Inside view says this wave is faster, justifying a substantial upward adjustment. | Probability: 45% |

The discipline: never skip the outside view. The temptation is always to jump to domain expertise. The base rate grounds you.
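
A minimal sketch of this synthesis arithmetic (the `synthesize` helper is illustrative, not from the source): the base rate anchors the estimate, the inside view shifts it in percentage points, and the result stays inside sensible bounds.

```python
def synthesize(base_rate: float, inside_adjustment: float) -> float:
    """Combine the outside view (base rate) with an inside-view adjustment.

    base_rate: historical frequency for the reference class, 0.0-1.0.
    inside_adjustment: signed shift in probability points (e.g. +0.20)
    justified by case-specific evidence.
    """
    # Never skip the outside view: the base rate is the anchor.
    estimate = base_rate + inside_adjustment
    # Keep the result inside (0, 1); extreme certainty needs extreme evidence.
    return min(max(estimate, 0.01), 0.99)

# The worked example above: 25% base rate, +20-point inside adjustment.
print(round(synthesize(0.25, 0.20), 2))  # 0.45
```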

State-of-the-World Protocol

A repeatable process for producing a multi-year forecast. Run quarterly or when major signals shift.

Step 1: Define Domains

Choose 6-8 domains that cover your decision space. Too few and you miss interaction effects. Too many and you lose depth.

| Criterion | Good Domain | Bad Domain |
|---|---|---|
| Actionable | You make decisions affected by this | Interesting but irrelevant |
| Observable | You can track signals | Pure speculation |
| Bounded | Clear enough to decompose | "Everything about the economy" |
| Connected | Interacts with other domains | Isolated curiosity |

Step 2: Per Domain — Base Rate + Signals + Trend

For each domain, fill this structure:

| Field | What It Contains | Why |
|---|---|---|
| Base Rate | Historical precedent for this type of change at this speed | Grounds the outside view |
| Current Signals | 3-5 concrete, sourced, date-stamped data points | Evidence, not narrative |
| Trend Direction | Accelerating / Steady / Decelerating / Reversing | Trajectory matters more than position |

Signals must be facts, not interpretations. "Enterprise AI adoption at 67% (Gartner, Oct 2025)" is a signal. "AI is taking over" is not.
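
A minimal sketch of that per-domain structure, assuming simple Python dataclasses (all class and field names here are illustrative, not from the source):

```python
from dataclasses import dataclass, field
from enum import Enum

class Trend(Enum):
    ACCELERATING = "accelerating"
    STEADY = "steady"
    DECELERATING = "decelerating"
    REVERSING = "reversing"

@dataclass
class Signal:
    """A concrete, sourced, date-stamped data point: a fact, not an interpretation."""
    statement: str   # e.g. "Enterprise AI adoption at 67%"
    source: str      # e.g. "Gartner"
    date: str        # e.g. "2025-10"

@dataclass
class Domain:
    name: str
    base_rate: str                                        # historical precedent grounding the outside view
    signals: list[Signal] = field(default_factory=list)  # aim for 3-5 concrete signals
    trend: Trend = Trend.STEADY                           # trajectory matters more than position
```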

Step 3: Per Domain — Prediction + Probability

For each domain, write predictions that pass the SMART-BF test:

  • Specific — One measurable outcome
  • Measurable — Clear resolution criteria
  • Assignable — Who resolves it
  • Realistic — Within the plausible range
  • Time-bound — Resolution date
  • Base-rated — Outside view stated
  • Falsifiable — What would prove it wrong

Assign both:

  • Probability (0-100%) — Calibration instrument. A 70% prediction should be right 70% of the time.
  • Conviction (1-5) — Action instrument. How much would you bet? Maps to the existing priority system.

These are different instruments. Probability measures your calibration. Conviction measures your willingness to act.
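
One way to keep both instruments on the same record, sketched below with hypothetical field names shaped by the SMART-BF checklist above:

```python
from dataclasses import dataclass, field

@dataclass
class Prediction:
    statement: str               # Specific: one measurable outcome
    resolution_criteria: str     # Measurable: clear resolution criteria
    resolver: str                # Assignable: who resolves it
    resolution_date: str         # Time-bound: when it resolves
    base_rate_note: str          # Base-rated: the outside view, stated
    falsifying_conditions: list[str] = field(default_factory=list)  # Falsifiable
    probability: float = 0.5     # Calibration instrument, 0.0-1.0
    conviction: int = 3          # Action instrument, 1-5
```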

Step 4: Falsifying Conditions + Watch Signals

For each prediction:

| Field | Question |
|---|---|
| Falsifying conditions | What evidence in 6 months would lower your conviction by 2+ points? |
| Watch signals | What specific data do you check in weekly/monthly reviews? |
| Update triggers | At what threshold do you revise the probability? |

This is where most forecasters fail. They make predictions but never define what would change their mind. Without falsifying conditions, a prediction is a belief, not a hypothesis.
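
A minimal sketch of an update trigger (names and thresholds are illustrative, not from the source):

```python
from dataclasses import dataclass

@dataclass
class WatchSignal:
    name: str            # what you check in weekly/monthly reviews
    threshold: float     # level at which the probability must be revisited
    latest_value: float  # most recent observation

def triggered(signal: WatchSignal) -> bool:
    """Return True when the watch signal crosses its update threshold."""
    return signal.latest_value >= signal.threshold

# Example: revise the AI-agent prediction if enterprise adoption passes 80%.
adoption = WatchSignal(name="enterprise AI adoption", threshold=0.80, latest_value=0.67)
print(triggered(adoption))  # False -> no revision required yet
```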

Step 5: Cross-Domain Interactions

The highest-value predictions live at domain intersections. Map compounding effects:

Domain A  ──→  Domain B
    ↑              │
    └──────────────┘
Reinforcing loop = acceleration

Ask: where do domains amplify each other? Where do they cancel? The interaction effects are where you find the predictions nobody else is making.

Step 6: Calibration Check

Compare your new predictions against your existing prediction database:

  • Do any new predictions contradict existing ones? If so, one must update.
  • Is your probability distribution realistic? (All 90%+ predictions = overconfidence. All 40-60% = hedging.)
  • Plot your predictions on a calibration curve. If you have history, check past accuracy.
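
A minimal sketch of the distribution check from the list above (the bands come from the bullet; the function name is an assumption):

```python
def distribution_flags(probabilities: list[float]) -> list[str]:
    """Flag unrealistic probability distributions across a batch of predictions."""
    flags = []
    if probabilities and all(p >= 0.90 for p in probabilities):
        flags.append("overconfidence: every prediction is 90%+")
    if probabilities and all(0.40 <= p <= 0.60 for p in probabilities):
        flags.append("hedging: every prediction sits in the 40-60% band")
    return flags

print(distribution_flags([0.92, 0.95, 0.90]))  # ['overconfidence: every prediction is 90%+']
print(distribution_flags([0.45, 0.55, 0.50]))  # ['hedging: every prediction sits in the 40-60% band']
```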

Calibration

The measure of a forecaster is not whether individual predictions are right. It is whether their probability estimates are well-calibrated over time.

| Calibration | Meaning |
|---|---|
| Perfect | 70% of your 70% predictions come true |
| Overconfident | 50% of your 70% predictions come true |
| Underconfident | 90% of your 70% predictions come true |
| Uninformative | All predictions cluster around 50% |

Track with Brier scores. Lower is better. 0 = perfect foresight. 0.25 = coin flip. Below 0.2 = good forecaster.
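
A minimal Brier score sketch for binary predictions, using the standard mean-squared-error formula (the sample history is illustrative):

```python
def brier_score(forecasts: list[tuple[float, bool]]) -> float:
    """Mean squared error between stated probability and outcome (1 if it happened, else 0).

    0.0 = perfect foresight, 0.25 = always saying 50%, lower is better.
    """
    return sum((p - float(happened)) ** 2 for p, happened in forecasts) / len(forecasts)

# Three resolved predictions: (stated probability, did it happen?)
history = [(0.70, True), (0.70, False), (0.90, True)]
print(round(brier_score(history), 3))  # 0.197, just under the 0.2 "good forecaster" line
```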

The feedback loop: predict, track, score, adjust process, predict again. This is the VVFL applied to belief.

Anti-Patterns

| Trap | Symptom | Fix |
|---|---|---|
| Narrative bias | Predictions read like a story with a protagonist | Strip to data points and probabilities |
| Hedgehog thinking | One big idea explains everything | Force yourself to name three competing explanations |
| Recency bias | Last week's news dominates the forecast | Always start with the base rate, not the headline |
| Precision theater | "73.2% probability" with no calibration history | Round to nearest 5% until you have 50+ tracked predictions |
| Update failure | Predictions unchanged for 6+ months | Set calendar reminders for review cadence |

Context

  • Forecasting — The principles (backward/forward reasoning, discipline framework)
  • Probability — Bayesian updating mechanics
  • Evaluation — SMART-BF scoring, Brier scores, calibration tracking
  • Process — The review cadence (daily, weekly, monthly, quarterly)
  • Prediction Database — The living record of all predictions

Questions

  • If decomposition is the superforecaster's primary weapon, which of your current predictions has never been decomposed into sub-questions?
  • When your outside view (base rate) and inside view (domain signals) conflict sharply, what decision rule do you use to weight them — and has that rule ever been tested?
  • What is the minimum number of tracked predictions required before your calibration curve becomes meaningful?
  • If you could only track one domain for the next two years, which would give you the highest decision-relevant signal — and what does that reveal about where your uncertainty actually lives?