Superforecaster
What separates the top 2% of forecasters from everyone else?
Not intelligence. Not access. Superforecasters decompose better, update faster, and track everything. Philip Tetlock's research proved it: ordinary people who follow the right process consistently outperform credentialed experts who don't.
The Ten Commandments
Tetlock's rules, compressed. Each maps to a cognitive trap it prevents.
| # | Commandment | Trap Prevented |
|---|---|---|
| 1 | Triage — Focus on questions where effort improves accuracy | Wasting calibration on the unknowable |
| 2 | Decompose — Break every question into sub-components | Gut-feel masquerading as analysis |
| 3 | Outside View — Start with the base rate before adjusting | Anchoring to the vivid, ignoring the statistical |
| 4 | Inside View — Then adjust for what makes this case unique | Ignoring specifics that matter |
| 5 | Synthesize — Combine outside and inside views deliberately | Defaulting to one lens |
| 6 | Update — Change probabilities when evidence changes | Belief persistence, ego protection |
| 7 | Balance — Not too much, not too little revision | Overreaction to noise or underreaction to signal |
| 8 | Hunt Errors — Actively seek what would prove you wrong | Confirmation bias |
| 9 | Team — Use disagreement as signal, not threat | Groupthink |
| 10 | Balance Again — Confidence and humility in equal measure | Overconfidence or paralysis |
The master principle: perpetual beta. Every belief is a hypothesis under test.
Decomposition
The superforecaster's primary weapon. Fermi estimation applied to the future.
The three-view protocol:
- OUTSIDE VIEW (base rate): "How often does this type of thing happen?" Historical frequency, reference class, statistical default.
- INSIDE VIEW (domain signals): "What makes this case different?" Current evidence, unique factors, acceleration/deceleration signals.
- SYNTHESIS (calibrated probability): "Given both views, what's my probability estimate?" Not a gut feel. A number, with reasoning attached.
Example decomposition:
"Will AI agents replace 40% of knowledge worker tasks by end 2027?"
| View | Evidence | Adjustment |
|---|---|---|
| Outside | Historical automation waves displaced 20-30% of targeted tasks within 5 years of maturity. Base rate: ~25% of targeted tasks | Starting point: 25% |
| Inside | AI coding assistants already handle 30-50% of junior programming tasks (2025 data). Enterprise adoption at 65%+. Capability curve steeper than prior automation waves. | Adjust upward: +20 points |
| Synthesis | Outside view anchors at ~25% of tasks; inside-view signals push the expected share to roughly 45%, modestly above the question's 40% threshold, so the odds of crossing it are better than even. | Probability: 70% |
The discipline: never skip the outside view. The temptation is always to jump to domain expertise. The base rate grounds you.
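A minimal sketch of that synthesis arithmetic, assuming you treat the outside view as a base fraction, the inside view as a shift in that fraction, and then map distance above the question's threshold to a probability. The logistic mapping and the `spread` parameter are illustrative assumptions, not part of Tetlock's method.

```python
import math

def synthesize(base_rate: float, inside_adjustment: float, threshold: float,
               spread: float = 0.10) -> float:
    """Combine outside and inside views into a probability of crossing a threshold.

    base_rate         -- outside view: typical fraction displaced (e.g. 0.25)
    inside_adjustment -- inside view: shift implied by current signals (e.g. +0.20)
    threshold         -- the level the question asks about (e.g. 0.40)
    spread            -- assumed uncertainty around the point estimate (illustrative)
    """
    expected = base_rate + inside_adjustment      # point estimate after both views
    distance = (expected - threshold) / spread    # how far above or below the threshold
    return 1 / (1 + math.exp(-distance))          # squash into a probability

# Worked example from the table above (numbers illustrative):
p = synthesize(base_rate=0.25, inside_adjustment=0.20, threshold=0.40)
print(f"{p:.0%}")  # ~62% under these assumptions; the table's 70% adds further judgment
```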
State-of-the-World Protocol
A repeatable process for producing a multi-year forecast. Run quarterly or when major signals shift.
Step 1: Define Domains
Choose 6-8 domains that cover your decision space. Too few and you miss interaction effects. Too many and you lose depth.
| Criterion | Good Domain | Bad Domain |
|---|---|---|
| Actionable | You make decisions affected by this | Interesting but irrelevant |
| Observable | You can track signals | Pure speculation |
| Bounded | Clear enough to decompose | "Everything about the economy" |
| Connected | Interacts with other domains | Isolated curiosity |
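A hypothetical checklist in code, just to make the filter explicit. The `Domain` fields mirror the four criteria above; the candidate names are placeholders.

```python
from dataclasses import dataclass

@dataclass
class Domain:
    name: str
    actionable: bool  # you make decisions affected by this
    observable: bool  # you can track signals for it
    bounded: bool     # clear enough to decompose
    connected: bool   # interacts with other domains

candidates = [
    Domain("Enterprise AI adoption", True, True, True, True),
    Domain("Everything about the economy", True, True, False, True),  # fails "bounded"
]

# Keep only domains that pass all four criteria; aim for 6-8 survivors.
domains = [d for d in candidates
           if d.actionable and d.observable and d.bounded and d.connected]
```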
Step 2: Per Domain — Base Rate + Signals + Trend
For each domain, fill this structure:
| Field | What It Contains | Why |
|---|---|---|
| Base Rate | Historical precedent for this type of change at this speed | Grounds the outside view |
| Current Signals | 3-5 concrete, sourced, date-stamped data points | Evidence, not narrative |
| Trend Direction | Accelerating / Steady / Decelerating / Reversing | Trajectory matters more than position |
Signals must be facts, not interpretations. "Enterprise AI adoption at 67% (Gartner, Oct 2025)" is a signal. "AI is taking over" is not.
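A sketch of the per-domain record, with signals kept as sourced, date-stamped facts. Field names are assumptions; the Gartner data point is the example above, and the base-rate string is a placeholder.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Literal

@dataclass
class Signal:
    claim: str   # the fact itself, not an interpretation
    source: str  # where it came from
    as_of: date  # date stamp

@dataclass
class DomainState:
    name: str
    base_rate: str  # historical precedent for this type of change at this speed
    signals: List[Signal] = field(default_factory=list)  # 3-5 concrete data points
    trend: Literal["accelerating", "steady", "decelerating", "reversing"] = "steady"

ai_adoption = DomainState(
    name="Enterprise AI adoption",
    base_rate="<historical precedent for comparable adoption curves>",  # placeholder
    signals=[Signal("Enterprise AI adoption at 67%", "Gartner", date(2025, 10, 1))],
    trend="accelerating",
)
```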
Step 3: Per Domain — Prediction + Probability
For each domain, write predictions that pass the SMART-BF test:
- Specific — One measurable outcome
- Measurable — Clear resolution criteria
- Assignable — Who resolves it
- Realistic — Within the plausible range
- Time-bound — Resolution date
- Base-rated — Outside view stated
- Falsifiable — What would prove it wrong
Assign both:
- Probability (0-100%) — Calibration instrument. A 70% prediction should be right 70% of the time.
- Conviction (1-5) — Action instrument. How much would you bet? Maps to the existing priority system.
These are different instruments. Probability measures your calibration. Conviction measures your willingness to act.
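One way to hold a prediction so the storable SMART-BF fields, plus both instruments, have to be filled in before it counts. The field names and the example resolution criteria are illustrative assumptions; the statement, base rate, and probability come from the decomposition example above.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Prediction:
    statement: str            # Specific: one measurable outcome
    resolution_criteria: str  # Measurable: how it resolves
    resolver: str             # Assignable: who resolves it
    resolves_by: date         # Time-bound: resolution date
    base_rate: str            # Base-rated: the outside view, stated
    falsifier: str            # Falsifiable: what would prove it wrong
    probability: float        # calibration instrument, 0.0-1.0
    conviction: int           # action instrument, 1-5

p = Prediction(
    statement="AI agents replace 40% of knowledge worker tasks by end of 2027",
    resolution_criteria="Two independent published displacement estimates at or above 40%",
    resolver="me, at the quarterly review",
    resolves_by=date(2027, 12, 31),
    base_rate="Prior automation waves displaced 20-30% of targeted tasks",
    falsifier="Displacement estimates still below 20% at end of 2026",
    probability=0.70,
    conviction=3,
)
assert 0.0 <= p.probability <= 1.0 and 1 <= p.conviction <= 5
```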
Step 4: Falsifying Conditions + Watch Signals
For each prediction:
| Field | Question |
|---|---|
| Falsifying conditions | What evidence in 6 months would lower your conviction by 2+ points? |
| Watch signals | What specific data do you check in weekly/monthly reviews? |
| Update triggers | At what threshold do you revise the probability? |
This is where most forecasters fail. They make predictions but never define what would change their mind. Without falsifying conditions, a prediction is a belief, not a hypothesis.
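A sketch of watch signals with explicit update triggers, so the weekly review becomes a mechanical check rather than a judgment call. The signal names, thresholds, and notes are illustrative assumptions tied to the example prediction above.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class WatchSignal:
    name: str                         # what you check in weekly/monthly reviews
    trigger: Callable[[float], bool]  # threshold at which you revise the probability
    note: str                         # what breaching the trigger means

# Illustrative watch signals for the AI-displacement prediction.
watch = [
    WatchSignal("published task-displacement estimate",
                trigger=lambda x: x < 0.20,
                note="Below 20% by end 2026: lower conviction by 2+ points"),
    WatchSignal("enterprise agent adoption rate",
                trigger=lambda x: x > 0.80,
                note="Above 80%: revise probability upward"),
]

def review(signal: WatchSignal, observed: float) -> Optional[str]:
    """Return an update instruction if the observed value breaches the trigger."""
    return signal.note if signal.trigger(observed) else None

print(review(watch[0], observed=0.15))  # update instruction fires
```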
Step 5: Cross-Domain Interactions
The highest-value predictions live at domain intersections. Map compounding effects:
Domain A amplifies Domain B, and Domain B feeds back into Domain A. A reinforcing loop means acceleration.
Ask: where do domains amplify each other? Where do they cancel? The interaction effects are where you find the predictions nobody else is making.
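A minimal sketch of an interaction map, assuming you record pairwise influences by hand with a sign; the domains and signs here are placeholders. Pairs that amplify in both directions are the reinforcing loops worth watching.

```python
# Influence map: +1 if the row domain amplifies the column domain, -1 if it dampens it.
influence = {
    ("AI capability", "Enterprise adoption"): +1,
    ("Enterprise adoption", "AI capability"): +1,  # revenue funds more capability
    ("Regulation", "Enterprise adoption"): -1,
}

def reinforcing_pairs(infl):
    """Find A<->B pairs where both directions amplify: reinforcing loop = acceleration."""
    pairs = set()
    for (a, b), sign in infl.items():
        if sign > 0 and infl.get((b, a), 0) > 0:
            pairs.add(frozenset((a, b)))
    return pairs

print(reinforcing_pairs(influence))  # {frozenset({'AI capability', 'Enterprise adoption'})}
```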
Step 6: Calibration Check
Compare your new predictions against your existing prediction database:
- Do any new predictions contradict existing ones? If so, one must update.
- Is your probability distribution realistic? (All 90%+ predictions = overconfidence. All 40-60% = hedging.)
- Plot your predictions on a calibration curve. If you have history, check past accuracy.
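A rough sanity check on a batch of new predictions; the 80% cutoffs are arbitrary assumptions, and the point is simply to flag the two failure modes named above.

```python
def distribution_flags(probabilities):
    """Flag suspicious probability distributions across a batch of predictions."""
    if not probabilities:
        return []
    n = len(probabilities)
    extreme = sum(1 for p in probabilities if p >= 0.90 or p <= 0.10)
    hedged = sum(1 for p in probabilities if 0.40 <= p <= 0.60)
    flags = []
    if extreme / n > 0.8:
        flags.append("mostly 90%+ (or 10%-) predictions: possible overconfidence")
    if hedged / n > 0.8:
        flags.append("mostly 40-60% predictions: possible hedging, uninformative")
    return flags

print(distribution_flags([0.95, 0.92, 0.90, 0.97]))  # flags overconfidence
```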
Calibration
The measure of a forecaster is not whether individual predictions are right. It is whether their probability estimates are well-calibrated over time.
| Calibration | Meaning |
|---|---|
| Perfect | 70% of your 70% predictions come true |
| Overconfident | 50% of your 70% predictions come true |
| Underconfident | 90% of your 70% predictions come true |
| Uninformative | All predictions cluster around 50% |
Track with Brier scores. Lower is better. 0 = perfect foresight. 0.25 = coin flip. Below 0.2 = good forecaster.
The feedback loop: predict, track, score, adjust process, predict again. This is the VVFL applied to belief.
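A minimal sketch of Brier scoring and bucketed calibration, assuming each tracked prediction resolves to a binary outcome; the toy history is made up.

```python
from collections import defaultdict

def brier(forecasts):
    """Mean squared error between stated probability and outcome (1 = happened, 0 = didn't).
    0 is perfect foresight; always saying 50% scores 0.25."""
    return sum((p - o) ** 2 for p, o in forecasts) / len(forecasts)

def calibration_buckets(forecasts, width=0.1):
    """Group predictions into probability buckets and compare stated vs. realized frequency."""
    buckets = defaultdict(list)
    for p, o in forecasts:
        buckets[round(p / width) * width].append(o)
    return {b: (len(os), sum(os) / len(os)) for b, os in sorted(buckets.items())}

history = [(0.7, 1), (0.7, 1), (0.7, 0), (0.9, 1), (0.3, 0)]  # (probability, outcome)
print(brier(history))                # 0.154 for this toy history (lower is better)
print(calibration_buckets(history))  # e.g. 0.7 bucket: 3 predictions, 67% came true
```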
Anti-Patterns
| Trap | Symptom | Fix |
|---|---|---|
| Narrative bias | Predictions read like a story with a protagonist | Strip to data points and probabilities |
| Hedgehog thinking | One big idea explains everything | Force yourself to name three competing explanations |
| Recency bias | Last week's news dominates the forecast | Always start with the base rate, not the headline |
| Precision theater | "73.2% probability" with no calibration history | Round to nearest 5% until you have 50+ tracked predictions |
| Update failure | Predictions unchanged for 6+ months | Set calendar reminders for review cadence |
Context
- Forecasting — The principles (backward/forward reasoning, discipline framework)
- Probability — Bayesian updating mechanics
- Evaluation — SMART-BF scoring, Brier scores, calibration tracking
- Process — The review cadence (daily, weekly, monthly, quarterly)
- Prediction Database — The living record of all predictions
Links
- Philip Tetlock — Superforecasting (TED) — The research that proved ordinary people can outpredict experts
- Good Judgment Project — The platform that operationalized superforecasting
- Farnam Street — Ten Commandments — Tetlock's rules, expanded
- ai-2027.com — AI trajectory modeling with explicit assumptions
- Metaculus — Community prediction platform for calibration practice
- Prophet Arena — Engineer a superforecaster
Questions
- If decomposition is the superforecaster's primary weapon, which of your current predictions has never been decomposed into sub-questions?
- When your outside view (base rate) and inside view (domain signals) conflict sharply, what decision rule do you use to weight them — and has that rule ever been tested?
- What is the minimum number of tracked predictions required before your calibration curve becomes meaningful?
- If you could only track one domain for the next two years, which would give you the highest decision-relevant signal — and what does that reveal about where your uncertainty actually lives?