Prediction Evaluation

Is this prediction worth tracking?

Not all predictions deserve attention. This checklist separates signal from noise by scoring a prediction's quality before you invest time tracking it.

The SMART-BF Checklist

Seven dimensions, each scored 0–2, for a total of 0–14 points. Each cell in the table describes what that score looks like for that dimension.

Dimension | Question | 0 | 1 | 2
Specific | Is it precise and unambiguous? | Vague ("AI will change things") | Somewhat specific | Precise ("GDP-Val over 90% by Dec 2026")
Measurable | Can we objectively verify resolution? | No clear verification | Partially measurable | Binary yes/no with clear criteria
Actionable | Does it enable positioning decisions? | Entertainment only | Indirect implications | Direct action if true/false
Resolution | Is there a clear time horizon? | No timeframe | Vague ("soon", "eventually") | Specific date or trigger event
Testable | What would prove it wrong? | Unfalsifiable | Weak falsification criteria | Clear falsifying conditions
Base rate | Is there historical precedent? | No analogous history | Weak analogies | Strong base rate available
Factored | Does it depend on other predictions? | Many hidden dependencies | Some dependencies acknowledged | Independent or dependencies explicit

Scoring Guide

  • 10–14 — Excellent — Track actively, assign conviction, position
  • 7–9 — Good — Track, but note quality gaps
  • 4–6 — Marginal — Improve specificity before tracking
  • 0–3 — Poor — Don't track — reframe or discard
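
The rubric and scoring bands above can be sketched in a few lines of Python. This is a minimal illustration, not tooling that ships with the checklist: the names `DIMENSIONS`, `total_score`, and `quality_band` are hypothetical, and the band labels are taken from the scoring guide.

```python
# Hypothetical identifiers for the seven SMART-BF dimensions.
DIMENSIONS = (
    "specific", "measurable", "actionable", "resolution",
    "testable", "base_rate", "factored",
)


def total_score(scores: dict[str, int]) -> int:
    """Sum the seven dimension scores, each of which must be 0, 1, or 2."""
    missing = set(DIMENSIONS) - set(scores)
    extra = set(scores) - set(DIMENSIONS)
    if missing or extra:
        raise ValueError(f"missing={sorted(missing)}, unexpected={sorted(extra)}")
    for name, value in scores.items():
        if value not in (0, 1, 2):
            raise ValueError(f"{name} must be 0, 1, or 2, got {value!r}")
    return sum(scores.values())


def quality_band(total: int) -> str:
    """Map a 0-14 total to the band named in the scoring guide."""
    if not 0 <= total <= 14:
        raise ValueError(f"total must be between 0 and 14, got {total}")
    if total >= 10:
        return "Excellent"
    if total >= 7:
        return "Good"
    if total >= 4:
        return "Marginal"
    return "Poor"
```

Validating the inputs before summing keeps a mistyped dimension name or an out-of-range score from silently shifting a prediction into the wrong band.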

Quality → Conviction Mapping

A high-quality prediction is not the same as a high-conviction prediction.

  • Quality: how well-formed is the prediction itself?
  • Conviction: how likely do you think it is to occur?

A prediction can score 14/14 on quality ("Bitcoin hits 200K USD by Dec 31, 2026") while you hold low conviction (1/5) that it will happen.

  • Quality 10–14 — full conviction range eligible (0–5)
  • Quality 7–9 — cap at 4/5 (quality uncertainty constrains the bet)
  • Quality 4–6 — cap at 3/5 (prediction itself is unclear)
  • Quality 0–3 — don't assign conviction at all
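
The caps above amount to a simple lookup from quality total to maximum conviction. The sketch below assumes conviction is an integer on the 0–5 scale listed above; the function names `conviction_cap` and `clamp_conviction` are hypothetical.

```python
from typing import Optional


def conviction_cap(quality_total: int) -> Optional[int]:
    """Return the highest conviction allowed for a quality total.

    Returns None when the prediction scores too low to receive
    any conviction at all (quality 0-3).
    """
    if not 0 <= quality_total <= 14:
        raise ValueError(f"quality must be between 0 and 14, got {quality_total}")
    if quality_total >= 10:
        return 5
    if quality_total >= 7:
        return 4
    if quality_total >= 4:
        return 3
    return None


def clamp_conviction(quality_total: int, conviction: int) -> Optional[int]:
    """Limit a desired conviction to the cap implied by the quality total."""
    if not 0 <= conviction <= 5:
        raise ValueError(f"conviction must be between 0 and 5, got {conviction}")
    cap = conviction_cap(quality_total)
    if cap is None:
        return None  # quality too low: do not assign conviction
    return min(conviction, cap)
```

For example, `clamp_conviction(8, 5)` returns 4, because a "Good" prediction cannot carry full conviction, while `clamp_conviction(14, 1)` returns 1: the cap limits conviction but never raises it.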

Worked Example

Prediction: "AI solves at least one Clay Millennium Prize math problem in 2026"

Dimension | Score | Reasoning
Specific | 2 | Clear outcome (one of 7 named problems)
Measurable | 2 | Clay Institute verification process exists
Actionable | 1 | Indirect positioning implications
Resolution | 2 | "In 2026" = by Dec 31, 2026
Testable | 2 | No solution announced = falsified
Base rate | 1 | No prior AI math proof at this level
Factored | 1 | Depends on AI capability trajectory

Total: 11/14 — Excellent quality, worth tracking.

Conviction assignment: 3/5 (uncertain on timeline, confident on direction)
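
As a quick arithmetic check, the worked example's scores can be summed directly. The snippet is only an illustrative sketch; the variable names are hypothetical and the scores are copied from the example.

```python
# Scores from the worked example (each dimension is 0, 1, or 2).
clay_scores = {
    "Specific": 2,
    "Measurable": 2,
    "Actionable": 1,
    "Resolution": 2,
    "Testable": 2,
    "Base rate": 1,
    "Factored": 1,
}

total = sum(clay_scores.values())
max_total = 2 * len(clay_scores)

# Dimensions scoring below 2 are the quality gaps worth noting.
gaps = [name for name, score in clay_scores.items() if score < 2]

print(f"Total: {total}/{max_total}")  # Total: 11/14
print("Gaps:", ", ".join(gaps))       # Gaps: Actionable, Base rate, Factored
```

Listing the dimensions that fell short of 2 makes the remaining quality gaps explicit even when the total lands in the top band.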

Common Quality Failures

Vague predictions (low Specificity)

  • "AI will transform business" → Better: "50% of Fortune 500 will have AI-native divisions by 2027"
  • "Crypto will go mainstream" → Better: "US spot Bitcoin ETFs exceed 100B USD AUM by Dec 2026"

Unfalsifiable predictions (low Testability)

  • "We're in the early innings of AI" → Better: "Frontier Math Tier 4 exceeds 40% by Dec 2026"
  • "The future belongs to builders" → Better: "Single-founder billion-dollar startup emerges by 2027"

Missing base rates (low Base rate)

  • "AGI by 2027" → Add: "Based on GPT-2 to GPT-4 capability doubling timeline of around 2 years"
  • "10x efficiency gains" → Add: "Manufacturing automation precedent: 8 to 12x over 20 years"

Hidden dependencies (low Factored)

  • "Level-5 autonomy deployed in 2026" → Add: "Depends on regulatory approval, liability framework, OEM adoption"

The Inversion Test

Before scoring, ask: What would make this prediction better?

If the answer includes any of these patterns, you have already named the quality gap:

  • "Be more specific" → Specificity problem
  • "Define success" → Measurability problem
  • "Pick a date" → Resolution problem
  • "Acknowledge what could prove it wrong" → Testability problem
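
The four patterns above can be treated as a lookup from suggestion to dimension. The following is a rough sketch using plain substring matching; `INVERSION_PATTERNS` and `name_quality_gaps` are hypothetical names, and real answers would likely need looser matching than this.

```python
# Hypothetical lookup: suggestion phrase -> SMART-BF dimension it points at.
INVERSION_PATTERNS = {
    "be more specific": "Specific",
    "define success": "Measurable",
    "pick a date": "Resolution",
    "acknowledge what could prove it wrong": "Testable",
}


def name_quality_gaps(answer: str) -> list[str]:
    """Return the dimensions whose pattern appears in a free-text answer."""
    text = answer.lower()
    return [dim for pattern, dim in INVERSION_PATTERNS.items() if pattern in text]
```

For instance, `name_quality_gaps("Be more specific and pick a date.")` returns `["Specific", "Resolution"]`, which flags the two dimensions to look at first when scoring.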

Using This Checklist

  1. Before adding to the live forecast — score quality first
  2. When reviewing others' predictions — apply checklist before forming conviction
  3. When your conviction changes — check whether quality score also changed; new information may mean reframing the prediction, not just adjusting probability
