ETL Data Tool — Outcome Map

What does success look like?

Desired Outcome

100 NZ businesses ingested, classified, trust-scored, and queryable in under 2 seconds. Every downstream PRD (Sales CRM, Sales Dev, Nowcast, Business Idea Generator) has data flowing through its repos instead of empty schemas.

Outcome Chain

NZBN API returns 100 entities
→ Companies Office adds directors + shareholders
→ Crawl4AI enriches with business model + services
→ Trust scoring validates every record (0-100)
→ PostgreSQL repos loaded via agent-etl-cli
→ Sales Dev agent queries 10 leads with trust > 70 in < 2s
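The final link in the chain can be sketched as a query against the loaded repo. This is a minimal, hedged example using an in-memory SQLite stand-in for the real PostgreSQL store; the table name `venture_ventures` and columns `source` and `trust_score` are taken from the success criteria below, while `nzbn` and `name` are assumed columns for illustration.

```python
import sqlite3

# In-memory stand-in for the PostgreSQL repo loaded by agent-etl-cli.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE venture_ventures (
        nzbn        TEXT PRIMARY KEY,  -- assumed column for illustration
        name        TEXT,              -- assumed column for illustration
        source      TEXT,              -- per the success-criteria query
        trust_score INTEGER            -- 0-100, per the trust-scoring step
    )
""")
rows = [
    ("9429000000001", "Acme Dairy Ltd", "nzbn", 82),
    ("9429000000002", "Kiwi Crane Hire", "nzbn", 64),
    ("9429000000003", "Harbour Logistics", "nzbn", 91),
]
conn.executemany("INSERT INTO venture_ventures VALUES (?, ?, ?, ?)", rows)

# The Sales Dev query: up to 10 leads with trust > 70, best first.
leads = conn.execute(
    "SELECT nzbn, name, trust_score FROM venture_ventures "
    "WHERE trust_score > 70 ORDER BY trust_score DESC LIMIT 10"
).fetchall()
```

Against a real 100-row PostgreSQL table, the same query should hit an index on trust_score to stay inside the 2-second budget.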

Success Criteria

| Outcome | Metric | Target | Measurement |
| --- | --- | --- | --- |
| Data exists | NZ businesses in PostgreSQL | 100 | SELECT count(*) FROM venture_ventures WHERE source = 'nzbn' |
| Data is trusted | Records with trust score | 100% | No nulls in trust_score column |
| Data is classified | ANZSIC industry codes assigned | 95%+ | Unmapped codes < 5% |
| Data is queryable | Query latency | < 2s | P95 measured on indexed queries |
| Data is consumed | Queries per entity per week | > 0 | Kill signal: zero after 14 days |
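The "Data is queryable" criterion hinges on a P95 measurement. A minimal sketch of that check, assuming nothing beyond the < 2s target stated above (the nearest-rank percentile and the timing harness are illustrative choices, not part of this document):

```python
import time

def p95(samples_ms):
    """Nearest-rank 95th percentile of latencies in milliseconds."""
    ordered = sorted(samples_ms)
    rank = max(0, round(0.95 * len(ordered)) - 1)
    return ordered[rank]

def timed(fn, runs=100):
    """Run fn repeatedly and return per-call latencies in ms."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000)
    return samples

# Stand-in workload; the real check would wrap the indexed
# PostgreSQL lead query and assert p95(samples) < 2000 ms.
samples = timed(lambda: sum(range(1000)))
assert p95(samples) < 2000
```

Measuring P95 rather than the mean keeps a handful of slow outliers from hiding behind many fast queries, which is what the < 2s target is meant to catch.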

Anti-Outcomes

| Bad Outcome | Signal | Response |
| --- | --- | --- |
| Extraction theater | Data loaded but zero queries after 14 days | Stop. Downstream consumers don't need this data. |
| Garbage in | Trust scores cluster below 40 | Fix source selection or enrichment layer before loading more |
| Stale data | No refresh in 30+ days | Scheduled extraction not working — Sprint 2 blocked |
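The "stale data" signal can be checked with a single count query. A hedged sketch, again using an in-memory SQLite stand-in; the `last_refreshed_at` column is an assumption, since this document does not specify how refresh timestamps are stored:

```python
import sqlite3
from datetime import datetime, timedelta, timezone

conn = sqlite3.connect(":memory:")
# last_refreshed_at is a hypothetical column holding ISO-8601 UTC timestamps.
conn.execute("CREATE TABLE venture_ventures (nzbn TEXT, last_refreshed_at TEXT)")

now = datetime.now(timezone.utc)
conn.executemany(
    "INSERT INTO venture_ventures VALUES (?, ?)",
    [
        ("9429000000001", (now - timedelta(days=3)).isoformat()),
        ("9429000000002", (now - timedelta(days=45)).isoformat()),
    ],
)

# ISO-8601 strings with a fixed UTC offset compare correctly as text.
cutoff = (now - timedelta(days=30)).isoformat()
stale = conn.execute(
    "SELECT count(*) FROM venture_ventures WHERE last_refreshed_at < ?",
    (cutoff,),
).fetchone()[0]
# A nonzero count means scheduled extraction is not keeping up.
```

Run on a schedule, a nonzero result here is the alarm that blocks Sprint 2 per the table above.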

Context

Questions

  • Which outcome in the ETL data pipelines map is the leading indicator — the one that reliably predicts all others?

  • Which outcome is hardest to measure but most important to track — and what proxy metric comes closest?
  • If the primary outcome were achieved but secondary outcomes lagged, what would that signal about the system?
  • At what threshold does each outcome shift from "in progress" to "proven" — and who decides that threshold?