
ETL Data Tool — Outcome Map

What does success look like?

Desired Outcome

100 NZ businesses ingested, classified, trust-scored, and queryable in under 2 seconds. Every downstream PRD (Sales CRM, Sales Dev, Nowcast, Business Idea Generator) has data flowing through its repos instead of empty schemas.

Outcome Chain

NZBN API returns 100 entities
→ Companies Office adds directors + shareholders
→ Crawl4AI enriches with business model + services
→ Trust scoring validates every record (0-100)
→ PostgreSQL repos loaded via agent-etl-cli
→ Sales Dev agent queries 10 leads with trust > 70 in < 2s
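
The last link in the chain, the Sales Dev lead query, can be sketched as below. This is a minimal illustration, not the actual agent code: an in-memory SQLite database stands in for PostgreSQL, the rows are synthetic, and the `id` and `name` columns, the `top_leads` helper, and the placement of `trust_score` on `venture_ventures` are all assumptions made for the example.

```python
import sqlite3
import time

# In-memory SQLite stands in for PostgreSQL so the sketch runs anywhere.
# Only venture_ventures, source, and trust_score come from this document;
# the id and name columns are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE venture_ventures ("
    "id INTEGER PRIMARY KEY, name TEXT, source TEXT, trust_score INTEGER)"
)
# Index the filter column so the lookup does not scan the whole table.
conn.execute("CREATE INDEX idx_trust ON venture_ventures (trust_score)")

# 100 synthetic rows with trust scores 0..99 (illustrative, not real NZBN data).
rows = [(i, f"Business {i}", "nzbn", i) for i in range(100)]
conn.executemany("INSERT INTO venture_ventures VALUES (?, ?, ?, ?)", rows)


def top_leads(conn, min_trust=70, limit=10):
    """Return up to `limit` NZBN records with trust_score above `min_trust`."""
    cur = conn.execute(
        "SELECT id, name, trust_score FROM venture_ventures "
        "WHERE source = 'nzbn' AND trust_score > ? "
        "ORDER BY trust_score DESC LIMIT ?",
        (min_trust, limit),
    )
    return cur.fetchall()


# Time the query against the 2-second budget from the outcome chain.
start = time.perf_counter()
leads = top_leads(conn)
elapsed = time.perf_counter() - start

print(len(leads), "leads in", round(elapsed, 4), "s")
```

A timing taken this way covers a single call only; the P95 target in the success criteria would need many samples against the real database.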

Success Criteria

| Outcome | Metric | Target | Measurement |
|---|---|---|---|
| Data exists | NZ businesses in PostgreSQL | 100 | SELECT count(*) FROM venture_ventures WHERE source = 'nzbn' |
| Data is trusted | Records with trust score | 100% | No nulls in trust_score column |
| Data is classified | ANZSIC industry codes assigned | 95%+ | Unmapped codes < 5% |
| Data is queryable | Query latency | < 2s | P95 measured on indexed queries |
| Data is consumed | Queries per entity per week | > 0 | Kill signal: zero after 14 days |
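
One way to turn the criteria above into an automated check is sketched here. It is a hedged illustration only: the record shape (dicts with `trust_score` and `anzsic_code` keys), the `check_success` and `p95` helpers, the nearest-rank percentile method, and reading the target of 100 as "at least 100" are all assumptions, not part of the tool.

```python
def p95(samples):
    """Nearest-rank 95th percentile of a non-empty list of latencies."""
    ordered = sorted(samples)
    # Integer arithmetic for ceil(0.95 * n) avoids floating-point rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def check_success(records, latencies_s, weekly_queries):
    """Evaluate each success criterion and return a pass/fail report."""
    total = len(records)
    scored = sum(1 for r in records if r["trust_score"] is not None)
    classified = sum(1 for r in records if r["anzsic_code"] is not None)
    return {
        # Assumption: the target of 100 is read as "at least 100".
        "data_exists": total >= 100,
        "data_is_trusted": total > 0 and scored == total,
        "data_is_classified": total > 0 and classified * 100 >= 95 * total,
        "data_is_queryable": p95(latencies_s) < 2.0,
        "data_is_consumed": total > 0 and weekly_queries / total > 0,
    }


# Synthetic example: 100 scored records, 96 of them with an ANZSIC code.
records = [
    {"trust_score": 80, "anzsic_code": "A0111" if i < 96 else None}
    for i in range(100)
]
# 19 fast queries and one slow outlier; the outlier falls above the P95 rank.
latencies_s = [0.1] * 19 + [3.0]

report = check_success(records, latencies_s, weekly_queries=12)
print(report)
```

With 20 latency samples the nearest-rank P95 is the 19th smallest value, so a single slow outlier does not fail the latency criterion.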

Anti-Outcomes

| Bad Outcome | Signal | Response |
|---|---|---|
| Extraction theater | Data loaded but zero queries after 14 days | Stop. Downstream consumers don't need this data. |
| Garbage in | Trust scores cluster below 40 | Fix source selection or enrichment layer before loading more |
| Stale data | No refresh in 30+ days | Scheduled extraction not working; Sprint 2 blocked |
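
The three signals above could be checked mechanically along these lines. This is a rough sketch under stated assumptions: the `anti_outcome_signals` function and its parameters are hypothetical, the dates are made up, and "cluster below 40" is interpreted here as a median trust score below 40, which the document does not define.

```python
from datetime import date, timedelta
from statistics import median


def anti_outcome_signals(loaded_on, last_refresh, today, query_count, trust_scores):
    """Return the names of the anti-outcomes whose signal has fired."""
    signals = []
    # Extraction theater: data loaded but zero queries after 14 days.
    if query_count == 0 and today - loaded_on >= timedelta(days=14):
        signals.append("extraction_theater")
    # Garbage in: assumption, "cluster below 40" means median below 40.
    if trust_scores and median(trust_scores) < 40:
        signals.append("garbage_in")
    # Stale data: no refresh in 30 or more days.
    if today - last_refresh >= timedelta(days=30):
        signals.append("stale_data")
    return signals


# Illustrative inputs only; none of these values come from real runs.
today = date(2025, 1, 31)

unhealthy = anti_outcome_signals(
    loaded_on=date(2025, 1, 1),
    last_refresh=date(2025, 1, 1),
    today=today,
    query_count=0,
    trust_scores=[20, 30, 35, 80],
)

healthy = anti_outcome_signals(
    loaded_on=date(2025, 1, 25),
    last_refresh=date(2025, 1, 25),
    today=today,
    query_count=5,
    trust_scores=[60, 75, 90],
)

print(unhealthy)
print(healthy)
```

The median is used so that a few high-scoring records do not mask a mostly low-trust batch; a different definition of "cluster" would change only the second condition.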

Context