ETL Data Tool — Outcome Map
What does success look like?
Desired Outcome
100 NZ businesses ingested, classified, trust-scored, and queryable in under 2 seconds (P95). Every downstream PRD (Sales CRM, Sales Dev, Nowcast, Business Idea Generator) has data flowing through its repos instead of querying empty schemas.
Outcome Chain
NZBN API returns 100 entities
→ Companies Office adds directors + shareholders
→ Crawl4AI enriches with business model + services
→ Trust scoring validates every record (0-100)
→ PostgreSQL repos loaded via agent-etl-cli
→ Sales Dev agent queries 10 leads with trust > 70 in < 2s
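The chain above can be sketched as a linear pipeline. This is a minimal illustration only: the stage functions (`fetch_nzbn_entities`, `enrich_companies_office`, `enrich_crawl4ai`, `trust_score`) are hypothetical stand-ins, not real NZBN / Companies Office / Crawl4AI calls, and the scoring is a toy rule.

```python
# Sketch of the outcome chain. All stage functions below are
# hypothetical stand-ins; real names in agent-etl-cli may differ.

def fetch_nzbn_entities(limit=100):
    # Stand-in for the NZBN API call; returns bare entity records.
    return [{"nzbn": f"94290{i:08d}", "name": f"Business {i}"} for i in range(limit)]

def enrich_companies_office(entity):
    # Stand-in for the Companies Office lookup (directors + shareholders).
    entity["directors"] = []
    entity["shareholders"] = []
    return entity

def enrich_crawl4ai(entity):
    # Stand-in for Crawl4AI enrichment (business model + services).
    entity["business_model"] = "unknown"
    entity["services"] = []
    return entity

def trust_score(entity):
    # Toy scoring rule: +25 per populated field group, capped at 100.
    score = 25  # base: record exists with NZBN + name
    score += 25 if "directors" in entity else 0
    score += 25 if "business_model" in entity else 0
    score += 25 if entity.get("services") is not None else 0
    entity["trust_score"] = min(score, 100)
    return entity

def run_pipeline(limit=100):
    # Each record flows through every enrichment stage before scoring.
    records = []
    for e in fetch_nzbn_entities(limit):
        e = enrich_companies_office(e)
        e = enrich_crawl4ai(e)
        e = trust_score(e)
        records.append(e)
    return records

records = run_pipeline()
```

Each stage only adds fields, so a record that survives to the end carries everything the Sales Dev query needs (identity, directors, business model, trust score) in one row.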
Success Criteria
| Outcome | Metric | Target | Measurement |
|---|---|---|---|
| Data exists | NZ businesses in PostgreSQL | 100 | `SELECT count(*) FROM venture_ventures WHERE source = 'nzbn'` |
| Data is trusted | Records with trust score | 100% | No NULLs in `trust_score` column |
| Data is classified | ANZSIC industry codes assigned | 95%+ | Unmapped codes < 5% |
| Data is queryable | Query latency | < 2s | P95 measured on indexed queries |
| Data is consumed | Queries per entity per week | > 0 | Kill signal: zero after 14 days |
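The table can double as an automated gate. Below is a hedged sketch that evaluates the five criteria from pre-aggregated stats; how those stats are pulled from PostgreSQL is assumed and not shown, and the dict keys are illustrative names, not real schema columns.

```python
# Hedged sketch: turn the success-criteria table into a pass/fail check.
# The `stats` input shape is an assumption; gathering it from PostgreSQL
# (counts, null checks, P95 latency, query logs) is left to the caller.

def evaluate_success(stats):
    """stats keys (illustrative): row_count, null_trust_scores,
    unmapped_anzsic_pct, p95_latency_s, queries_last_14d."""
    return {
        "data_exists": stats["row_count"] >= 100,          # 100 businesses loaded
        "data_trusted": stats["null_trust_scores"] == 0,   # no NULL trust scores
        "data_classified": stats["unmapped_anzsic_pct"] < 5.0,  # 95%+ ANZSIC mapped
        "data_queryable": stats["p95_latency_s"] < 2.0,    # P95 under 2s
        "data_consumed": stats["queries_last_14d"] > 0,    # kill signal otherwise
    }

# Example run with made-up numbers that satisfy every criterion.
checks = evaluate_success({
    "row_count": 100,
    "null_trust_scores": 0,
    "unmapped_anzsic_pct": 3.2,
    "p95_latency_s": 0.4,
    "queries_last_14d": 12,
})
```

Keeping the thresholds in one function means the outcome map and the monitoring check can't silently drift apart.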
Anti-Outcomes
| Bad Outcome | Signal | Response |
|---|---|---|
| Extraction theater | Data loaded but zero queries after 14 days | Stop. Downstream consumers don't need this data. |
| Garbage in | Trust scores cluster below 40 | Fix source selection or enrichment layer before loading more |
| Stale data | No refresh in 30+ days | Scheduled extraction not working — Sprint 2 blocked |
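The three anti-outcomes are also mechanically checkable. This sketch encodes the signals with thresholds taken straight from the table; the input shape (records with a `trust_score` field, a 14-day query count, a last-refresh timestamp) is assumed for illustration.

```python
from datetime import datetime, timedelta

# Hedged sketch of the anti-outcome signals as automated checks.
# Thresholds come from the table above; input shape is an assumption.

def detect_anti_outcomes(records, query_count_14d, last_refresh, now=None):
    now = now or datetime.now()
    scores = sorted(r["trust_score"] for r in records)
    median = scores[len(scores) // 2] if scores else 0
    return {
        # Data loaded but zero queries after 14 days.
        "extraction_theater": len(records) > 0 and query_count_14d == 0,
        # Trust scores clustering below 40 (median as a simple proxy).
        "garbage_in": median < 40,
        # No refresh in 30+ days.
        "stale_data": (now - last_refresh) > timedelta(days=30),
    }

# Example: healthy pipeline, so no signal should fire.
signals = detect_anti_outcomes(
    records=[{"trust_score": 85}, {"trust_score": 72}, {"trust_score": 90}],
    query_count_14d=12,
    last_refresh=datetime.now() - timedelta(days=3),
)
```

Any `True` value maps directly to a response in the table: stop loading, fix the enrichment layer, or unblock scheduled extraction.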
Context
- Value Stream Map — Where does time die in the pipeline?
- ETL Spec — Build contract and quality targets