ETL Data Tool — Dependency Map
What must happen first?
External Dependencies
| Dependency | Type | Risk | Mitigation |
|---|---|---|---|
| NZBN API subscription key | API key (free) | Low — registration only | Apply via nzbn.govt.nz |
| Companies Office API access | API key (free) | Low — registration only | Apply via companiesoffice.govt.nz |
| Docker (for Crawl4AI) | Runtime | Low — already installed | Self-hosted, Apache 2.0 |
| ANZSIC06 classification tables | Reference data | None — publicly available | Download from data.govt.nz |
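Primary external gate: the NZBN API key, because the Companies Office wrapper depends on NZBN entities. The Companies Office API also needs a free key obtained by registration; everything else is open-source or publicly available.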
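A preflight check for the two API-key rows in the table above could look like the following. This is a minimal sketch, not the project's actual configuration: the environment variable names `NZBN_API_KEY` and `COMPANIES_OFFICE_API_KEY` and the helper `missingGates` are hypothetical and do not come from the source.

```typescript
// Hypothetical environment variable names; the source does not specify them.
type Gate = { name: string; envVar: string };

const EXTERNAL_GATES: Gate[] = [
  { name: "NZBN API subscription key", envVar: "NZBN_API_KEY" },
  { name: "Companies Office API access", envVar: "COMPANIES_OFFICE_API_KEY" },
];

// Returns the names of the gates whose credential is absent or blank.
function missingGates(env: Record<string, string | undefined>): string[] {
  return EXTERNAL_GATES
    .filter((gate) => (env[gate.envVar] ?? "").trim() === "")
    .map((gate) => gate.name);
}
```

Under Node.js, calling `missingGates(process.env)` before Sprint 0 work starts would list any key that still has to be applied for.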
Internal Dependencies
PostgreSQL schemas (73 types) ──── EXIST
agent-etl-cli.ts ───────────────── EXIST
Drizzle repositories ───────────── EXIST
ETL pipeline dashboard ─────────── EXIST (8 pipelines)
File/API connectors ────────────── EXIST (85-90%)
NZBN API wrapper ───────────────── GAP (Sprint 0)
  └── Companies Office wrapper ─── GAP (Sprint 0, depends on NZBN entities)
      └── Crawl4AI integration ─── GAP (Sprint 1, depends on entities to crawl)
          └── Trust scoring ────── GAP (Sprint 1, depends on multi-source data)
              └── Scheduling ───── GAP (Sprint 2, depends on working pipeline)
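The GAP chain above can be expressed as a small dependency graph and ordered with a simple topological sort. The sketch below is illustrative only: the edges are read off the nesting of the tree (each component depends on the one it hangs under), which is an interpretation, and `dependsOn` and `buildOrder` are hypothetical names, not part of `agent-etl-cli.ts`.

```typescript
// Each GAP component mapped to what it depends on (assumed from the tree nesting).
const dependsOn: Record<string, string[]> = {
  "NZBN API wrapper": [],
  "Companies Office wrapper": ["NZBN API wrapper"],
  "Crawl4AI integration": ["Companies Office wrapper"],
  "Trust scoring": ["Crawl4AI integration"],
  "Scheduling": ["Trust scoring"],
};

// Simple topological sort: repeatedly emit every node whose dependencies
// have all been emitted in an earlier round.
function buildOrder(graph: Record<string, string[]>): string[] {
  const order: string[] = [];
  const pending = Object.keys(graph);
  while (pending.length > 0) {
    const ready = pending.filter((node) =>
      graph[node].every((dep) => order.indexOf(dep) !== -1)
    );
    if (ready.length === 0) {
      // Nothing can progress: a cycle, or a dependency missing from the graph.
      throw new Error("Dependency cycle or unknown dependency");
    }
    for (const node of ready) {
      order.push(node);
      pending.splice(pending.indexOf(node), 1);
    }
  }
  return order;
}
```

For the chain as written, `buildOrder(dependsOn)` yields the components in the same top-to-bottom order as the tree. The function becomes more useful if parallel branches are added later, since it would surface which components can start in the same round.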
Sprint Dependency Graph
Sprint 0: NZBN + CO + ANZSIC (no internal deps — repos ready)
↓
Sprint 1: Crawl4AI + Trust Scoring (depends on Sprint 0 entities)
↓
Sprint 2: Scheduling (depends on Sprint 1 working pipeline)
↓
Sprint 3: Downstream integration (depends on Sprint 2 data freshness)
Cross-PRD Dependencies
| PRD | Depends On ETL For | Blocked Until |
|---|---|---|
| Sales CRM & RFP | Business profiles + contacts | Sprint 0 complete |
| Sales Dev Agent | Qualified leads by industry | Sprint 1 complete (trust scores) |
| Pipeline Nowcast | Business change signals | Sprint 2 complete (scheduling) |
| Business Idea Generator | Market data for validation | Sprint 1 complete |
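The "Blocked Until" column can be turned into a lookup that answers which PRDs may proceed after a given sprint. This is a minimal sketch assuming sprints complete strictly in order, as the sprint dependency graph implies; `blockedUntilSprint` and `unblockedPrds` are hypothetical names introduced for illustration.

```typescript
// Sprint whose completion unblocks each PRD, copied from the table above.
const blockedUntilSprint: Record<string, number> = {
  "Sales CRM & RFP": 0,
  "Sales Dev Agent": 1,
  "Pipeline Nowcast": 2,
  "Business Idea Generator": 1,
};

// Lists the PRDs that may proceed once every sprint up to and including
// lastCompletedSprint is done (assumes sprints finish in order).
function unblockedPrds(lastCompletedSprint: number): string[] {
  return Object.keys(blockedUntilSprint).filter(
    (prd) => blockedUntilSprint[prd] <= lastCompletedSprint
  );
}
```

For example, `unblockedPrds(1)` would include Sales CRM & RFP, Sales Dev Agent, and Business Idea Generator, while Pipeline Nowcast stays blocked until Sprint 2.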
Context
- Value Stream Map — Where time dies
- Capability Map — What we can actually do