
ETL Data Tool — Dependency Map

What must happen first?

External Dependencies

| Dependency | Type | Risk | Mitigation |
|---|---|---|---|
| NZBN API subscription key | API key (free) | Low (registration only) | Apply via nzbn.govt.nz |
| Companies Office API access | API key (free) | Low (registration only) | Apply via companiesoffice.govt.nz |
| Docker (for Crawl4AI) | Runtime | Low (already installed) | Self-hosted, Apache 2.0 |
| ANZSIC06 classification tables | Reference data | None (publicly available) | Download from data.govt.nz |

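Single external gate: the NZBN API key. The Companies Office key is the same kind of free registration, and its wrapper cannot start until NZBN entities exist anyway. Everything else is open-source or publicly available.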
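Because the only real blockers are registration-gated keys, a pipeline run can fail fast before Sprint 0 work starts. A minimal preflight sketch, assuming the keys arrive as environment variables named `NZBN_API_KEY` and `COMPANIES_OFFICE_API_KEY` (hypothetical names, not taken from `agent-etl-cli.ts`); the Docker runtime is not checked here:

```typescript
// Preflight check for the registration-gated external dependencies.
// ASSUMPTION: the env var names below are hypothetical; the real CLI may differ.

type Env = Record<string, string | undefined>;

interface ExternalGate {
  name: string; // dependency name as listed in the table
  envVar: string; // hypothetical environment variable holding the key
}

const GATES: ExternalGate[] = [
  { name: "NZBN API subscription key", envVar: "NZBN_API_KEY" },
  { name: "Companies Office API access", envVar: "COMPANIES_OFFICE_API_KEY" },
];

// Returns the names of gates whose key is unset or blank.
function missingGates(env: Env): string[] {
  return GATES.filter((g) => (env[g.envVar] ?? "").trim() === "").map((g) => g.name);
}

// Usage: in Node you would pass process.env; a literal object keeps this self-contained.
const missing = missingGates({ NZBN_API_KEY: "example-key" });
if (missing.length > 0) {
  console.log(`Blocked on: ${missing.join(", ")}`);
}
```

Taking the environment as a parameter rather than reading `process.env` directly keeps the check easy to test.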

Internal Dependencies

PostgreSQL schemas (73 types) ──── EXIST
agent-etl-cli.ts ──────────────── EXIST
Drizzle repositories ───────────── EXIST
ETL pipeline dashboard ─────────── EXIST (8 pipelines)
File/API connectors ────────────── EXIST (85-90%)

NZBN API wrapper ───────────────── GAP (Sprint 0)
└── Companies Office wrapper ─── GAP (Sprint 0, depends on NZBN entities)
    └── Crawl4AI integration ─── GAP (Sprint 1, depends on entities to crawl)
        └── Trust scoring ────── GAP (Sprint 1, depends on multi-source data)
            └── Scheduling ───── GAP (Sprint 2, depends on working pipeline)
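The tree is a dependency chain, so a build order for the GAP items falls out of a topological sort. A sketch using Kahn's algorithm, assuming a hypothetical `Component` shape; the NZBN wrapper's dependency on the existing Drizzle repositories is inferred from "repos ready" rather than stated in the tree:

```typescript
// Kahn's topological sort over the internal components in the tree above.
// ASSUMPTION: the Component shape is hypothetical, and the NZBN wrapper's
// dependency on the existing Drizzle repositories is inferred, not stated.

type Status = "EXIST" | "GAP";

interface Component {
  name: string;
  status: Status;
  dependsOn: string[];
}

const components: Component[] = [
  { name: "Drizzle repositories", status: "EXIST", dependsOn: [] },
  { name: "NZBN API wrapper", status: "GAP", dependsOn: ["Drizzle repositories"] },
  { name: "Companies Office wrapper", status: "GAP", dependsOn: ["NZBN API wrapper"] },
  { name: "Crawl4AI integration", status: "GAP", dependsOn: ["Companies Office wrapper"] },
  { name: "Trust scoring", status: "GAP", dependsOn: ["Crawl4AI integration"] },
  { name: "Scheduling", status: "GAP", dependsOn: ["Trust scoring"] },
];

// Returns every component name ordered so that dependencies come first.
// Throws on unknown dependencies or cycles.
function topoOrder(items: Component[]): string[] {
  const indegree = new Map<string, number>();
  const dependents = new Map<string, string[]>();
  for (const c of items) {
    indegree.set(c.name, c.dependsOn.length);
    dependents.set(c.name, []);
  }
  for (const c of items) {
    for (const dep of c.dependsOn) {
      const list = dependents.get(dep);
      if (list === undefined) throw new Error(`Unknown dependency: ${dep}`);
      list.push(c.name);
    }
  }
  // Start with components that depend on nothing.
  const queue = items.filter((c) => c.dependsOn.length === 0).map((c) => c.name);
  const order: string[] = [];
  while (queue.length > 0) {
    const name = queue.shift()!;
    order.push(name);
    // Releasing this component may make its dependents ready.
    for (const next of dependents.get(name) ?? []) {
      const remaining = (indegree.get(next) ?? 0) - 1;
      indegree.set(next, remaining);
      if (remaining === 0) queue.push(next);
    }
  }
  if (order.length !== items.length) throw new Error("Dependency cycle detected");
  return order;
}

// Keep only the GAP items: this is the build order for new work.
const statusOf = new Map(components.map((c) => [c.name, c.status] as const));
const workOrder = topoOrder(components).filter((n) => statusOf.get(n) === "GAP");
console.log(workOrder.join(" -> "));
```

The cycle check matters once the map grows beyond a single chain: if a new edge ever made two gaps wait on each other, the sort would refuse to produce an order instead of silently dropping items.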

Sprint Dependency Graph

Sprint 0: NZBN + Companies Office (CO) + ANZSIC (no internal deps; repos ready)

Sprint 1: Crawl4AI + Trust Scoring (depends on Sprint 0 entities)

Sprint 2: Scheduling (depends on Sprint 1 working pipeline)

Sprint 3: Downstream integration (depends on Sprint 2 data freshness)
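Each sprint waits on exactly one predecessor, so "what can start now?" is a simple check against the set of completed sprints. A sketch, assuming a hypothetical `Sprint` record that mirrors the graph above:

```typescript
// Which sprints are ready to start, given the sprints already completed.
// ASSUMPTION: the Sprint shape is hypothetical; scopes and edges mirror the graph above.

interface Sprint {
  id: number;
  scope: string;
  dependsOn: number[];
}

const sprints: Sprint[] = [
  { id: 0, scope: "NZBN + CO + ANZSIC", dependsOn: [] },
  { id: 1, scope: "Crawl4AI + Trust Scoring", dependsOn: [0] },
  { id: 2, scope: "Scheduling", dependsOn: [1] },
  { id: 3, scope: "Downstream integration", dependsOn: [2] },
];

// Sprints not yet done whose prerequisites are all complete.
function startableSprints(completed: ReadonlySet<number>): Sprint[] {
  return sprints.filter(
    (s) => !completed.has(s.id) && s.dependsOn.every((d) => completed.has(d)),
  );
}

// Usage: with Sprint 0 finished, only Sprint 1 is unblocked.
for (const s of startableSprints(new Set([0]))) {
  console.log(`Sprint ${s.id} can start: ${s.scope}`);
}
```

Modelling `dependsOn` as a list rather than a single predecessor leaves room for a sprint that later gains a second prerequisite.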

Cross-PRD Dependencies

| PRD | Depends on ETL for | Blocked until |
|---|---|---|
| Sales CRM & RFP | Business profiles + contacts | Sprint 0 complete |
| Sales Dev Agent | Qualified leads by industry | Sprint 1 complete (trust scores) |
| Pipeline Nowcast | Business change signals | Sprint 2 complete (scheduling) |
| Business Idea Generator | Market data for validation | Sprint 1 complete |
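Read the other way round, the table is a release schedule: each sprint's completion unblocks a group of downstream PRDs. A sketch that groups them, assuming a hypothetical `PrdDependency` record copied from the table:

```typescript
// Group downstream PRDs by the ETL sprint whose completion unblocks them.
// ASSUMPTION: the PrdDependency shape is hypothetical; values come from the table.

interface PrdDependency {
  prd: string;
  needs: string;
  blockedUntilSprint: number;
}

const prdDeps: PrdDependency[] = [
  { prd: "Sales CRM & RFP", needs: "Business profiles + contacts", blockedUntilSprint: 0 },
  { prd: "Sales Dev Agent", needs: "Qualified leads by industry", blockedUntilSprint: 1 },
  { prd: "Pipeline Nowcast", needs: "Business change signals", blockedUntilSprint: 2 },
  { prd: "Business Idea Generator", needs: "Market data for validation", blockedUntilSprint: 1 },
];

// Sprint id -> PRDs unblocked when that sprint completes, in ascending sprint order.
function unblockSchedule(deps: PrdDependency[]): Map<number, string[]> {
  const bySprint = new Map<number, string[]>();
  const sorted = [...deps].sort((a, b) => a.blockedUntilSprint - b.blockedUntilSprint);
  for (const d of sorted) {
    const list = bySprint.get(d.blockedUntilSprint) ?? [];
    list.push(d.prd);
    bySprint.set(d.blockedUntilSprint, list);
  }
  return bySprint;
}

// PRDs whose gating sprint is in the completed set.
function unblockedPrds(deps: PrdDependency[], completed: ReadonlySet<number>): string[] {
  return deps.filter((d) => completed.has(d.blockedUntilSprint)).map((d) => d.prd);
}

unblockSchedule(prdDeps).forEach((prds, sprint) => {
  console.log(`After Sprint ${sprint}: ${prds.join(", ")}`);
});
```

Because the sprints run strictly in sequence, `completed` should hold every finished sprint (for example `new Set([0, 1])`), not just the latest one.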

Context