# ETL Data Tool
What's the point of building schemas if nothing flows through them?
## Scorecard
| Dimension | Score | Evidence |
|---|---|---|
| Pain | 5/5 | CRM = 0 businesses. Nowcast = 0 signals. Every data consumer blocked. |
| Demand | 4/5 | Internal dependency for 5+ PRDs. No external demand yet. |
| Edge | 4/5 | 73 domain types + free NZ govt APIs + existing Drizzle repos. 6+ months to replicate. |
| Trend | 5/5 | 73% of AI projects fail on data (Gartner). MCP scraping tools exploding. Crawl4AI at 58K GitHub stars. |
| Conversion | 3/5 | Clear internal path. External pricing untested. |
| Composite | 1200 | 5 × 4 × 4 × 5 × 3 |
Kill signal: Data loads but nobody queries it within 14 days = extraction theater.
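The kill signal above is mechanically checkable. A minimal sketch, assuming we record a `loadedAt` timestamp per dataset and a `lastQueriedAt` timestamp (null if never queried) — neither field name comes from this PRD:

```typescript
// Hypothetical sketch: flag "extraction theater" per the kill signal above.
// Field names (loadedAt, lastQueriedAt) are assumptions, not PRD schema.

const KILL_SIGNAL_DAYS = 14;

function isExtractionTheater(
  loadedAt: Date,
  lastQueriedAt: Date | null,
  now: Date = new Date(),
): boolean {
  const daysSinceLoad =
    (now.getTime() - loadedAt.getTime()) / (1000 * 60 * 60 * 24);
  // Data has sat for 14+ days and no consumer has ever queried it.
  return daysSinceLoad >= KILL_SIGNAL_DAYS && lastQueriedAt === null;
}
```

A nightly job running this per dataset would turn the kill signal from a judgment call into an alert.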
## Issues
| # | Severity | What Happens | Fix |
|---|---|---|---|
| 18 | MEDIUM | /settings/etl returns 404. Settings sidebar "ETL Pipelines" links to a missing page. | Create the route or point the sidebar link at the correct path. |
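Issue 18 is the kind of drift a CI link check catches. A sketch under stated assumptions — `SIDEBAR_LINKS` and `KNOWN_ROUTES` are stand-ins; a real check would read the sidebar config and the app's route manifest:

```typescript
// Hypothetical link check that would have caught issue 18.
// Route and link data below are illustrative, not the app's actual config.

const KNOWN_ROUTES = new Set(["/settings", "/settings/profile"]);

const SIDEBAR_LINKS = [
  { label: "Profile", href: "/settings/profile" },
  { label: "ETL Pipelines", href: "/settings/etl" }, // 404s today
];

function findBrokenLinks(
  links: { label: string; href: string }[],
  routes: Set<string>,
): string[] {
  return links.filter((l) => !routes.has(l.href)).map((l) => l.href);
}
```

Failing the build when `findBrokenLinks` returns anything keeps sidebar entries and routes from diverging again.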
## Context
- Sales CRM & RFP — Primary consumer: needs business profiles and contacts
- Sales Dev Agent — Consumer: needs qualified leads from classified businesses
- Pipeline Nowcast — Consumer: needs business signals for variance prediction
- Data Interface — Child: access layer that sits on ETL output
- AI Data Industry — Market context and competitive landscape
## Questions
- What happens to every downstream PRD if this pipeline stays empty for another month?
- If NZBN gives us 700K businesses for free, what's our excuse for having zero in the CRM?
- At what trust score threshold does scraped data become more dangerous than no data?
- When the first venture queries ETL-loaded data without writing extraction code, does that prove the Mycelium thesis?
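The trust-score question implies a gate rather than a binary load/skip decision. A minimal sketch, assuming a 0–1 `trustScore` per scraped record and a 0.7 cut-off — both the field and the threshold are illustrative assumptions, not PRD values:

```typescript
// Hypothetical trust-score gate for scraped records. trustScore semantics
// and the 0.7 threshold are assumed, not specified in this document.

interface ScrapedRecord {
  id: string;
  trustScore: number; // 0 (unverified scrape) .. 1 (e.g. matched to a registry record)
}

const TRUST_THRESHOLD = 0.7; // assumed cut-off; below this, quarantine rather than serve

function gateByTrust(records: ScrapedRecord[]): {
  accepted: ScrapedRecord[];
  quarantined: ScrapedRecord[];
} {
  const accepted = records.filter((r) => r.trustScore >= TRUST_THRESHOLD);
  const quarantined = records.filter((r) => r.trustScore < TRUST_THRESHOLD);
  return { accepted, quarantined };
}
```

Quarantining instead of dropping keeps low-trust records available for review while answering the question above: below the threshold, scraped data never reaches a consumer.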