ETL Data Tool — Agent & Instrument Diagram
How do agents orchestrate data acquisition?
System Diagram
┌─────────────────────────────────────────────────────────┐
│                   SCHEDULE CONTROLLER                   │
│           (Cron: monthly bulk, weekly delta)            │
└────────────────────────────┬────────────────────────────┘
                             │ triggers
                             ▼
┌─────────────────────────────────────────────────────────┐
│                        ETL AGENT                        │
│                                                         │
│   ┌──────────┐   ┌──────────┐   ┌──────────────┐        │
│   │   NZBN   │   │    CO    │   │   Crawl4AI   │        │
│   │ Extractor│──▶│ Enricher │──▶│   Enricher   │        │
│   │  (API)   │   │  (API)   │   │   (Docker)   │        │
│   └──────────┘   └──────────┘   └──────────────┘        │
│        │              │                │                │
│        └──────────────┼────────────────┘                │
│                       ▼                                 │
│               ┌───────────────┐                         │
│               │     TRUST     │                         │
│               │  INSTRUMENT   │◄── Scores every record  │
│               │    (0-100)    │    before load          │
│               └───────┬───────┘                         │
│                       │                                 │
│                       ▼                                 │
│               ┌───────────────┐                         │
│               │   TYPE-SAFE   │                         │
│               │    LOADER     │◄── Drizzle + rollback   │
│               │ (PostgreSQL)  │                         │
│               └───────────────┘                         │
└────────────────────────────┬────────────────────────────┘
                             │
                             ▼ FEEDBACK LOOP
┌─────────────────────────────────────────────────────────┐
│                  DOWNSTREAM CONSUMERS                   │
│                                                         │
│  Sales CRM  ◄── business profiles + contacts            │
│  Sales Dev  ◄── qualified leads (trust > 70)            │
│  Nowcast    ◄── business change signals (weekly delta)  │
│  BIG        ◄── market data for idea validation         │
│                                                         │
│  Query rate per entity = kill signal instrument         │
└─────────────────────────────────────────────────────────┘
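The flow in the diagram can be sketched roughly as follows. This is a minimal illustration, not the actual agent: the diagram names Drizzle and PostgreSQL for the loader, whereas this sketch substitutes Python's built-in `sqlite3` so that it runs standalone. The `Record` shape, the `business` table, the stub stages (`co_enrich`, `crawl_enrich`, `score_by_sources`) and the sample identifiers are all hypothetical.

```python
import sqlite3
from dataclasses import dataclass, field
from typing import Callable, Iterable, List


@dataclass
class Record:
    nzbn: str                                   # key supplied by the NZBN extractor
    fields: dict = field(default_factory=dict)  # attributes added by the enrichers
    trust: int = 0                              # set by the trust instrument (0-100)


Stage = Callable[[Record], Record]


def run_etl(extracted: Iterable[Record], enrichers: List[Stage],
            score: Callable[[Record], int], conn: sqlite3.Connection) -> int:
    """Enrich, score, then load a batch; returns the number of rows loaded."""
    staged = []
    for record in extracted:
        for enrich in enrichers:          # e.g. CO enricher, then Crawl4AI enricher
            record = enrich(record)
        record.trust = max(0, min(100, score(record)))  # clamp to the 0-100 scale
        staged.append(record)
    # `with conn` commits on success and rolls back if any insert raises,
    # so a failed batch leaves the table unchanged.
    with conn:
        conn.executemany(
            "INSERT INTO business (nzbn, trust) VALUES (?, ?)",
            [(r.nzbn, r.trust) for r in staged],
        )
    return len(staged)


# Hypothetical stand-ins for the real stages (assumptions, not the real logic).
def co_enrich(record: Record) -> Record:
    record.fields["co"] = True
    return record


def crawl_enrich(record: Record) -> Record:
    record.fields["web"] = True
    return record


def score_by_sources(record: Record) -> int:
    return 40 * len(record.fields)        # toy rule: more sources, more trust


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE business (nzbn TEXT PRIMARY KEY, trust INTEGER NOT NULL)")
loaded = run_etl([Record("nzbn-001"), Record("nzbn-002")],
                 [co_enrich, crawl_enrich], score_by_sources, conn)
```

Scoring happens before the transaction opens, which mirrors the diagram's "scores every record before load" note. Loading the whole batch in one transaction is one way to read "rollback"; per-record rollback would be an equally plausible design.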
P&ID Translation
| P&ID Element | A&ID Equivalent | Instance |
|---|---|---|
| Process | ETL Agent | Extracts, transforms, loads |
| Instrument | Trust Scoring Engine | Measures data quality (0-100) |
| Pipeline | Three-layer architecture | APIs → Enrichment → Scoring |
| Control Loop | Schedule Controller + Kill Signal | Cron triggers + query rate feedback |
| Valve | Trust threshold (score ≥ 40) | Blocks low-trust records from reaching downstream consumers |
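As an illustration of the valve, the threshold can be read as a routing function. The sketch below assumes the two cut-offs stated in this document (serve at score ≥ 40, qualified leads at trust > 70). The `Route` enum and the `route` function are hypothetical names introduced here.

```python
from enum import Enum

SERVE_THRESHOLD = 40   # Valve row: score >= 40 passes downstream
LEAD_THRESHOLD = 70    # diagram: Sales Dev receives leads with trust > 70


class Route(Enum):
    REVERIFY = "flag for re-verification"
    SERVE = "serve to consumers"
    QUALIFIED_LEAD = "serve as qualified lead"


def route(score: int) -> Route:
    """Map a trust score (0-100) to where the record may flow."""
    if not 0 <= score <= 100:
        raise ValueError(f"trust score out of range: {score}")
    if score < SERVE_THRESHOLD:
        return Route.REVERIFY          # valve closed: not served downstream
    if score > LEAD_THRESHOLD:
        return Route.QUALIFIED_LEAD    # strictly greater than 70
    return Route.SERVE


boundary_routes = {s: route(s).name for s in (39, 40, 70, 71)}
```

The boundaries are asymmetric on purpose: 40 itself passes the valve (≥), while 70 itself does not yet count as a qualified lead (>).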
Feedback Loops
| Loop | Signal | Response |
|---|---|---|
| Trust threshold | Score < 40 on incoming record | Flag for re-verification, do not serve to consumers |
| Query rate | Zero queries after 14 days | Kill signal — stop extraction, investigate demand |
| Freshness decay | Last refresh > 7 days | Schedule controller triggers delta run |
| Cross-reference | New source confirms existing record | Trust score increases (cross_reference dimension) |
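The query-rate, freshness and cross-reference loops could be evaluated per entity along these lines. This is a sketch under stated assumptions: "zero queries after 14 days" is interpreted as no queries in the 14 days since the entity was first loaded, the kill signal is assumed to take precedence over a delta run (the table does not say how the loops interact), and the cross-reference increment of 5 points is invented for illustration. `EntityStats`, `feedback_actions` and `bump_cross_reference` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List

KILL_AFTER = timedelta(days=14)   # Query rate row: zero queries after 14 days
STALE_AFTER = timedelta(days=7)   # Freshness decay row: last refresh > 7 days


@dataclass
class EntityStats:
    first_loaded: datetime
    last_refresh: datetime
    query_count: int


def feedback_actions(stats: EntityStats, now: datetime) -> List[str]:
    """Return the loop responses that apply to one entity at time `now`."""
    if stats.query_count == 0 and now - stats.first_loaded >= KILL_AFTER:
        # Assumption: stopping extraction overrides any refresh.
        return ["kill_signal"]
    if now - stats.last_refresh > STALE_AFTER:
        return ["delta_run"]
    return []


def bump_cross_reference(score: int, step: int = 5) -> int:
    """Raise a trust score when a new source confirms the record.

    The step size is an assumption; the result is capped at 100.
    """
    return min(100, score + step)


now = datetime(2024, 1, 31)
unused = EntityStats(datetime(2024, 1, 1), datetime(2024, 1, 30), query_count=0)
stale = EntityStats(datetime(2024, 1, 1), datetime(2024, 1, 20), query_count=5)
fresh = EntityStats(datetime(2024, 1, 1), datetime(2024, 1, 28), query_count=3)
```

With these sample entities, `unused` yields a kill signal, `stale` yields a delta run, and `fresh` yields no action.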
Context
- Capability Map — What we can actually do
- ETL Spec — Agent-facing commands and boundaries