Data Engineering
What turns raw observations into competitive advantage?
Proprietary data compounds. The business that owns its pipeline owns its intelligence — and that gap widens with every model trained on it.
Dig Deeper
- Data Analysis — Tools and workflows for exploratory analysis and BI dashboards
- Data Pipelines — ETL patterns, pipeline architecture, and automation
- Data Science — ML models, statistical methods, and forecasting
- Multi-Tenant SaaS — Data isolation for multi-tenant systems
- Repository Patterns — Patterns, anti-patterns, and generator checklist from production review
The Data Flow
Six stages from observation to decision:
| Stage | Function | What It Enables |
|---|---|---|
| Acquisition | Collect from sources | Raw material in |
| Warehouse | Store and structure | Single source of truth |
| Visualization | Surface patterns | Human interpretation |
| Science | Model and predict | Machine intelligence |
| Pipelines | Automate movement | Continuous flow |
| Governance | Enforce quality and compliance | Trustworthy output |
Roles
Three functions, one discipline. The business advisor view lives in Business Tech Strategy.
| Role | Technical | Business | Core Craft |
|---|---|---|---|
| Data Engineer | Pipelines, schema, ETL | Flow, volume, cost | Systems thinking |
| Data Analyst | SQL, visualization, dashboards | Decision support, KPIs | Data intuition |
| Data Scientist | ML models, statistics | Forecasting, prediction | Curiosity |
Repository Standards
Ten rules every data repository must pass. The Repository Quality scorecard grades compliance. Full patterns and anti-patterns: Repository Patterns.
| # | Rule | What to Check |
|---|---|---|
| 1 | Method presence | Every repo exposes findById, findMany, create, and update |
| 2 | Section ordering | Type imports, then queries, then mutations, then helpers |
| 3 | Soft-delete policy | deletedAt IS NULL in every read, or hard-delete with audit log |
| 4 | Sort safety | Default ORDER BY on every findMany |
| 5 | Transaction readiness | Methods accept a db \| tx executor parameter |
| 6 | Prepared statements | Hot-path reads use .prepare() |
| 7 | Error mapping | DB errors map to domain result types, not raw throws |
| 8 | Type safety | Zero any in method signatures |
| 9 | Mapper policy | Row-to-entity mapping in one place per entity |
| 10 | Query builder policy | Composable filters, not string concatenation |
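Several of the rules above are easier to judge against a concrete shape. The following is a minimal sketch, not the reference implementation: it uses a hypothetical in-memory `Executor` interface in place of a real database client, and every name in it (`UserRow`, `User`, `Result`, `Executor`, `toUser`, `createUserRepository`) is an assumption made for illustration. It shows rule 3 (soft-delete filter in every read), rule 4 (default sort), rule 5 (executor parameter), rule 7 (errors mapped to a result type), and rule 9 (one mapper per entity).

```typescript
// Hypothetical row and entity shapes; all names here are illustrative.
type UserRow = { id: string; email: string; created_at: string; deleted_at: string | null };
type User = { id: string; email: string; createdAt: Date };

// Rule 7: callers branch on `ok` instead of catching raw database errors.
type Result<T> =
  | { ok: true; value: T }
  | { ok: false; error: "not_found" | "db_error" };

// Rule 5: a connection and a transaction would both satisfy this contract.
// A real executor would run SQL; this one only returns rows.
interface Executor {
  selectUsers(): UserRow[];
}

// Rule 9: the only place a row becomes an entity.
function toUser(row: UserRow): User {
  return { id: row.id, email: row.email, createdAt: new Date(row.created_at) };
}

function createUserRepository(db: Executor) {
  return {
    // Every method takes an optional executor so it can join a caller's transaction.
    findById(id: string, exec: Executor = db): Result<User> {
      try {
        // Rule 3: soft-deleted rows are invisible to reads.
        const row = exec.selectUsers().find((r) => r.id === id && r.deleted_at === null);
        return row ? { ok: true, value: toUser(row) } : { ok: false, error: "not_found" };
      } catch {
        return { ok: false, error: "db_error" };
      }
    },

    findMany(exec: Executor = db): Result<User[]> {
      try {
        const rows = exec
          .selectUsers()
          .filter((r) => r.deleted_at === null)
          // Rule 4: deterministic default order (ISO timestamps, then id as tiebreaker).
          .sort((a, b) => a.created_at.localeCompare(b.created_at) || a.id.localeCompare(b.id));
        return { ok: true, value: rows.map(toUser) };
      } catch {
        return { ok: false, error: "db_error" };
      }
    },
  };
}
```

In this sketch a caller would write `const users = repo.findMany(tx)` inside a transaction and check `users.ok` before reading `users.value`. With a real ORM, the filtering and ordering would live in the query itself rather than in application code.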
Context
- AI Data Industry — Players, business models, and market structure for data as a product
- Data Footprint — Commissioning instrument measuring data maturity across tables, APIs, and UI
- Hexagonal Architecture — The pattern separating data access from domain logic
- Tech Stack — Database, ORM, auth decisions that feed data engineering
- Drizzle ORM — Prepared statements, tx executors, error mapping
- Business Tech Strategy — Business advisor's view of which data decisions matter
Questions
- Which stage of the data flow has the widest gap between what exists and what's connected?
- When repository standards are defined here and measured by scorecard, who closes the loop when violations compound?
- If proprietary data compounds, what's the cost of a pipeline that breaks trust at any one stage?
- Which of the ten repository rules catches the most violations — and does that reveal a training gap or a tooling gap?