Data Flow Map
You don't have a data quality problem. You have a data flow problem.
Most AI transformation projects stall before they begin. The stated reason is always the same: "our data isn't clean enough." Teams spend six months cleaning data that didn't need cleaning. The real barrier — business logic trapped in people's heads and data scattered across five platforms with no clear ownership — never gets addressed. The project dies before it finds the constraint.
LLMs were built for messy data. They were not built for data that doesn't move. The question to ask is not "how clean is this data?" but "does it arrive where it is needed, when it is needed, in a form that allows action?" Data quality is downstream of data flow. Fix the flow first.
The Data Flow Map answers the question the constraint map cannot answer alone: before you classify what's real versus artifact, you need to see where your data actually lives and how many hands it passes through to get there.
0. Framing
Anchor the map to the business before listing tools.
| Question | Answer |
|---|---|
| What is the highest-volume workflow in this business? (name one) | [e.g. "client onboarding" not "operations"] |
| How many platforms does a single unit of that workflow touch, start to finish? | [count — this is your baseline hop count] |
| Who is the last human to touch data before it becomes a decision? | [name the role — this is where flow stops] |
| What happens when that person is away? | [substitute, delay, or error — reveals tacit knowledge dependency] |
1. Tech Stack Inventory
List every tool the business uses to run its workflows. Do not filter. Include spreadsheets, email, Slack, WhatsApp, shared drives. The ones that feel too informal to list are usually the ones carrying the most critical data.
| # | Tool | Category | Who Owns It | How Long In Use | Cost / Month |
|---|---|---|---|---|---|
| 1 | | CRM / Comms / Doc / Finance / PM / Analytics / Custom | | | |
| 2 | | | | | |
| 3 | | | | | |
| 4 | | | | | |
| 5 | | | | | |
| 6 | | | | | |
| 7 | | | | | |
| 8 | | | | | |
Sprawl signal: If the count exceeds 5 tools for a single workflow, data sprawl is the constraint — not data quality. Each additional tool is a handoff point where data can corrupt, duplicate, or stall.
2. Four-Verb Lifecycle Map
For each core workflow, map every piece of data through its full lifecycle using four verbs. This reveals where data moves cleanly, where it gets copied manually, and where it disappears entirely.
| Verb | Question it answers | What good looks like | Warning sign |
|---|---|---|---|
| Create | Where does this data first enter the business? | Single entry point, validated at source | Data entered twice in two systems |
| Manipulate | What transforms it after entry? | Real-time sync, rule-based routing | Manual reformatting, copy-paste between tools |
| Share | Who and what consumes it (human, agent, system)? | API access, standard formats (JSON, CSV) | PDF exports, screenshotted for the next person |
| Delete | When and how is it removed or archived? | Defined retention policy, automated enforcement | "We keep everything" — no deletion policy |
Map your primary workflow:
Workflow: [name it]
| Step | Verb | Tool | Who Does It | Time | Data State After |
|---|---|---|---|---|---|
| 1 | Create | | | | |
| 2 | Manipulate | | | | |
| 3 | Share | | | | |
| 4 | Manipulate | | | | |
| 5 | Share | | | | |
| 6 | Delete / Archive | | | | |
Total hops: [count]
Human touches: [count]
Artifact touches (no judgment, just moving data): [count]
The artifact touches are the transformation targets. Every artifact touch in your highest-volume workflow is a candidate for elimination.
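The counts above can be mechanized. A minimal sketch, using a hypothetical onboarding workflow (tool names and flags are illustrative, not prescriptive): a touch is an artifact touch when a human performs the step but exercises no judgment.

```python
from dataclasses import dataclass

@dataclass
class Step:
    verb: str        # Create / Manipulate / Share / Delete
    tool: str        # platform the step touches
    human: bool      # does a person perform this step?
    judgment: bool   # does the step require a real decision?

# Hypothetical workflow, illustrating the three counts above.
workflow = [
    Step("Create", "Web form", human=False, judgment=False),
    Step("Manipulate", "CRM", human=True, judgment=True),
    Step("Share", "Slack", human=True, judgment=False),
    Step("Manipulate", "Spreadsheet", human=True, judgment=False),
    Step("Share", "Email", human=True, judgment=False),
    Step("Delete / Archive", "Shared drive", human=False, judgment=False),
]

total_hops = len({s.tool for s in workflow})        # distinct platform touches
human_touches = sum(s.human for s in workflow)
artifact_touches = sum(s.human and not s.judgment for s in workflow)

print(total_hops, human_touches, artifact_touches)  # → 6 4 3
```

In this sketch, three of the four human touches are pure data movement: those are the elimination candidates.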
3. Hop Count Analysis
Hop count = the number of distinct platform touches a single work unit makes from creation to completion. High hop counts reveal data sprawl; the work isn't complex — the movement is.
| Workflow | Hop Count | Primary Bottleneck Tool | Artifact Hops | Real Hops |
|---|---|---|---|---|
| [name it] | | | | |
Reading the hop count:
| Hops | Signal | Action |
|---|---|---|
| 1–3 | Clean flow — minimal sprawl | Validate logic is documented |
| 4–5 | Moderate sprawl — handoff friction building | Identify which hops are artifact |
| 6–8 | High sprawl — data ownership unclear | Map the single source of truth |
| 9+ | Systemic sprawl — this is the constraint | Fix flow before any AI build |
The worked example: client onboarding with 7 hops across PandaDoc, Google Drive, Slack, email, CRM, project tool, and invoicing. Clock time: 4.5 hours. Active work time: 5 minutes. The constraint was hops, not people.
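The reading bands above reduce to a single lookup. A minimal sketch (the function name is illustrative; the thresholds come from the table):

```python
def read_hop_count(hops: int) -> tuple[str, str]:
    """Map a hop count to the signal/action bands in the table above."""
    if hops <= 3:
        return ("clean flow", "validate logic is documented")
    if hops <= 5:
        return ("moderate sprawl", "identify which hops are artifact")
    if hops <= 8:
        return ("high sprawl", "map the single source of truth")
    return ("systemic sprawl", "fix flow before any AI build")

# The worked onboarding example: 7 hops.
print(read_hop_count(7))  # → ('high sprawl', 'map the single source of truth')
```

The point of encoding it is not automation; it is that the thresholds are explicit, so the team argues about the count, not the verdict.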
4. Longevity Risk Assessment
The tech stack you map today may not be the stack you build on tomorrow. Before investing in integrations, classify every tool by its displacement risk.
| Tool | Category | Risk | Displacement Mechanism | Horizon |
|---|---|---|---|---|
| | | Stable / Monitor / Replace | AI-native competitor / price collapse / platform shift | under 6 months / 6–18 months / over 18 months |
| | | Stable / Monitor / Replace | | |
| | | Stable / Monitor / Replace | | |
Risk definitions:
| Rating | Meaning | Action |
|---|---|---|
| Stable | Category leader with AI-native roadmap. Safe to build integrations against. | Build |
| Monitor | Early displacement signals — AI-native competitor gaining traction, or vendor slowing roadmap. | Build lightly, monitor quarterly |
| Replace | Displacement is probable within 12 months. Building deep integrations here locks in the wrong stack. | Do not build deep integrations. Plan migration. |
The question to ask: If I build a workflow automation that depends on this tool, and this tool is disrupted in 12 months, what is the migration cost? If the answer is "high," the risk rating matters more than the current capability.
5. Flow State Assessment
Where does your data sit on the two dimensions that matter most?
| Ownership ↓ / Speed → | Slow | Fast |
|---|---|---|
| Open (portable, API-accessible, yours to take) | Recycling Pods — data moves eventually, but you own it | Flow State — goal |
| Locked (proprietary format, vendor-dependent, no export) | Ruck — stuck, switching cost is the real constraint | Walled Garden — fast but trapped |
Your current position: [circle one]
Flow State is the target: data moves fast and you own it. Most businesses discover they are in Walled Garden (fast but locked inside one vendor's system) or Ruck (slow and locked). The path out of Ruck is not cleaning data — it is re-routing it.
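The 2×2 above can be stated as a four-line classifier. A sketch, assuming the two axes reduce to booleans (open vs. locked ownership, fast vs. slow movement); the quadrant names are the document's own:

```python
def flow_state(open_data: bool, fast: bool) -> str:
    """Place a stack on the 2x2: ownership (open vs. locked) x speed."""
    if open_data and fast:
        return "Flow State"      # the goal: fast and yours
    if open_data:
        return "Recycling Pods"  # slow but portable
    if fast:
        return "Walled Garden"   # fast but vendor-locked
    return "Ruck"                # slow and locked

# A common discovery: everything moves quickly inside one vendor's system.
print(flow_state(open_data=False, fast=True))  # → Walled Garden
```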
6. SSOT Audit
For each data type your business runs on, name where the authoritative version lives. If you cannot name it in three seconds, you have multiple sources of truth competing.
| Data Type | Current Source(s) | Single Source? | Conflicts Observed | Fix |
|---|---|---|---|---|
| Client contact details | | YES / NO / UNCLEAR | | |
| Contract status | | YES / NO / UNCLEAR | | |
| Revenue actuals | | YES / NO / UNCLEAR | | |
| Project status | | YES / NO / UNCLEAR | | |
| Team capacity | | YES / NO / UNCLEAR | | |
The SSOT principle: Specs own rules. Trackers store data. Planners link to both — they never redefine. Every downstream tool reads from the single source; no tool maintains its own copy of the same record.
A business with six systems of truth does not have a data quality problem. It has a data ownership problem. Establishing SSOT is the prerequisite step — before any AI build, before any automation, before any integration investment.
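The audit itself is a one-pass check. A minimal sketch with a hypothetical inventory (data types and system names are illustrative): any data type with more than one system holding a copy fails the three-second test.

```python
# Hypothetical inventory: data type → every system that stores a copy.
sources = {
    "client contact details": ["CRM", "Spreadsheet", "Email signatures"],
    "contract status": ["PandaDoc"],
    "revenue actuals": ["Accounting tool", "CRM"],
}

verdicts = {
    data_type: "SSOT" if len(systems) == 1 else f"CONFLICT ({len(systems)} sources)"
    for data_type, systems in sources.items()
}

for data_type, verdict in verdicts.items():
    print(f"{data_type}: {verdict}")
```

Listing every copy honestly is the hard part; the verdict is mechanical once the inventory exists.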
7. Before / After
For the top workflow flagged by hop count, sketch the transformation.
| | Before (current) | After (AI-native flow) |
|---|---|---|
| Hop count | | |
| Clock time per unit | | |
| Active work time per unit | | |
| Artifact touches (no judgment) | | |
| Human escalation points | | |
| Senior time per unit | | |
| What disappears | | |
| What remains human | | |
The "what remains human" row is the point of the exercise. This is the work your best people should be doing — the judgment, the relationship, the decision where their name is on the outcome. The rest is flow.
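The quantitative rows of the before/after sketch translate directly into reduction percentages. A sketch using the worked onboarding example (7 hops, 270 minutes of clock time, 5 minutes of active work); the "after" figures are hypothetical targets, not measurements:

```python
# Before: the worked example. After: hypothetical redesigned flow (minutes).
before = {"hops": 7, "clock_min": 270, "active_min": 5, "artifact_touches": 5}
after  = {"hops": 2, "clock_min": 30,  "active_min": 5, "artifact_touches": 0}

reduction = {m: (before[m] - after[m]) / before[m] * 100 for m in before}

for metric, pct in reduction.items():
    print(f"{metric}: {before[metric]} -> {after[metric]} ({pct:.0f}% reduction)")
```

Note what does not change: active work time. The redesign removes waiting and moving, not the five minutes of judgment.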
8. Prerequisites Checklist
Before any AI system can be built against a flow-constrained workflow, verify:
- Single source of truth identified for all primary data types
- Hop count mapped for the primary workflow — bottleneck tool named
- Business logic documented — the rules that govern routing decisions are written down
- Longevity risk assessed — no integrations planned against Replace-rated tools
- Data sovereignty confirmed — all primary data is exportable in standard formats
- Retention policy defined — deletion pathways exist, even if not yet automated
Any unchecked box is a prerequisite, not an obstacle. The Business Logic Document and Context Architecture address the logic and knowledge gaps.
Context
- Data Flow Principles — The foundational research: SSOT, four-verb lifecycle, flow state matrix, data sovereignty
- Constraint Map — Classify what is real versus artifact once the flow is visible
- Business Logic Document — Externalize the tacit knowledge sitting in the heads of 2–3 long-tenure employees
- Context Architecture — What the AI system needs to load before performing at senior quality
- AI-Native Future State — What the redesigned workflow looks like once the flow constraint is removed
- AI ROI Model — Translate hop count reduction into P&L impact
Links
- Goldratt — Theory of Constraints — Find the binding constraint, exploit it, elevate it
- DAMA — Data Management Body of Knowledge — Industry standard for data governance and flow
- Martin Fowler — Event Sourcing — Architectural pattern for reliable data flow
Questions
- When you say "our data isn't clean enough for AI" — have you checked whether it moves, or only whether it is tidy?
- Which workflow in your business has the highest hop count? Have you ever counted the hops?
- If your CRM disappeared overnight, which workflows would stop? Which would continue unaffected?
- Where does a single piece of customer data live across your tech stack — how many copies exist?
- Who is the one person whose absence would create the biggest data bottleneck? What do they know that isn't written down?