Data Flow Map
You don't have a data quality problem. You have a data flow problem.
Most AI transformation projects stall before they begin. The stated reason is always the same: "our data isn't clean enough." Teams spend six months cleaning data that didn't need cleaning. The real barrier — business logic trapped in people's heads and data scattered across five platforms with no clear ownership — never gets addressed. The project dies before it finds the constraint.
LLMs were built for messy data. They were not built for data that doesn't move. The question to ask is not "how clean is this data?" but "does it arrive where it is needed, when it is needed, in a form that allows action?" Data quality is downstream of data flow. Fix the flow first.
The Data Flow Map answers the question the constraint map cannot answer alone: before you classify what's real versus artifact, you need to see where your data actually lives and how many hands it passes through to get there.
0. Framing
Anchor the map to the business before listing tools.
| Question | Answer |
|---|---|
| What is the highest-volume workflow in this business? (name one) | [e.g. "client onboarding" not "operations"] |
| How many platforms does a single unit of that workflow touch, start to finish? | [count — this is your baseline hop count] |
| Who is the last human to touch data before it becomes a decision? | [name the role — this is where flow stops] |
| What happens when that person is away? | [substitute, delay, or error — reveals tacit knowledge dependency] |
1. Tech Stack Inventory
List every tool the business uses to run its workflows. Do not filter. Include spreadsheets, email, Slack, WhatsApp, shared drives. The ones that feel too informal to list are usually the ones carrying the most critical data.
| # | Tool | Category | Who Owns It | How Long In Use | Cost / Month |
|---|---|---|---|---|---|
| 1 | | CRM / Comms / Doc / Finance / PM / Analytics / Custom | | | |
| 2 | | | | | |
| 3 | | | | | |
| 4 | | | | | |
| 5 | | | | | |
| 6 | | | | | |
| 7 | | | | | |
| 8 | | | | | |
Sprawl signal: If the count exceeds 5 tools for a single workflow, data sprawl is the constraint — not data quality. Each additional tool is a handoff point where data can corrupt, duplicate, or stall.
2. Four-Verb Lifecycle Map
For each core workflow, map every piece of data through its full lifecycle using four verbs. This reveals where data moves cleanly, where it gets copied manually, and where it disappears entirely.
| Verb | Question it answers | What good looks like | Warning sign |
|---|---|---|---|
| Create | Where does this data first enter the business? | Single entry point, validated at source | Data entered twice in two systems |
| Manipulate | What transforms it after entry? | Real-time sync, rule-based routing | Manual reformatting, copy-paste between tools |
| Share | Who and what consumes it (human, agent, system)? | API access, standard formats (JSON, CSV) | PDF exports, screenshotted for the next person |
| Delete | When and how is it removed or archived? | Defined retention policy, automated enforcement | "We keep everything" — no deletion policy |
Map your primary workflow:
Workflow: [name it]
| Step | Verb | Tool | Who Does It | Time | Data State After |
|---|---|---|---|---|---|
| 1 | Create | | | | |
| 2 | Manipulate | | | | |
| 3 | Share | | | | |
| 4 | Manipulate | | | | |
| 5 | Share | | | | |
| 6 | Delete / Archive | | | | |
Total hops: [count]
Human touches: [count]
Artifact touches (no judgment, just moving data): [count]
The artifact touches are the transformation targets. Every artifact touch in your highest-volume workflow is a candidate for elimination.
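The counts above can be mechanized. A minimal sketch, using a hypothetical onboarding workflow (tool names and flags are illustrative, not prescriptive): a touch is an artifact touch when a human performs the step but exercises no judgment.

```python
from dataclasses import dataclass

@dataclass
class Step:
    verb: str        # Create / Manipulate / Share / Delete
    tool: str        # platform the step touches
    human: bool      # does a person perform this step?
    judgment: bool   # does the step require a real decision?

# Hypothetical workflow, illustrating the three counts above.
workflow = [
    Step("Create", "Web form", human=False, judgment=False),
    Step("Manipulate", "CRM", human=True, judgment=True),
    Step("Share", "Slack", human=True, judgment=False),
    Step("Manipulate", "Spreadsheet", human=True, judgment=False),
    Step("Share", "Email", human=True, judgment=False),
    Step("Delete / Archive", "Shared drive", human=False, judgment=False),
]

total_hops = len({s.tool for s in workflow})        # distinct platform touches
human_touches = sum(s.human for s in workflow)
artifact_touches = sum(s.human and not s.judgment for s in workflow)

print(total_hops, human_touches, artifact_touches)  # → 6 4 3
```

In this sketch, three of the four human touches are pure data movement: those are the elimination candidates.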
3. Hop Count Analysis
Hop count = the number of distinct platform touches a single work unit makes from creation to completion. High hop counts reveal data sprawl; the work isn't complex — the movement is.
| Workflow | Hop Count | Primary Bottleneck Tool | Artifact Hops | Real Hops |
|---|---|---|---|---|
| [name it] | | | | |
Reading the hop count:
| Hops | Signal | Action |
|---|---|---|
| 1–3 | Clean flow — minimal sprawl | Validate logic is documented |
| 4–5 | Moderate sprawl — handoff friction building | Identify which hops are artifact |
| 6–8 | High sprawl — data ownership unclear | Map the single source of truth |
| 9+ | Systemic sprawl — this is the constraint | Fix flow before any AI build |
The worked example: client onboarding with 7 hops across PandaDoc, Google Drive, Slack, email, CRM, project tool, and invoicing. Clock time: 4.5 hours. Active work time: 5 minutes. The constraint was hops, not people.
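The reading bands above reduce to a single lookup. A minimal sketch (the function name is illustrative; the thresholds come from the table):

```python
def read_hop_count(hops: int) -> tuple[str, str]:
    """Map a hop count to the signal/action bands in the table above."""
    if hops <= 3:
        return ("clean flow", "validate logic is documented")
    if hops <= 5:
        return ("moderate sprawl", "identify which hops are artifact")
    if hops <= 8:
        return ("high sprawl", "map the single source of truth")
    return ("systemic sprawl", "fix flow before any AI build")

# The worked onboarding example: 7 hops.
print(read_hop_count(7))  # → ('high sprawl', 'map the single source of truth')
```

The point of encoding it is not automation; it is that the thresholds are explicit, so the team argues about the count, not the verdict.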
4. Longevity Risk Assessment
The tech stack you map today may not be the stack you build on tomorrow. Before investing in integrations, classify every tool by its displacement risk.
| Tool | Category | Risk | Displacement Mechanism | Horizon |
|---|---|---|---|---|
| | | Stable / Monitor / Replace | AI-native competitor / price collapse / platform shift | under 6 months / 6–18 months / over 18 months |
| | | Stable / Monitor / Replace | | |
| | | Stable / Monitor / Replace | | |
Risk definitions:
| Rating | Meaning | Action |
|---|---|---|
| Stable | Category leader with AI-native roadmap. Safe to build integrations against. | Build |
| Monitor | Early displacement signals — AI-native competitor gaining traction, or vendor slowing roadmap. | Build lightly, monitor quarterly |
| Replace | Displacement is probable within 12 months. Building deep integrations here locks in the wrong stack. | Do not build deep integrations. Plan migration. |
The question to ask: If I build a workflow automation that depends on this tool, and this tool is disrupted in 12 months, what is the migration cost? If the answer is "high," the risk rating matters more than the current capability.
5. Flow State Assessment
Where does your data sit on the two dimensions that matter most?
| Ownership ↓ / Speed → | Slow | Fast |
|---|---|---|
| Open (portable, API-accessible, yours to take) | Recycling Pods — data moves eventually, but you own it | Flow State — goal |
| Locked (proprietary format, vendor-dependent, no export) | Ruck — stuck, switching cost is the real constraint | Walled Garden — fast but trapped |
Your current position: [circle one]
Flow State is the target: data moves fast and you own it. Most businesses discover they are in Walled Garden (fast but locked inside one vendor's system) or Ruck (slow and locked). The path out of Ruck is not cleaning data — it is re-routing it.
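The 2×2 above can be stated as a four-line classifier. A sketch, assuming the two axes reduce to booleans (open vs. locked ownership, fast vs. slow movement); the quadrant names are the document's own:

```python
def flow_state(open_data: bool, fast: bool) -> str:
    """Place a stack on the 2x2: ownership (open vs. locked) x speed."""
    if open_data and fast:
        return "Flow State"      # the goal: fast and yours
    if open_data:
        return "Recycling Pods"  # slow but portable
    if fast:
        return "Walled Garden"   # fast but vendor-locked
    return "Ruck"                # slow and locked

# A common discovery: everything moves quickly inside one vendor's system.
print(flow_state(open_data=False, fast=True))  # → Walled Garden
```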
6. SSOT Audit
For each data type your business runs on, name where the authoritative version lives. If you cannot name it in three seconds, you have multiple sources of truth competing.
| Data Type | Current Source(s) | Single Source? | Conflicts Observed | Fix |
|---|---|---|---|---|
| Client contact details | | YES / NO / UNCLEAR | | |
| Contract status | | YES / NO / UNCLEAR | | |
| Revenue actuals | | YES / NO / UNCLEAR | | |
| Project status | | YES / NO / UNCLEAR | | |
| Team capacity | | YES / NO / UNCLEAR | | |
The SSOT principle: Specs own rules. Trackers store data. Planners link to both — they never redefine. Every downstream tool reads from the single source; no tool maintains its own copy of the same record.
A business with six systems of truth does not have a data quality problem. It has a data ownership problem. Establishing SSOT is the prerequisite step — before any AI build, before any automation, before any integration investment.
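The audit itself is a one-pass check. A minimal sketch with a hypothetical inventory (data types and system names are illustrative): any data type with more than one system holding a copy fails the three-second test.

```python
# Hypothetical inventory: data type → every system that stores a copy.
sources = {
    "client contact details": ["CRM", "Spreadsheet", "Email signatures"],
    "contract status": ["PandaDoc"],
    "revenue actuals": ["Accounting tool", "CRM"],
}

verdicts = {
    data_type: "SSOT" if len(systems) == 1 else f"CONFLICT ({len(systems)} sources)"
    for data_type, systems in sources.items()
}

for data_type, verdict in verdicts.items():
    print(f"{data_type}: {verdict}")
```

Listing every copy honestly is the hard part; the verdict is mechanical once the inventory exists.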
7. Before / After
For the top workflow flagged by hop count, sketch the transformation.
| | Before (current) | After (AI-native flow) |
|---|---|---|
| Hop count | | |
| Clock time per unit | | |
| Active work time per unit | | |
| Artifact touches (no judgment) | | |
| Human escalation points | | |
| Senior time per unit | | |
| What disappears | | |
| What remains human | | |
The "what remains human" row is the point of the exercise. This is the work your best people should be doing — the judgment, the relationship, the decision where their name is on the outcome. The rest is flow.
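The quantitative rows of the before/after sketch translate directly into reduction percentages. A sketch using the worked onboarding example (7 hops, 270 minutes of clock time, 5 minutes of active work); the "after" figures are hypothetical targets, not measurements:

```python
# Before: the worked example. After: hypothetical redesigned flow (minutes).
before = {"hops": 7, "clock_min": 270, "active_min": 5, "artifact_touches": 5}
after  = {"hops": 2, "clock_min": 30,  "active_min": 5, "artifact_touches": 0}

reduction = {m: (before[m] - after[m]) / before[m] * 100 for m in before}

for metric, pct in reduction.items():
    print(f"{metric}: {before[metric]} -> {after[metric]} ({pct:.0f}% reduction)")
```

Note what does not change: active work time. The redesign removes waiting and moving, not the five minutes of judgment.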
8. Prerequisites Checklist
Before any AI system can be built against a flow-constrained workflow, verify:
- Single source of truth identified for all primary data types
- Hop count mapped for the primary workflow — bottleneck tool named
- Business logic documented — the rules that govern routing decisions are written down
- Longevity risk assessed — no integrations planned against Replace-rated tools
- Data sovereignty confirmed — all primary data is exportable in standard formats
- Retention policy defined — deletion pathways exist, even if not yet automated
Any unchecked box is a prerequisite, not an obstacle. The Business Logic Document and Context Architecture address the logic and knowledge gaps.
Context
- Data Flow Principles — The foundational research: SSOT, four-verb lifecycle, flow state matrix, data sovereignty
- Constraint Map — Classify what is real versus artifact once the flow is visible
- Business Logic Document — Externalize the tacit knowledge sitting in the heads of 2–3 long-tenure employees
- Context Architecture — What the AI system needs to load before performing at senior quality
- AI-Native Future State — What the redesigned workflow looks like once the flow constraint is removed
- AI ROI Model — Translate hop count reduction into P&L impact
Links
- Goldratt — Theory of Constraints — Find the binding constraint, exploit it, elevate it
- DAMA — Data Management Body of Knowledge — Industry standard for data governance and flow
- Martin Fowler — Event Sourcing — Architectural pattern for reliable data flow
Questions
- When you say "our data isn't clean enough for AI" — have you checked whether it moves, or only whether it is tidy?
- Which workflow in your business has the highest hop count? Have you ever counted the hops?
- If your CRM disappeared overnight, which workflows would stop? Which would continue unaffected?
- Where does a single piece of customer data live across your tech stack — how many copies exist?
- Who is the one person whose absence would create the biggest data bottleneck? What do they know that isn't written down?