Data Flow Map

You don't have a data quality problem. You have a data flow problem.

Most AI transformation projects stall before they begin. The stated reason is always the same: "our data isn't clean enough." Teams spend six months cleaning data that didn't need cleaning. The real barrier — business logic trapped in people's heads and data scattered across five platforms with no clear ownership — never gets addressed. The project dies before it finds the constraint.

LLMs were built for messy data. They were not built for data that doesn't move. The question to ask is not "how clean is this data?" but "does it arrive where it is needed, when it is needed, in a form that allows action?" Data quality is downstream of data flow. Fix the flow first.

The Data Flow Map answers the question the Constraint Map cannot answer alone: before you classify what's real versus artifact, you need to see where your data actually lives and how many hands it passes through to get there.


0. Framing

Anchor the map to the business before listing tools.

| Question | Answer |
| --- | --- |
| What is the highest-volume workflow in this business? (name one) | [e.g. "client onboarding", not "operations"] |
| How many platforms does a single unit of that workflow touch, start to finish? | [count: this is your baseline hop count] |
| Who is the last human to touch data before it becomes a decision? | [name the role: this is where flow stops] |
| What happens when that person is away? | [substitute, delay, or error: reveals tacit knowledge dependency] |

1. Tech Stack Inventory

List every tool the business uses to run its workflows. Do not filter. Include spreadsheets, email, Slack, WhatsApp, shared drives. The ones that feel too informal to list are usually the ones carrying the most critical data.

| # | Tool | Category | Who Owns It | How Long In Use | Cost / Month |
| --- | --- | --- | --- | --- | --- |
| 1 | | CRM / Comms / Doc / Finance / PM / Analytics / Custom | | | |
| 2 | | | | | |
| 3 | | | | | |
| 4 | | | | | |
| 5 | | | | | |
| 6 | | | | | |
| 7 | | | | | |
| 8 | | | | | |

Sprawl signal: If the count exceeds 5 tools for a single workflow, data sprawl is the constraint — not data quality. Each additional tool is a handoff point where data can corrupt, duplicate, or stall.
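
To make the sprawl signal checkable, the inventory can be held as a small data structure and the per-workflow threshold applied mechanically. A minimal sketch in Python; the Tool fields and the example stack are illustrative, not prescriptive.

```python
from dataclasses import dataclass

@dataclass
class Tool:
    name: str
    category: str   # CRM / Comms / Doc / Finance / PM / Analytics / Custom
    owner: str
    monthly_cost: float

SPRAWL_THRESHOLD = 5  # more than 5 tools in one workflow signals sprawl

def sprawl_signal(workflow: str, tools: list[Tool]) -> str:
    """Apply the sprawl rule: past the threshold, flow is the constraint."""
    n = len(tools)
    if n > SPRAWL_THRESHOLD:
        return f"{workflow}: {n} tools -- data sprawl is the constraint, not quality"
    return f"{workflow}: {n} tools -- within tolerance"

# Hypothetical inventory for one workflow
onboarding_stack = [
    Tool("PandaDoc", "Doc", "Sales", 49.0),
    Tool("Google Drive", "Doc", "Ops", 12.0),
    Tool("Slack", "Comms", "Ops", 8.0),
    Tool("Email", "Comms", "Everyone", 0.0),
    Tool("CRM", "CRM", "Sales", 99.0),
    Tool("Invoicing", "Finance", "Finance", 35.0),
]
print(sprawl_signal("client onboarding", onboarding_stack))
```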


2. Four-Verb Lifecycle Map

For each core workflow, map every piece of data through its full lifecycle using four verbs. This reveals where data moves cleanly, where it gets copied manually, and where it disappears entirely.

| Verb | Question it answers | What good looks like | Warning sign |
| --- | --- | --- | --- |
| Create | Where does this data first enter the business? | Single entry point, validated at source | Data entered twice in two systems |
| Manipulate | What transforms it after entry? | Real-time sync, rule-based routing | Manual reformatting, copy-paste between tools |
| Share | Who and what consumes it (human, agent, system)? | API access, standard formats (JSON, CSV) | PDF exports, screenshotted for the next person |
| Delete | When and how is it removed or archived? | Defined retention policy, automated enforcement | "We keep everything" (no deletion policy) |

Map your primary workflow:

Workflow: [name it]

| Step | Verb | Tool | Who Does It | Time | Data State After |
| --- | --- | --- | --- | --- | --- |
| 1 | Create | | | | |
| 2 | Manipulate | | | | |
| 3 | Share | | | | |
| 4 | Manipulate | | | | |
| 5 | Share | | | | |
| 6 | Delete / Archive | | | | |

Total hops: [count]
Human touches: [count]
Artifact touches (no judgment, just moving data): [count]

The artifact touches are the transformation targets. Every artifact touch in your highest-volume workflow is a candidate for elimination.
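
One way to run this map is to record each step as (verb, tool, actor, artifact-or-judgment) and derive the three summary counts from the list. A sketch under that assumption; the steps shown are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum

class Verb(Enum):
    CREATE = "create"
    MANIPULATE = "manipulate"
    SHARE = "share"
    DELETE = "delete"

@dataclass
class Step:
    verb: Verb
    tool: str
    actor: str       # role name, or "system" for automated steps
    artifact: bool   # True when no judgment is exercised, just moving data

def summarize(steps: list[Step]) -> dict:
    return {
        "total_hops": len({s.tool for s in steps}),  # distinct platform touches
        "human_touches": sum(1 for s in steps if s.actor != "system"),
        "artifact_touches": sum(1 for s in steps if s.artifact),
    }

# Hypothetical primary workflow
flow = [
    Step(Verb.CREATE, "PandaDoc", "client", artifact=False),
    Step(Verb.MANIPULATE, "Google Drive", "ops", artifact=True),
    Step(Verb.SHARE, "Slack", "ops", artifact=True),
    Step(Verb.MANIPULATE, "CRM", "sales", artifact=True),
    Step(Verb.SHARE, "Email", "ops", artifact=True),
]
print(summarize(flow))  # every artifact touch is an elimination candidate
```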


3. Hop Count Analysis

Hop count = the number of distinct platform touches a single work unit makes from creation to completion. High hop counts reveal data sprawl; the work isn't complex — the movement is.

| Workflow | Hop Count | Primary Bottleneck Tool | Artifact Hops | Real Hops |
| --- | --- | --- | --- | --- |
| | | | | |
| | | | | |
| | | | | |

Reading the hop count:

| Hops | Signal | Action |
| --- | --- | --- |
| 1–3 | Clean flow: minimal sprawl | Validate logic is documented |
| 4–5 | Moderate sprawl: handoff friction building | Identify which hops are artifact |
| 6–8 | High sprawl: data ownership unclear | Map the single source of truth |
| 9+ | Systemic sprawl: this is the constraint | Fix flow before any AI build |

The worked example: client onboarding with 7 hops across PandaDoc, Google Drive, Slack, email, a CRM, a project tool, and an invoicing system. Clock time: 4.5 hours. Active work time: 5 minutes. The constraint was hops, not people.
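
The banding rule and the worked example are both mechanical. A sketch that encodes the table's thresholds and checks the example's flow efficiency (active time over clock time); the numbers come straight from the case above.

```python
def hop_signal(hops: int) -> str:
    """Band a hop count using the thresholds in the reading table."""
    if hops <= 3:
        return "clean flow -- validate logic is documented"
    if hops <= 5:
        return "moderate sprawl -- identify which hops are artifact"
    if hops <= 8:
        return "high sprawl -- map the single source of truth"
    return "systemic sprawl -- fix flow before any AI build"

def flow_efficiency(active_minutes: float, clock_minutes: float) -> float:
    """Fraction of elapsed time spent on real work."""
    return active_minutes / clock_minutes

# Worked example: 7 hops, 4.5 hours clock time, 5 minutes active work.
print(hop_signal(7))                          # high sprawl
print(f"{flow_efficiency(5, 4.5 * 60):.1%}")  # ~1.9%: the constraint is hops
```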


4. Longevity Risk Assessment

The tech stack you map today may not be the stack you build on tomorrow. Before investing in integrations, classify every tool by its displacement risk.

| Tool | Category | Risk | Displacement Mechanism | Horizon |
| --- | --- | --- | --- | --- |
| | | Stable / Monitor / Replace | AI-native competitor / price collapse / platform shift | under 6 months / 6–18 months / over 18 months |
| | | Stable / Monitor / Replace | | |
| | | Stable / Monitor / Replace | | |

Risk definitions:

| Rating | Meaning | Action |
| --- | --- | --- |
| Stable | Category leader with an AI-native roadmap. Safe to build integrations against. | Build |
| Monitor | Early displacement signals: an AI-native competitor gaining traction, or the vendor slowing its roadmap. | Build lightly, monitor quarterly |
| Replace | Displacement is probable within 12 months. Building deep integrations here locks in the wrong stack. | Do not build deep integrations. Plan migration. |

The question to ask: If I build a workflow automation that depends on this tool, and this tool is disrupted in 12 months, what is the migration cost? If the answer is "high," the risk rating matters more than the current capability.
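
The rating-to-action mapping and the migration-cost question reduce to a simple gate. A minimal sketch, with the actions copied from the definitions table; the function names are mine, not an established API.

```python
from enum import Enum

class Risk(Enum):
    STABLE = "stable"
    MONITOR = "monitor"
    REPLACE = "replace"

def integration_depth(risk: Risk) -> str:
    """How deep an integration the rating permits."""
    return {Risk.STABLE: "deep", Risk.MONITOR: "light", Risk.REPLACE: "none"}[risk]

def rating_outweighs_capability(risk: Risk, migration_cost: str) -> bool:
    """True when displacement risk matters more than what the tool does today."""
    return risk is Risk.REPLACE or (risk is Risk.MONITOR and migration_cost == "high")

print(integration_depth(Risk.REPLACE))                    # none
print(rating_outweighs_capability(Risk.MONITOR, "high"))  # True
```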


5. Flow State Assessment

Where does your data sit on the two dimensions that matter most?

| Dimension | Slow | Fast |
| --- | --- | --- |
| Open (portable, API-accessible, yours to take) | Recycling Pods: data moves eventually, but you own it | Flow State: the goal |
| Locked (proprietary format, vendor-dependent, no export) | Ruck: stuck, switching cost is the real constraint | Walled Garden: fast but trapped |

Your current position: [circle one]

Flow State is the target: data moves fast and you own it. Most businesses discover they are in Walled Garden (fast but locked inside one vendor's system) or Ruck (slow and locked). The path out of Ruck is not cleaning data — it is re-routing it.
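
The matrix is a two-axis classification: does the data move fast, and is it yours to take? A minimal sketch; the quadrant names are taken from the table above.

```python
def flow_state(fast: bool, open_data: bool) -> str:
    """Place a data estate in the speed-by-openness matrix."""
    if fast and open_data:
        return "Flow State"      # the target: moves fast, and you own it
    if fast:
        return "Walled Garden"   # fast but trapped inside one vendor
    if open_data:
        return "Recycling Pods"  # moves eventually, but you own it
    return "Ruck"                # slow and locked: switching cost is the constraint

print(flow_state(fast=True, open_data=False))   # Walled Garden
print(flow_state(fast=False, open_data=False))  # Ruck
```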


6. SSOT Audit

For each data type your business runs on, name where the authoritative version lives. If you cannot name it in three seconds, you have multiple sources of truth competing.

| Data Type | Current Source(s) | Single Source? | Conflicts Observed | Fix |
| --- | --- | --- | --- | --- |
| Client contact details | | YES / NO / UNCLEAR | | |
| Contract status | | YES / NO / UNCLEAR | | |
| Revenue actuals | | YES / NO / UNCLEAR | | |
| Project status | | YES / NO / UNCLEAR | | |
| Team capacity | | YES / NO / UNCLEAR | | |

The SSOT principle: Specs own rules. Trackers store data. Planners link to both — they never redefine. Every downstream tool reads from the single source; no tool maintains its own copy of the same record.

A business with six systems of truth does not have a data quality problem. It has a data ownership problem. Establishing SSOT is the prerequisite step — before any AI build, before any automation, before any integration investment.
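
The audit can be run mechanically: list every (data type, system) claim, then flag any data type with more than one claimant. A sketch with hypothetical claims.

```python
from collections import defaultdict

def ssot_audit(claims: list[tuple[str, str]]) -> dict[str, list[str]]:
    """claims are (data_type, system) pairs; returns types with competing sources."""
    sources = defaultdict(set)
    for data_type, system in claims:
        sources[data_type].add(system)
    return {dt: sorted(s) for dt, s in sources.items() if len(s) > 1}

# Hypothetical claims gathered during the audit
claims = [
    ("client contact details", "CRM"),
    ("client contact details", "Email"),
    ("client contact details", "Spreadsheet"),
    ("contract status", "PandaDoc"),
    ("revenue actuals", "Xero"),
    ("revenue actuals", "Spreadsheet"),
]
for data_type, systems in ssot_audit(claims).items():
    print(f"{data_type}: {len(systems)} competing sources -> {systems}")
```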


7. Before / After

For the top workflow flagged by hop count, sketch the transformation.

| | Before (current) | After (AI-native flow) |
| --- | --- | --- |
| Hop count | | |
| Clock time per unit | | |
| Active work time per unit | | |
| Artifact touches (no judgment) | | |
| Human escalation points | | |
| Senior time per unit | | |
| What disappears | | |
| What remains human | | |

The "what remains human" column is the point of the exercise. This is the work your best people should be doing — the judgment, the relationship, the decision where their name is on the outcome. The rest is flow.


8. Prerequisites Checklist

Before any AI system can be built against a flow-constrained workflow, verify:

  • Single source of truth identified for all primary data types
  • Hop count mapped for the primary workflow — bottleneck tool named
  • Business logic documented — the rules that govern routing decisions are written down
  • Longevity risk assessed — no integrations planned against Replace-rated tools
  • Data sovereignty confirmed — all primary data is exportable in standard formats
  • Retention policy defined — deletion pathways exist, even if not yet automated

Any unchecked box is a prerequisite, not an obstacle. The Business Logic Document and Context Architecture address the logic and knowledge gaps.
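
Treated literally, the checklist is a hard gate: any unchecked item blocks the build. A minimal sketch; the check names paraphrase the list above, and the example values are hypothetical.

```python
PREREQUISITES = {
    "single_source_of_truth_identified": True,
    "hop_count_mapped": True,
    "business_logic_documented": False,  # the usual gap
    "longevity_risk_assessed": True,
    "data_sovereignty_confirmed": True,
    "retention_policy_defined": False,
}

def outstanding(checks: dict[str, bool]) -> list[str]:
    """Return unmet prerequisites; an empty list means the build can start."""
    return [name for name, done in checks.items() if not done]

gaps = outstanding(PREREQUISITES)
print("BUILD" if not gaps else f"BLOCKED on: {gaps}")
```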


Context

  • Data Flow Principles — The foundational research: SSOT, four-verb lifecycle, flow state matrix, data sovereignty
  • Constraint Map — Classify what is real versus artifact once the flow is visible
  • Business Logic Document — Externalize the tacit knowledge sitting in the heads of 2–3 long-tenure employees
  • Context Architecture — What the AI system needs to load before performing at senior quality
  • AI-Native Future State — What the redesigned workflow looks like once the flow constraint is removed
  • AI ROI Model — Translate hop count reduction into P&L impact

Questions

When you say "our data isn't clean enough for AI" — have you checked whether it moves, or only whether it is tidy?

  • Which workflow in your business has the highest hop count? Have you ever counted the hops?
  • If your CRM disappeared overnight, which workflows would stop? Which would continue unaffected?
  • Where does a single piece of customer data live across your tech stack — how many copies exist?
  • Who is the one person whose absence would create the biggest data bottleneck? What do they know that isn't written down?