
Data Flow

Data is like a rugby ball - you want it clean, fast, and open, not buried at the bottom of a ruck or stuck in a loop of pointless pods while everyone decides what to do with it.

The Rugby Ball Matrix

What makes data move like a good team moves the ball?

|               | Clean                               | Fast                        | Open                             |
|---------------|-------------------------------------|-----------------------------|----------------------------------|
| What it means | Accurate, consistent, validated     | Low latency, real-time sync | Exportable, portable, API access |
| Good sign     | Single source of truth              | Webhook-first architecture  | Standard formats (JSON, CSV)     |
| Bad sign      | Copy-paste between systems          | Batch jobs, overnight sync  | Proprietary formats, no export   |
| Rugby analogy | Ball presented cleanly at breakdown | Quick ball to backs         | Offloads keep play alive         |
| Anti-pattern  | Ball contested, turnover risk       | Slow ruck, defense resets   | Ball trapped, no options         |
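
A "webhook-first architecture" is the kind of good sign the matrix points to. As a rough illustration, here is a minimal TypeScript sketch of a webhook receiver that applies each change the moment it arrives rather than waiting for an overnight batch; the /webhooks/crm path, the CrmEvent shape, and the in-memory store are assumptions for the example, not any vendor's API.

```typescript
// Minimal webhook receiver sketch: each change is applied the moment it arrives,
// instead of waiting for an overnight batch job. The /webhooks/crm path, the
// CrmEvent shape, and the in-memory Map are illustrative assumptions.
import { createServer } from "node:http";

type CrmEvent = {
  id: string;
  entity: string;
  changedAt: string;
  fields: Record<string, unknown>;
};

// Hypothetical stand-in for your own system of record.
const records = new Map<string, CrmEvent>();

const server = createServer((req, res) => {
  if (req.method === "POST" && req.url === "/webhooks/crm") {
    let body = "";
    req.on("data", (chunk) => (body += chunk));
    req.on("end", () => {
      const event = JSON.parse(body) as CrmEvent;
      records.set(event.id, event); // applied immediately, never queued for a nightly sync
      res.writeHead(204).end();
    });
  } else {
    res.writeHead(404).end();
  }
});

server.listen(3000);
```

The point is the shape: the receiving system updates its state per event, so there is never a stale copy waiting for a sync job.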

Data Flow States

```
           Locked        Open
       ┌────────────┬────────────┐
 Fast  │   WALLED   │    FLOW    │  ← Where you want to be
       │   GARDEN   │   STATE    │
       ├────────────┼────────────┤
 Slow  │    RUCK    │ RECYCLING  │  ← Where most SaaS traps you
       │  (stuck)   │    PODS    │
       └────────────┴────────────┘
```

Flow State: Data moves freely, real-time, you control it. Like backs running structured plays with quick ball.

Walled Garden: Fast but locked. Platform owns your data. You're renting your own information.

Recycling Pods: Open but slow. Export available but painful. CSV dumps, manual imports, batch processes.

Ruck (stuck): Worst case. Slow AND locked. Data hostage. Switching costs astronomical.

Evaluating Any SaaS

Before buying any tool, run the Rugby Ball Test:

| Question                                            | Clean | Fast | Open |
|------------------------------------------------------|-------|------|------|
| Can I trust this data without manual verification?    | ✓     |      |      |
| Does it sync in real-time or near-real-time?           |       | ✓    |      |
| Can I export ALL my data in standard formats?          |       |      | ✓    |
| Can I programmatically access it via an API?           |       |      | ✓    |
| Can I delete my data completely when I leave?          |       |      | ✓    |
| Is there a single source of truth?                     | ✓     |      |      |
| Do changes propagate immediately?                      |       | ✓    |      |

Scoring: If you can't check all boxes, you're accepting lock-in risk. Know what you're trading.
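
If it helps to make the test concrete, here is a small TypeScript sketch that scores a tool against the checklist. The question wording is abbreviated from the table above, and the pass/fail answers for the hypothetical "exampleCrm" are invented.

```typescript
// The Rugby Ball Test as a scorecard. Question wording is abbreviated from the
// table above; the answers for the hypothetical "exampleCrm" are invented.
type Dimension = "clean" | "fast" | "open";

interface Check {
  question: string;
  dimension: Dimension; // which column of the test this question maps to
  pass: boolean;
}

// A dimension passes only if every question mapped to it passes.
function rugbyBallTest(checks: Check[]): Record<Dimension, boolean> {
  const result: Record<Dimension, boolean> = { clean: true, fast: true, open: true };
  for (const c of checks) {
    if (!c.pass) result[c.dimension] = false;
  }
  return result;
}

const exampleCrm: Check[] = [
  { question: "Trust data without manual verification?", dimension: "clean", pass: true },
  { question: "Real-time or near-real-time sync?", dimension: "fast", pass: false },
  { question: "Export ALL data in standard formats?", dimension: "open", pass: true },
  { question: "Programmatic access via API?", dimension: "open", pass: true },
];

console.log(rugbyBallTest(exampleCrm));
// { clean: true, fast: false, open: true } -> you are accepting lock-in risk on "fast"
```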

Why DePIN Changes Everything

Traditional SaaS traps data in corporate rucks. You generate data → they store it → you pay to access your own information → switching costs keep you locked in.

DePIN inverts this:

| Dimension | Traditional SaaS                                | DePIN                                           |
|-----------|-------------------------------------------------|-------------------------------------------------|
| Clean     | Corporate databases, vendor-controlled quality  | Cryptographically verified at source, immutable |
| Fast      | API rate limits, batch sync, vendor bottlenecks | Edge-native, real-time, peer-to-peer            |
| Open      | Proprietary formats, export friction, lock-in   | Open protocols, portable by default, you own it |
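
The "cryptographically verified at source" row is the key inversion. Below is a minimal sketch of the idea using Node's built-in Ed25519 signing; the device ID, the reading shape, and the key handling are simplified assumptions, not any particular DePIN network's protocol.

```typescript
// "Cryptographically verified at source": the device signs each reading, and any
// consumer can verify it without trusting an intermediary. The device ID, reading
// shape, and key handling are simplified assumptions, not a specific DePIN protocol.
import { generateKeyPairSync, sign, verify } from "node:crypto";

// In practice the keypair lives on the device; it is generated here for the example.
const { publicKey, privateKey } = generateKeyPairSync("ed25519");

const reading = JSON.stringify({ deviceId: "sensor-42", tempC: 21.3, ts: Date.now() });

// Signed at the edge, where the data is born.
const signature = sign(null, Buffer.from(reading), privateKey);

// Anyone holding the device's public key can check the reading was not altered.
const authentic = verify(null, Buffer.from(reading), publicKey, signature);
console.log(authentic); // true
```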

The ABCD Stack as Data Infrastructure

```
            Data Quality  ·  Data Speed
┌───┬──────────────────────────────────────────────┐
│ A │ AI validates patterns,                        │
│   │ learns from action→consequence loops          │
├───┼──────────────────────────────────────────────┤
│ B │ Blockchain = immutable audit trail;           │
│   │ can't be edited, deleted, or disputed         │
├───┼──────────────────────────────────────────────┤
│ C │ Crypto = aligned incentives;                  │
│   │ contributors rewarded, bad actors punished    │
├───┼──────────────────────────────────────────────┤
│ D │ DePIN = data born at the edge:                │
│   │ sensors, devices, real-world ground truth     │
└───┴──────────────────────────────────────────────┘
                        ↓
                CLEAN, FAST, OPEN
          by architecture, not policy
```
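
One way to picture "clean, fast, open by architecture" is the shape a record accumulates as it climbs the stack from D up to A. The TypeScript interfaces below are purely illustrative; the field names are assumptions, and real networks define their own schemas.

```typescript
// Illustrative shape of a record as it climbs the ABCD stack (bottom-up: D, C, B, A).
// Field names are assumptions; real networks define their own schemas.
interface EdgeReading {                               // D: DePIN - born at the edge
  deviceId: string;
  payload: unknown;
  signature: string;                                  // signed at source
}

interface RewardedReading extends EdgeReading {       // C: Crypto - aligned incentives
  contributor: string;
  rewardTokens: number;
}

interface AnchoredReading extends RewardedReading {   // B: Blockchain - immutable audit trail
  txHash: string;                                     // attestation transaction it is anchored to
  blockHeight: number;
}

interface ValidatedReading extends AnchoredReading {  // A: AI - validates patterns
  anomalyScore: number;                               // learned from action→consequence loops
  accepted: boolean;
}
```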

Intercognitive Standards

When machines talk to machines, they need shared protocols - not corporate APIs that change on a vendor's whim.

| Layer     | Standard                      | Rugby Analogy                            |
|-----------|-------------------------------|------------------------------------------|
| Identity  | DIDs, verifiable credentials  | Jersey numbers - know who's on the field |
| Messaging | MCP, agent protocols          | Calls and lineout codes                  |
| Value     | Crypto rails, smart contracts | The scoreboard everyone trusts           |
| Truth     | Blockchain attestations       | Video ref - can't dispute the replay     |

The shift: From "trust the platform" to "verify the protocol."
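
To make the layers concrete, here is a hedged sketch of what a machine-to-machine message envelope could look like: identity as a DID, an optional value transfer, and a signed attestation. It is illustrative only; it is not the MCP wire format or any specific agent protocol, and the protocol name and placeholder values are assumptions.

```typescript
// A hedged sketch of a machine-to-machine message envelope. Illustrative only -
// this is not the MCP wire format or any specific agent protocol.
interface AgentEnvelope {
  from: string;                                       // identity layer, e.g. "did:example:agent-a"
  to: string;
  protocol: string;                                   // which shared messaging standard the body follows
  body: unknown;                                      // the actual request or event
  payment?: { rail: string; amount: string };         // value layer: crypto rails / smart contracts
  attestation?: { hash: string; signature: string };  // truth layer: verifiable, not disputable
}

const msg: AgentEnvelope = {
  from: "did:example:agent-a",
  to: "did:example:agent-b",
  protocol: "orders/v1",                              // hypothetical protocol name
  body: { action: "reorder", sku: "RB-2024", qty: 30 },
  attestation: { hash: "hash-of-body", signature: "signature-over-hash" }, // placeholders
};
```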

Impact on Product Development

When data is clean-fast-open by default:

  1. No more integration hell - Standard protocols replace custom API integrations
  2. Real-time everything - No batch jobs, no overnight sync, no stale dashboards
  3. Portable customers - Users bring their data, not locked to your silo
  4. Composable products - Build on verified data streams, not walled gardens
  5. AI that learns from reality - Ground truth from DePIN sensors, not scraped web noise

The companies that win in 2027 aren't the ones with the most data locked up. They're the ones building on rails where data flows freely - and they capture value through quality, not lock-in.

Data is worth more than gold! Are you valuing your data as much as you should be?

The most valuable asset is high-signal proprietary data. Interconnected technology will close the feedback loop of data propagation, interpretation, action, and consequence for AI to learn from.

The most valuable commodity I know of is information. - Gordon Gekko
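
One way to picture that closed feedback loop is as a training example that pairs an action with its observed consequence. The TypeScript sketch below is illustrative only; the field names and the churn scenario are assumptions.

```typescript
// A closed feedback loop as a training example: an action paired with its observed
// consequence. Field names and the churn scenario are illustrative assumptions.
interface FeedbackExample {
  context: Record<string, unknown>;       // state when the decision was made
  action: string;                         // what the system (or human) did
  consequence: Record<string, unknown>;   // what actually happened afterwards
  reward: number;                         // scalar signal an AI can learn from
}

const example: FeedbackExample = {
  context: { customer: "acme", churnRisk: 0.7 },
  action: "offer_annual_discount",
  consequence: { renewed: true, revenueDelta: 4200 },
  reward: 1,
};
```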

Data Types

Reference Data (relatively static lookups such as country codes or product catalogues), Transaction Data (records of business events such as orders and payments), and System Data (logs, metadata, and telemetry generated by the systems themselves).
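
As a rough sketch, the three types can be modelled as a discriminated union; the field names are illustrative assumptions.

```typescript
// The three data types as a discriminated union; field names are illustrative assumptions.
type ReferenceData = { kind: "reference"; code: string; label: string };                        // slow-changing lookups
type TransactionData = { kind: "transaction"; id: string; amount: number; ts: string };         // business events
type SystemData = { kind: "system"; source: string; level: "info" | "error"; message: string }; // logs, telemetry

type DataRecord = ReferenceData | TransactionData | SystemData;
```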

Data Gravity

Data gravity is becoming an increasingly important concept in the era of AI, particularly with the rise of generative AI. As AI continues to evolve and generate more data, managing data gravity becomes crucial for organizations: it requires careful planning of data storage, processing locations, and infrastructure to balance performance, cost, and compliance needs.

The intersection of data gravity and AI is reshaping IT strategies and driving the need for more sophisticated, data-centric approaches to infrastructure and data management. The key points of how data gravity and AI intersect are covered below under Impact, Challenges, Strategies, and Trends.

Flow of Value

Accurate, high-signal data is critical to the viability of AI economics. Proprietary data is data owned and controlled by a company or organization and not publicly available. It can include customer information, financial data, product data, and other sensitive information critical to the success of a business.

Validated Proof: Don't Trust, Verify.

  1. Competitive advantage: As AI models become more commoditized, proprietary data emerges as a key differentiator. Companies with access to unique, high-quality datasets will have a significant edge in developing more capable and specialized AI systems.
  2. Overcoming data scarcity: Public datasets and internet-scraped data are becoming exhausted as training resources. Proprietary data, especially on complex reasoning and tool use, represents a new frontier that can push AI capabilities forward.
  3. Enhancing reasoning capabilities: Current AI models struggle with sophisticated reasoning tasks. Proprietary data on validated reasoning processes can help train models to perform more advanced logical and analytical operations, bringing them closer to human-level cognition.
  4. Improving tool use: Data on how humans effectively use tools to solve problems can enable AI systems to better leverage external resources and APIs, greatly expanding their problem-solving capabilities.
  5. Regulatory compliance: As AI regulations tighten globally, having well-documented, ethically-sourced proprietary data on reasoning and tool use can help companies demonstrate responsible AI development practices.
  6. Tailored solutions: Proprietary data allows for the development of AI models that are specifically tuned to solve domain-specific problems, rather than relying on general-purpose models.
  7. Data quality control: Unlike public datasets, proprietary data can be carefully curated and validated, ensuring higher quality inputs for AI training.
  8. Protecting intellectual property: By using proprietary data, companies can develop unique AI capabilities without relying on potentially copyright-infringing public data sources.
  9. Ethical considerations: Proprietary data on reasoning and tool use can be collected with proper consent and privacy safeguards, addressing ethical concerns surrounding AI training data.
  10. Bridging the gap to AGI: Advanced reasoning and tool use are considered crucial steps towards artificial general intelligence (AGI). Proprietary data in these areas could accelerate progress towards more generalized AI systems.

The process of adding refined data into AI is one of the highest leverage jobs that humans can have - Alex Wang

Impact

  1. Increased Data Creation: AI, especially generative AI, is creating more data to work with, compounding the challenges related to data gravity. As AI models are trained and used, they generate vast amounts of new data.
  2. Data Placement Considerations: The location of data becomes crucial when working with AI, whether it's for training models or using them. This affects decisions about cloud, on-premises, or edge computing strategies.
  3. Infrastructure Demands: AI is shaping IT infrastructure needs, including the placement of data centers, on-premises and hybrid cloud services, and other locations for data storage, training, and processing.

Challenges

  1. Processing Location: Enterprises need to decide where to process and where to store data for AI workloads. The 2023 Generative AI Pulse Survey shows that 82% of IT leaders prefer an on-premises or hybrid approach for data management.
  2. Edge AI: The growth of Industrial Internet of Things (IIoT) and edge AI involves data processing at the edge, requiring decisions about how much data to process locally versus routing to the cloud.
  3. Cost Considerations: Managing data for AI in various environments (cloud, on-premises, hybrid) significantly impacts costs and data management strategies.
  4. Data Governance: AI applications require careful consideration of how much data needs to be moved or retained to be useful, affecting data governance strategies.

Strategies

  1. Ecosystem Approach: Solving data gravity issues in the context of AI requires an ecosystem approach, considering factors like GDP, technology maturity, and local regulations.
  2. Cloud Solutions: Cloud providers can be effective for hosting large AI datasets, as they can scale more easily and manage throughput and workload balance.
  3. Data Filtering and Analysis: For edge AI applications, filtering or analysing data in situ or in transit can help manage data gravity issues without centralizing all data (see the sketch after this list).
  4. Hybrid IT Strategies: Implementing hybrid IT approaches can help balance the needs of AI workloads with data gravity considerations.
  5. Data-Centric Architecture: An inverted data-centric architecture deployed at points of presence in neutral, multi-tenant data-centers is suggested as a solution for modern data gravity challenges.
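
As a rough illustration of strategy 3 (filtering in situ), the TypeScript sketch below processes readings at the edge and forwards only anomalies to the cloud; the temperature threshold and the sendToCloud stub are assumptions.

```typescript
// Filtering in situ at the edge: process readings locally and forward only what the
// cloud actually needs. The temperature threshold and sendToCloud stub are assumptions.
interface Reading {
  deviceId: string;
  tempC: number;
  ts: number;
}

const TEMP_ALERT_C = 80; // hypothetical threshold

function sendToCloud(r: Reading): void {
  // stand-in for a real uplink (MQTT, HTTPS, ...)
  console.log("forwarding anomaly", r);
}

export function handleAtEdge(readings: Reading[]): { forwarded: number; keptLocal: number } {
  let forwarded = 0;
  for (const r of readings) {
    if (r.tempC >= TEMP_ALERT_C) {
      sendToCloud(r); // only anomalies travel; routine data stays local
      forwarded++;
    }
  }
  return { forwarded, keptLocal: readings.length - forwarded };
}
```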

Trends

  1. Rapid Growth: The Data Gravity Index predicts 139% growth in data gravity intensity between 2020 and 2024, largely driven by AI and digital transformation.
  2. Real-Time Intelligence: There is a growing need for real-time intelligence to power innovation, which legacy architectures struggle to deliver because of data gravity.
  3. Cybersecurity Concerns: The acceleration of digital transformation and AI adoption is amplifying cybersecurity challenges related to data gravity.

Asset Tokenization

Tokenization: The convergence of AI and blockchain technology is creating new opportunities for data management and governance. Data DAOs leverage blockchain to decentralize data control and enhance security, while AI can optimize data utilization and decision-making processes.

  • Decentralized Data Ownership: Data DAOs offer a model where data ownership is distributed among participants rather than being controlled by a single entity. This decentralization can lead to more equitable data sharing and usage.
  • Incentivization Mechanisms: By using tokens, Data DAOs can incentivize participants to contribute data and validate transactions. This token-based economy encourages active participation and ensures that contributors are fairly rewarded (a toy settlement sketch follows this list).
  • Transparency and Trust: Blockchain's inherent transparency ensures that all data transactions and governance activities are recorded and verifiable. This builds trust among participants and reduces the risk of data manipulation.
  • Automated Governance: Smart contracts automate many of the governance processes within a Data DAO, reducing the need for intermediaries and ensuring that rules are enforced consistently and fairly.
  • Scalability and Efficiency: Data DAOs can scale efficiently by leveraging decentralized networks, making them suitable for managing large volumes of data across diverse participants.
  • Current Trends and Adoption: Adoption is being driven by the increasing value of data, the need for better data governance, and growing interest in decentralized technologies.
  • Challenges and Considerations: While Data DAOs offer many benefits, they also face challenges such as regulatory uncertainty, the complexity of smart contract development, and the need for robust security measures.
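
To make the incentivization point concrete, here is a toy TypeScript sketch of a reward-and-slash settlement rule. In practice this logic would live in a smart contract; the reward and slash amounts here are invented.

```typescript
// Toy settlement rule for a Data DAO: validated contributions earn tokens, invalid
// ones are slashed. In practice this would be enforced by a smart contract; the
// reward and slash amounts here are invented.
interface Contribution {
  contributor: string;
  validated: boolean;
}

const REWARD = 10n; // tokens per validated contribution (assumed)
const SLASH = 5n;   // penalty for an invalid one (assumed)

export function settle(
  balances: Map<string, bigint>,
  batch: Contribution[],
): Map<string, bigint> {
  for (const c of batch) {
    const current = balances.get(c.contributor) ?? 0n;
    const next = c.validated ? current + REWARD : current - SLASH;
    balances.set(c.contributor, next < 0n ? 0n : next); // balances never go negative
  }
  return balances;
}
```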

Context