AI Data Industry
Who owns the data that trains the intelligence?
Data is AI's raw material. Five layers turn raw observations into predictions: Collection, Connectivity, Storage, Compute, Application. Whoever owns these layers owns the intelligence economy.
Playbook
| Prompts | Questions | Reflections |
|---|---|---|
| Principles | What guides us? | Data compounds, ownership distributes |
| Performance | Is it working? | Revenue/emission, on-chain traction, market sizing |
| Protocols | How do we do it? | DePIN collection, storage, compute workflows |
| Platform | What tools? | ABCD stack for data infrastructure |
| Players | Who's involved? | Scale AI, io.net, Render, GEODNET, Hivemapper |
The Thesis
AI data is transitioning from extraction to community ownership:
| From | To | Driver |
|---|---|---|
| Centralized collection | Distributed sensors | DePIN replaces corporate sensor fleets with community devices |
| Corporate data silos | Open data markets | Token incentives make sharing more profitable than hoarding |
| Proprietary labeling | Community annotation | Scale AI's labeling business model meets crypto coordination |
| Cloud compute monopoly | Distributed GPU networks | io.net and Render break hyperscaler lock-in |
| Extractive surveillance | Compensated contribution | Data creators earn from value they generate |
Opportunity Score
Aggregate: 7.6 / 10 | Classification: Strong Conviction
| Dimension | Score | Key Evidence |
|---|---|---|
| Market Attractiveness | 8.5 | AI training data market $25B+ by 2030, Scale AI at $29B valuation |
| Technology Disruption | 8.0 | DePIN data networks growing 300%+ YoY, GPU compute decentralizing |
| VVFL Alignment | 7.5 | Loop works — devices collect, AI trains, predictions generate value, value funds devices |
| Competitive Position | 7.0 | Early infrastructure phase, token models gaining traction |
| Timing Risk | 7.0 | 2025-2027 build phase, ahead of institutional adoption wave |
Verdict: Highest-conviction DePIN vertical. Data demand is structurally insatiable — every AI model needs more. Position across the value chain, not just one layer.
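The aggregate can be reproduced from the table as an unweighted mean of the five dimension scores. The sketch below assumes equal weighting: the source does not state any weights, but an equal-weight mean does match the published 7.6. The names `scores` and `aggregate_score` are illustrative only.

```python
from statistics import mean

# Dimension scores copied from the Opportunity Score table.
scores = {
    "Market Attractiveness": 8.5,
    "Technology Disruption": 8.0,
    "VVFL Alignment": 7.5,
    "Competitive Position": 7.0,
    "Timing Risk": 7.0,
}


def aggregate_score(dimension_scores):
    """Unweighted mean of the dimension scores, rounded to one decimal.

    Equal weighting is an assumption; the source does not state weights.
    """
    return round(mean(list(dimension_scores.values())), 1)


print(aggregate_score(scores))  # 7.6
```

If the dimensions were weighted differently, the aggregate would shift, so any re-scoring should state its weights explicitly.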
First Principles
| Principle | Why Immutable | Implication |
|---|---|---|
| Data compounds | More data = better models = more valuable data | First to build a data flywheel wins |
| Collection is physical | Sensors exist in the real world | Someone must deploy hardware |
| Quality beats quantity | Garbage in, garbage out | Verified data commands premium |
| Ownership creates alignment | Data creators should earn from value | Token incentives align contribution and reward |
| Compute follows data | Processing moves to where data lives | Edge compute > centralized cloud |
See Principles for the full framework.
The VVFL in AI Data
DePIN Devices → Raw Data → AI Training → Predictions → Value Capture → Fund More Devices → (back to DePIN Devices)
Feedback: better predictions fund better sensors.
More devices → More data → Better models → Higher value → More devices
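As a rough illustration of the loop above, the toy simulation below steps through devices, data, model quality, value, and new devices. Every number and functional form here (data per device, the logarithmic quality curve, the reinvestment rate, the device cost) is a hypothetical placeholder, not a figure from any real network. It only shows how reinvested value lets the device count compound.

```python
import math


def simulate_flywheel(
    devices=100,
    rounds=5,
    data_per_device=10.0,     # hypothetical data units per device per round
    value_per_quality=500.0,  # hypothetical value earned per unit of model quality
    reinvest_rate=0.5,        # hypothetical share of value spent on new devices
    device_cost=50.0,         # hypothetical cost of one device
):
    """Return the device count at the start and after each round of the loop."""
    history = [devices]
    total_data = 0.0
    for _ in range(rounds):
        # Devices -> raw data (the data set accumulates across rounds).
        total_data += devices * data_per_device
        # Data -> model quality, with assumed diminishing returns.
        quality = math.log1p(total_data)
        # Predictions -> value capture.
        value = quality * value_per_quality
        # Value -> fund more devices (whole devices only).
        devices += int((value * reinvest_rate) // device_cost)
        history.append(devices)
    return history


print(simulate_flywheel())
```

The logarithmic quality curve is one possible choice; a different curve would change the growth rate but not the structure of the loop.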
Value Chain
| Layer | Function | Centralized | DePIN | Margin Shift |
|---|---|---|---|---|
| Collection | Gather raw data | Corporate sensors | Community DePIN devices | → Operators |
| Connectivity | Move data to processing | Cloud upload | Helium, WiFi, LoRa | → Network |
| Storage | Persist data | AWS S3, GCP | Filecoin, Arweave | → Providers |
| Compute | Train and infer | NVIDIA + hyperscalers | io.net, Render, Akash | → GPU owners |
| Application | Deploy predictions | OpenAI, Google | Open models + DePIN | → Community |
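One way to make the table concrete is to treat the five layers as an ordered pipeline of records. The sketch below copies the table rows into a small data structure; the `Layer` class, the `VALUE_CHAIN` tuple, and the helper functions are illustrative names, not part of any existing tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    name: str
    function: str
    centralized: str
    depin: str
    margin_shifts_to: str


# Rows copied from the Value Chain table, in pipeline order.
VALUE_CHAIN = (
    Layer("Collection", "Gather raw data", "Corporate sensors",
          "Community DePIN devices", "Operators"),
    Layer("Connectivity", "Move data to processing", "Cloud upload",
          "Helium, WiFi, LoRa", "Network"),
    Layer("Storage", "Persist data", "AWS S3, GCP",
          "Filecoin, Arweave", "Providers"),
    Layer("Compute", "Train and infer", "NVIDIA + hyperscalers",
          "io.net, Render, Akash", "GPU owners"),
    Layer("Application", "Deploy predictions", "OpenAI, Google",
          "Open models + DePIN", "Community"),
)


def describe_path(chain):
    """Join the layer names in order, from raw data to deployed predictions."""
    return " → ".join(layer.name for layer in chain)


def margin_recipients(chain):
    """Map each layer to the party that gains margin under the DePIN model."""
    return {layer.name: layer.margin_shifts_to for layer in chain}


print(describe_path(VALUE_CHAIN))
print(margin_recipients(VALUE_CHAIN)["Compute"])
```

Keeping the layers in an ordered tuple preserves the direction of data flow, which matters when reasoning about where margin moves at each step.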
Transmitters and Actuators
The key insight: DePIN devices split into two categories with different economics.
| Type | Function | Examples | Data Role |
|---|---|---|---|
| Transmitters | Observe and send data | GEODNET, WeatherXM, Hivemapper | Data supply — feed AI training |
| Actuators | Act on AI predictions | Robots, autonomous vehicles, smart locks | Data demand — consume AI output |
The flywheel: Transmitters feed data → AI trains → Actuators execute → Actuators generate new data → Transmitters capture it.
See Robotics Industry for the actuator side.
Friction Map
| Friction | Layer | Status | Opportunity |
|---|---|---|---|
| Data collection cost | Collection | Growing | Community DePIN sensors |
| Data quality verification | Collection | Gap | On-chain attestations |
| GPU scarcity | Compute | Growing | Distributed GPU networks |
| Storage centralization | Storage | Growing | Decentralized persistence |
| Model access | Application | Growing | Open-weight models + DePIN inference |
Deep Dives
| Section | Deep Dive | What's There |
|---|---|---|
| Principles | First Principles | Data flywheel truths |
| Performance | Metrics | Scoring, market sizing, KPIs |
| Protocols | Workflows | DePIN data collection to compute |
| Platform | ABCD Stack | Technology layers for data |
| Players | Ecosystem | Scale AI, io.net, GEODNET, Render |
| Star | GEODNET | RTK positioning network |
Context
- Robotics Industry — Actuators that consume and generate data
- Advertising Industry — Monetizes attention on data-driven platforms
- DePIN — Physical infrastructure patterns
- Data Flow — How data creates value
- DePIN Tokens — Token economics for data infrastructure