AI Data Industry

Who owns the data that trains the intelligence?

Data is AI's raw material. Five layers turn raw observations into predictions: Collection, Connectivity, Storage, Compute, Application. Whoever owns these layers owns the intelligence economy.

Playbook

| Prompts | Questions | Reflections |
| --- | --- | --- |
| Principles | What guides us? | Data compounds, ownership distributes |
| Performance | Is it working? | Revenue/emission, on-chain traction, market sizing |
| Protocols | How do we do it? | DePIN collection, storage, compute workflows |
| Platform | What tools? | ABCD stack for data infrastructure |
| Players | Who's involved? | Scale AI, io.net, Render, GEODNET, Hivemapper |

The Thesis

AI data is transitioning from extraction to community ownership:

| From | To | Driver |
| --- | --- | --- |
| Centralized collection | Distributed sensors | DePIN replaces data centers with community devices |
| Corporate data silos | Open data markets | Token incentives make sharing more profitable than hoarding |
| Proprietary labeling | Community annotation | The Scale AI model meets crypto coordination |
| Cloud compute monopoly | Distributed GPU networks | io.net, Render break the hyperscaler lock |
| Extractive surveillance | Compensated contribution | Data creators earn from the value they generate |

Opportunity Score

Aggregate: 7.6 / 10 | Classification: Strong Conviction

| Dimension | Score | Key Evidence |
| --- | --- | --- |
| Market Attractiveness | 8.5 | AI training data market $25B+ by 2030, Scale AI at $29B valuation |
| Technology Disruption | 8.0 | DePIN data networks growing 300%+ YoY, GPU compute decentralizing |
| VVFL Alignment | 7.5 | Loop works: devices collect, AI trains, predictions generate value, value funds devices |
| Competitive Position | 7.0 | Early infrastructure phase, token models gaining traction |
| Timing Risk | 7.0 | 2025–2027 build phase, ahead of institutional adoption wave |
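The aggregate is consistent with an unweighted mean of the five dimension scores. A minimal check in Python; the equal-weighting assumption is ours, not stated in the source:

```python
# Dimension scores from the table above.
scores = {
    "Market Attractiveness": 8.5,
    "Technology Disruption": 8.0,
    "VVFL Alignment": 7.5,
    "Competitive Position": 7.0,
    "Timing Risk": 7.0,
}

# Assuming equal weights, the aggregate is the simple mean.
aggregate = sum(scores.values()) / len(scores)
print(round(aggregate, 1))  # → 7.6
```

If the methodology actually weights dimensions unequally, the weights would need to sum to the same 7.6.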

Verdict: The highest-conviction DePIN vertical. Data demand is structurally insatiable: every AI model needs more. Position across the value chain, not just one layer.

First Principles

| Principle | Why Immutable | Implication |
| --- | --- | --- |
| Data compounds | More data = better models = more valuable data | First to the data flywheel wins |
| Collection is physical | Sensors exist in the real world | Someone must deploy hardware |
| Quality beats quantity | Garbage in, garbage out | Verified data commands a premium |
| Ownership creates alignment | Data creators should earn from the value they create | Token incentives align contribution and reward |
| Compute follows data | Processing moves to where data lives | Edge compute > centralized cloud |

See Principles for the full framework.

The VVFL in AI Data

```
DePIN Devices → Raw Data → AI Training → Predictions → Value Capture → Fund More Devices
      ↑                                                                       │
      └──────────────── Better predictions fund better sensors ───────────────┘
```

More devices → More data → Better models → Higher value → More devices
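The reinforcing loop above can be sketched as a toy simulation. Every coefficient below (`reinvest_rate`, `value_per_device`, `device_cost`) is a hypothetical illustration, not a calibrated model:

```python
def vvfl_step(devices: float,
              reinvest_rate: float = 0.5,
              value_per_device: float = 1.0,
              device_cost: float = 2.0) -> float:
    """One pass around the loop: devices collect data, data creates value,
    and a share of that value funds new devices. Parameters are hypothetical."""
    data = devices                        # more devices -> more data
    value = data * value_per_device       # better models -> higher value (linearized)
    new_devices = value * reinvest_rate / device_cost
    return devices + new_devices

devices = 100.0
for _ in range(5):
    devices = vvfl_step(devices)
# With reinvest_rate * value_per_device / device_cost = 0.25,
# the fleet compounds 25% per cycle.
```

The point of the sketch is the structure, not the numbers: as long as reinvested value per device exceeds device cost per cycle, the loop compounds rather than decays.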

Value Chain

| Layer | Function | Centralized | DePIN | Margin Shift |
| --- | --- | --- | --- | --- |
| Collection | Gather raw data | Corporate sensors | Community DePIN devices | → Operators |
| Connectivity | Move data to processing | Cloud upload | Helium, WiFi, LoRa | → Network |
| Storage | Persist data | AWS S3, GCP | Filecoin, Arweave | → Providers |
| Compute | Train and infer | NVIDIA + hyperscalers | io.net, Render, Akash | → GPU owners |
| Application | Deploy predictions | OpenAI, Google | Open models + DePIN | → Community |

Transmitters and Actuators

The key insight: DePIN devices split into two categories with different economics.

| Type | Function | Examples | Data Role |
| --- | --- | --- | --- |
| Transmitters | Observe and send data | GEODNET, WeatherXM, Hivemapper | Data supply: feed AI training |
| Actuators | Act on AI predictions | Robots, autonomous vehicles, smart locks | Data demand: consume AI output |

The flywheel: Transmitters feed data → AI trains → Actuators execute → Actuators generate new data → Transmitters capture it.
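One way to picture the transmitter/actuator split is as two roles sharing a data pool: transmitters only supply, while actuators both consume predictions and emit fresh observations that close the loop. The names and numbers here are illustrative only:

```python
from dataclasses import dataclass

@dataclass
class DataPool:
    """Shared pool of observations available for AI training. Illustrative only."""
    records: int = 0

def transmit(pool: DataPool, observations: int) -> None:
    # Data supply: sensors observe the world and feed AI training.
    pool.records += observations

def actuate(pool: DataPool, actions: int) -> None:
    # Data demand: each action consumes AI output, and acting in the world
    # generates a new observation that transmitters capture -- the flywheel.
    pool.records += actions

pool = DataPool()
transmit(pool, 1000)   # e.g. positioning sensors report in
actuate(pool, 200)     # e.g. robots act on model predictions
print(pool.records)    # → 1200
```

The asymmetry in the sketch mirrors the economics claim: transmitters are pure data supply, while actuators sit on both sides of the market at once.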

See Robotics Industry for the actuator side.

Friction Map

| Friction | Layer | Status | Opportunity |
| --- | --- | --- | --- |
| Data collection cost | Collection | Growing | Community DePIN sensors |
| Data quality verification | Collection | Gap | On-chain attestations |
| GPU scarcity | Compute | Growing | Distributed GPU networks |
| Storage centralization | Storage | Growing | Decentralized persistence |
| Model access | Application | Growing | Open-weight models + DePIN inference |

Deep Dives

| Section | Deep Dive | What's There |
| --- | --- | --- |
| Principles | First Principles | Data flywheel truths |
| Performance | Metrics | Scoring, market sizing, KPIs |
| Protocols | Workflows | DePIN data collection to compute |
| Platform | ABCD Stack | Technology layers for data |
| Players | Ecosystem | Scale AI, io.net, GEODNET, Render |
| Star | GEODNET | RTK positioning network |