Skip to main content

AI Data Players

Who operates at each layer of the data value chain.

Ecosystem Map

LayerCentralizedDePIN / DecentralizedTrend
CollectionScale AI, AppenGEODNET, Hivemapper, WeatherXM, GrassDePIN growing fastest
ConnectivityAWS, CloudflareHelium, WiFi protocolsHelium mobile scaling
StorageAWS S3, GCP, AzureFilecoin, Arweave, IPFSCost parity for cold
ComputeNVIDIA, AWS, GCPio.net, Render, Akash, GensynGPU demand outstrips supply
ApplicationOpenAI, Google, AnthropicOpen-weight models, DePIN inferenceBoth growing

Collection Layer

Scale AI (Centralized Benchmark)

The reference point for data infrastructure valuation.

MetricValue
Valuation$29B (2025, post-Meta investment)
Revenue~$870M (2024), targeting $1.5B ARR
FounderAlexander Wang (youngest self-made billionaire)
Core productHuman-verified AI training data
CustomersOpenAI, Meta, US Department of Defense

Why it matters: Scale AI proved that curated, labeled data is worth billions. DePIN protocols are building the same function with distributed infrastructure and token incentives.

GEODNET

Global RTK positioning network with centimeter precision.

MetricValue
Stations20,000+ across 148 countries
PrecisionCentimeter-level RTK corrections
Use casesAutonomous vehicles, precision agriculture, surveying
TokenGEOD

Thesis: Positioning data is foundational for every mobile AI system — self-driving cars, drones, robots. GEODNET builds the base layer.

See GEODNET deep dive for full analysis.

Hivemapper

Decentralized street-level mapping.

MetricValue
Coverage30%+ of global road network mapped
ContributorsDashcam operators worldwide
OutputFresh map data for AI, logistics, insurance
TokenHONEY

Thesis: Google and Apple control mapping data. Hivemapper creates a community-owned alternative that updates in real-time.

WeatherXM

Decentralized weather data network.

MetricValue
Stations7,000+ weather stations
DataHyperlocal temperature, humidity, pressure, wind
BuyersAgriculture, insurance, energy companies
TokenWXM

Grass

Decentralized web data collection for AI training.

MetricValue
Network3M+ daily active users, 8.5M monthly nodes
FunctionWeb scraping and data collection at scale
OutputTraining data for LLMs and AI models
TokenGRASS

Connectivity Layer

Helium

The original DePIN protocol. Connectivity as community infrastructure.

MetricValue
Hotspots115K+ active (2M daily mobile users)
Revenue$18.3M ARR (Sep 2025 record)
NetworkIoT (LoRaWAN) + 5G mobile
TokenHNT, MOBILE, IOT

See Telecom Players for detailed Helium analysis.


Storage Layer

Filecoin

Incentivized decentralized storage built on IPFS.

MetricValue
Storage capacity3.3 EiB committed (32% utilized)
StatusOnchain Cloud launched Jan 2026
Best forLarge dataset archival, data sovereignty
TokenFIL

Arweave

Permanent, immutable storage.

MetricValue
ModelPay once, store forever
Best forData provenance records, attestations
TokenAR

Compute Layer

io.net

Distributed GPU compute marketplace.

MetricValue
GPUs327K verified (10K+ active nodes)
Revenue$20M ARR (Oct 2025)
Best forML training, batch inference
TokenIO

Render Network

Decentralized GPU rendering and AI compute.

MetricValue
Nodes5,600 (22M frames rendered in 2025)
ModelBurn-mint equilibrium
Best forGraphics rendering, AI inference
TokenRENDER
ChainSolana

Akash Network

Decentralized cloud compute with reverse auction pricing.

MetricValue
ModelReverse auction for compute
Cost50-85% cheaper than AWS
Best forGeneral cloud workloads, containers
TokenAKT

Application Layer

Open-Weight Models

ModelCreatorSignificance
LlamaMetaLargest open-weight model family
MistralMistral AIEuropean open-weight leader
DeepSeekDeepSeekCost-efficient training breakthrough
GemmaGoogleGoogle's open offering

Data Marketplaces

ProtocolFunctionStatus
Ocean ProtocolData tokenization and tradingEstablished
StreamrReal-time data streaming marketplaceGrowing
NumeraiCrowdsourced predictions marketEstablished

Competitive Dynamics

Centralized vs DePIN

DimensionCentralized WinsDePIN Wins
SpeedFaster iterationFaster deployment at scale
Quality controlEasier to enforceCryptographic verification
Cost structureFixed (data centers)Variable (community devices)
Geographic reachLimited by capexUnlimited by incentive
Data sovereigntyCorporate controlledUser controlled

The Convergence

The winning model likely combines both:

  • Centralized quality (Scale AI-style verification) + Decentralized collection (DePIN sensor networks)
  • Hyperscaler compute (for frontier training) + Distributed GPU (for inference at scale)
  • Proprietary models (for cutting edge) + Open-weight models (for distribution)

Investment Thesis

LayerConvictionPositionRisk
CollectionHighGEODNET, HivemapperDevice saturation
ConnectivityMediumHelium via telecom thesisToken sustainability
StorageMediumFilecoin for enterpriseCloud price wars
ComputeHighio.net, RenderHyperscaler response
ApplicationWatchOpen-weight ecosystemWinner unclear

Context