AI Data Players
Who operates at each layer of the data value chain.
Ecosystem Map
| Layer | Centralized | DePIN / Decentralized | Trend |
|---|---|---|---|
| Collection | Scale AI, Appen | GEODNET, Hivemapper, WeatherXM, Grass | DePIN growing fastest |
| Connectivity | AWS, Cloudflare | Helium, decentralized WiFi protocols | Helium mobile scaling |
| Storage | AWS S3, GCP, Azure | Filecoin, Arweave, IPFS | Cost parity for cold storage |
| Compute | NVIDIA, AWS, GCP | io.net, Render, Akash, Gensyn | GPU demand outstrips supply |
| Application | OpenAI, Google, Anthropic | Open-weight models, DePIN inference | Both growing |
Collection Layer
Scale AI (Centralized Benchmark)
The reference point for data infrastructure valuation.
| Metric | Value |
|---|---|
| Valuation | $29B (2025, post-Meta investment) |
| Revenue | ~$870M (2024), targeting $1.5B ARR |
| Founder | Alexandr Wang (once the world's youngest self-made billionaire) |
| Core product | Human-verified AI training data |
| Customers | OpenAI, Meta, US Department of Defense |
Why it matters: Scale AI proved that curated, labeled data is worth billions. DePIN protocols aim to deliver the same function through distributed infrastructure and token incentives.
GEODNET
Global RTK positioning network with centimeter precision.
| Metric | Value |
|---|---|
| Stations | 20,000+ across 148 countries |
| Precision | Centimeter-level RTK corrections |
| Use cases | Autonomous vehicles, precision agriculture, surveying |
| Token | GEOD |
Thesis: Positioning data is foundational for every mobile AI system — self-driving cars, drones, robots. GEODNET builds the base layer.
See GEODNET deep dive for full analysis.
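
True RTK works on carrier-phase measurements and integer ambiguity resolution, which is beyond a short example. The core idea a dense reference network enables can still be sketched in simplified form. A station at a surveyed position measures its own GNSS error, and a nearby rover applies the same correction. The sketch below is a position-domain differential correction, closer to classic DGPS than to real RTK. It assumes the rover is close enough to the station to share its error. The local east/north/up frame and the names `Position`, `base_correction` and `apply_correction` are hypothetical and are not a GEODNET API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Position:
    """Point in a hypothetical local east/north/up frame, in metres."""
    east: float
    north: float
    up: float

    def __add__(self, other: "Position") -> "Position":
        return Position(self.east + other.east, self.north + other.north, self.up + other.up)

    def __sub__(self, other: "Position") -> "Position":
        return Position(self.east - other.east, self.north - other.north, self.up - other.up)


def base_correction(surveyed: Position, measured: Position) -> Position:
    """Correction at a reference station whose true (surveyed) position is known."""
    return surveyed - measured


def apply_correction(rover_measured: Position, correction: Position) -> Position:
    """Simplification: assumes the rover shares the station's error (short baseline)."""
    return rover_measured + correction


station_true = Position(0.0, 0.0, 0.0)
station_measured = Position(1.2, -0.8, 2.0)  # hypothetical GNSS reading with error
correction = base_correction(station_true, station_measured)

rover_measured = Position(101.2, 49.2, 12.0)  # hypothetical uncorrected rover fix
rover_corrected = apply_correction(rover_measured, correction)
print(rover_corrected)
```

The shared-error assumption holds better when the baseline between rover and station is shorter. That is why station density matters for a positioning network.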
Hivemapper
Decentralized street-level mapping.
| Metric | Value |
|---|---|
| Coverage | 30%+ of global road network mapped |
| Contributors | Dashcam operators worldwide |
| Output | Fresh map data for AI, logistics, insurance |
| Token | HONEY |
Thesis: Google and Apple dominate commercial mapping data. Hivemapper builds a community-owned alternative that updates in real time.
WeatherXM
Decentralized weather data network.
| Metric | Value |
|---|---|
| Stations | 7,000+ weather stations |
| Data | Hyperlocal temperature, humidity, pressure, wind |
| Buyers | Agriculture, insurance, energy companies |
| Token | WXM |
Grass
Decentralized web data collection for AI training.
| Metric | Value |
|---|---|
| Network | 3M+ daily active users, 8.5M monthly nodes |
| Function | Web scraping and data collection at scale |
| Output | Training data for LLMs and AI models |
| Token | GRASS |
Connectivity Layer
Helium
The original DePIN protocol. Connectivity as community infrastructure.
| Metric | Value |
|---|---|
| Hotspots | 115K+ active |
| Mobile users | 2M daily |
| Revenue | $18.3M ARR (Sep 2025 record) |
| Network | IoT (LoRaWAN) + 5G mobile |
| Token | HNT, MOBILE, IOT |
See Telecom Players for detailed Helium analysis.
Storage Layer
Filecoin
Incentivized decentralized storage built on IPFS.
| Metric | Value |
|---|---|
| Storage capacity | 3.3 EiB committed (32% utilized) |
| Status | Onchain Cloud launched Jan 2026 |
| Best for | Large dataset archival, data sovereignty |
| Token | FIL |
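
One multiplication turns the utilization figure into the amount of data actually stored. The minimal sketch below uses only the two numbers in the table. Note that EiB is a binary unit (2^60 bytes), which makes it about 15% larger than a decimal exabyte. The function name `stored_eib` is illustrative.

```python
EIB_IN_BYTES = 2 ** 60  # exbibyte (binary unit)
EB_IN_BYTES = 10 ** 18  # exabyte (decimal unit)


def stored_eib(committed_eib: float, utilization: float) -> float:
    """Data actually stored, given committed capacity and a 0-1 utilization ratio."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    return committed_eib * utilization


committed = 3.3    # EiB committed, from the table
utilization = 0.32  # 32% utilized, from the table

stored = stored_eib(committed, utilization)
idle = committed - stored
stored_decimal_eb = stored * EIB_IN_BYTES / EB_IN_BYTES

print(f"stored: {stored:.2f} EiB ({stored_decimal_eb:.2f} EB), idle: {idle:.2f} EiB")
```

By this arithmetic, roughly two thirds of committed capacity is idle. That idle share is the headroom the utilization figure implies.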
Arweave
Permanent, immutable storage.
| Metric | Value |
|---|---|
| Model | Pay once, store forever |
| Best for | Data provenance records, attestations |
| Token | AR |
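
"Pay once, store forever" is funded by an endowment: the upfront fee must cover every future year of storage. Suppose the yearly cost `c` of storing a given amount of data falls by a constant fraction `d`. The total cost is then the geometric series `c * (1 - d)^t` summed over all years, which equals `c / d` and stays finite for any `d > 0`. The sketch below makes that concrete. The constant-decline model and the rates used are illustrative assumptions, not Arweave's published pricing parameters.

```python
def perpetual_cost(annual_cost: float, annual_decline: float) -> float:
    """Closed form of sum(annual_cost * (1 - d) ** t for t = 0, 1, 2, ...) = annual_cost / d.

    Assumption: storage cost falls by a constant fraction `annual_decline` each year.
    """
    if not 0.0 < annual_decline <= 1.0:
        raise ValueError("annual_decline must be in (0, 1]")
    return annual_cost / annual_decline


def finite_horizon_cost(annual_cost: float, annual_decline: float, years: int) -> float:
    """Explicit sum over the first `years` years, to check against the closed form."""
    return sum(annual_cost * (1.0 - annual_decline) ** t for t in range(years))


# Illustrative assumption: today's cost is 1 unit per year, falling 0.5% a year.
upfront = perpetual_cost(1.0, 0.005)
print(f"one-time fee ≈ {upfront:.0f}x today's annual cost")
```

The result is very sensitive to `d`. Halving the assumed decline rate doubles the required upfront fee, so the decline assumption is the core risk in the model.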
Compute Layer
io.net
Distributed GPU compute marketplace.
| Metric | Value |
|---|---|
| GPUs | 327K verified (10K+ active nodes) |
| Revenue | $20M ARR (Oct 2025) |
| Best for | ML training, batch inference |
| Token | IO |
Render Network
Decentralized GPU rendering and AI compute.
| Metric | Value |
|---|---|
| Nodes | 5,600 (22M frames rendered in 2025) |
| Model | Burn-mint equilibrium |
| Best for | Graphics rendering, AI inference |
| Token | RENDER |
| Chain | Solana |
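
Burn-mint equilibrium works in two directions. Users pay for work at a fiat-denominated price and the equivalent amount of tokens is burned. Meanwhile, new tokens are minted to node operators on a set schedule. Net supply therefore shrinks when usage, measured in USD, outgrows emissions, and grows otherwise. The sketch below models one epoch under assumed inputs. The revenue, token price and emission figures are hypothetical, and Render's actual emission schedule and reward split are not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EpochResult:
    burned: float             # tokens destroyed to pay for jobs
    minted: float             # tokens issued to node operators
    net_supply_change: float  # positive = inflation, negative = deflation


def simulate_epoch(job_revenue_usd: float, token_price_usd: float, emission: float) -> EpochResult:
    """One epoch of a simplified burn-mint equilibrium.

    Jobs are priced in USD and the equivalent value in tokens is burned.
    A fixed (hypothetical) emission is minted regardless of demand.
    """
    if token_price_usd <= 0:
        raise ValueError("token price must be positive")
    burned = job_revenue_usd / token_price_usd
    return EpochResult(burned=burned, minted=emission, net_supply_change=emission - burned)


def equilibrium_price(job_revenue_usd: float, emission: float) -> float:
    """Token price at which burns exactly offset emissions."""
    return job_revenue_usd / emission


result = simulate_epoch(job_revenue_usd=1_000_000.0, token_price_usd=5.0, emission=150_000.0)
print(result)
print(f"equilibrium price: ${equilibrium_price(1_000_000.0, 150_000.0):.2f}")
```

At a fixed USD demand, a lower token price means more tokens are burned per job, which pushes the net supply change negative. That feedback is what the "equilibrium" in the name refers to.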
Akash Network
Decentralized cloud compute with reverse auction pricing.
| Metric | Value |
|---|---|
| Model | Reverse auction for compute |
| Cost | Claimed 50-85% cheaper than AWS |
| Best for | General cloud workloads, containers |
| Token | AKT |
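
In a reverse auction the buyer sets a ceiling and sellers compete downward. On Akash, roughly, a tenant's deployment opens an order, providers submit bids, and the tenant accepts one to create a lease. The minimal sketch below covers only the selection step. It assumes the tenant simply takes the cheapest bid at or below its maximum price. `Bid`, `select_winning_bid` and the price units are hypothetical, and real bid acceptance can also weigh provider attributes.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Bid:
    provider: str
    price: float  # price per unit of time; units are illustrative


def select_winning_bid(bids: List[Bid], max_price: float) -> Optional[Bid]:
    """Return the cheapest bid at or below the tenant's max price, or None if none qualify."""
    eligible = [bid for bid in bids if bid.price <= max_price]
    if not eligible:
        return None
    # Break price ties by provider name so the outcome is deterministic.
    return min(eligible, key=lambda bid: (bid.price, bid.provider))


bids = [Bid("provider-a", 12.0), Bid("provider-b", 9.5), Bid("provider-c", 15.0)]
winner = select_winning_bid(bids, max_price=10.0)
print(winner)
```

Providers win only by undercutting each other, so the clearing price tends to drift toward their marginal cost. That dynamic is one plausible source of the cost advantage claimed in the table.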
Application Layer
Open-Weight Models
| Model | Creator | Significance |
|---|---|---|
| Llama | Meta | Largest open-weight model family |
| Mistral | Mistral AI | European open-weight leader |
| DeepSeek | DeepSeek | Cost-efficient training breakthrough |
| Gemma | Google | Google's open-weight offering |
Data Marketplaces
| Protocol | Function | Status |
|---|---|---|
| Ocean Protocol | Data tokenization and trading | Established |
| Streamr | Real-time data streaming marketplace | Growing |
| Numerai | Crowdsourced stock-market prediction tournament | Established |
Competitive Dynamics
Centralized vs DePIN
| Dimension | Centralized Wins | DePIN Wins |
|---|---|---|
| Speed | Faster iteration | Faster deployment at scale |
| Quality control | Easier to enforce | Cryptographic verification |
| Cost structure | Fixed (data centers) | Variable (community devices) |
| Geographic reach | Limited by capex | Scales with incentives, not capex |
| Data sovereignty | Corporate controlled | User controlled |
The Convergence
The winning model likely combines both:
- Centralized quality (Scale AI-style verification) + Decentralized collection (DePIN sensor networks)
- Hyperscaler compute (for frontier training) + Distributed GPU (for inference at scale)
- Proprietary models (for cutting edge) + Open-weight models (for distribution)
Investment Thesis
| Layer | Conviction | Position | Risk |
|---|---|---|---|
| Collection | High | GEODNET, Hivemapper | Device saturation |
| Connectivity | Medium | Helium via telecom thesis | Token sustainability |
| Storage | Medium | Filecoin for enterprise | Cloud price wars |
| Compute | High | io.net, Render | Hyperscaler response |
| Application | Watch | Open-weight ecosystem | Winner unclear |
Context
- AI Data Overview — The transformation thesis
- Platform — ABCD stack for data
- Protocols — How each layer operates
- Telecom Players — Parallel ecosystem
- DePIN Tokens — Token economics