AI Data Protocols
How data moves from sensor to intelligence. Five layers, each with its own protocol patterns.
The Data Pipeline
    COLLECT  →  CONNECT  →  STORE  →  COMPUTE  →  APPLY
       ↓           ↓          ↓          ↓          ↓
    Sensors     Networks    Persist    Train      Deploy
    (DePIN)     (Helium)    (IPFS)     (GPU)      (Models)
Each layer has centralized incumbents and DePIN challengers. The protocols define how data flows between layers.
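A minimal Python sketch of the layering: each stage consumes the previous layer's output and hands its own downstream. The stage names mirror the diagram; the payloads and function bodies are illustrative placeholders, not any real protocol's API.

```python
from functools import reduce

# Each stage consumes the previous layer's output and returns its own.
# All payloads are placeholders, not real protocol messages.
def collect(_):
    return {"readings": [21.4, 21.7]}               # sensors (DePIN)

def connect(collected):
    return {"delivered": collected["readings"]}     # networks (e.g. Helium)

def store(delivered):
    return {"cid": hash(str(delivered))}            # persistence (e.g. IPFS)

def compute(stored):
    return {"model": f"model-{stored['cid']}"}      # training (GPU networks)

def apply(trained):
    return {"endpoint": f"/v1/{trained['model']}"}  # deployment

PIPELINE = [collect, connect, store, compute, apply]

def run(pipeline, seed=None):
    """Thread data through each layer in order, as in the diagram above."""
    return reduce(lambda data, stage: stage(data), pipeline, seed)

print(run(PIPELINE))
```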
Collection Protocols
How raw data is gathered from the physical world.
DePIN Sensor Deployment
| Step | Action | Protocol | Verification |
|---|---|---|---|
| 1 | Deploy device | Physical installation | Location proof |
| 2 | Calibrate | Sensor initialization | Quality attestation |
| 3 | Collect | Continuous measurement | Timestamp + signature |
| 4 | Transmit | Push to aggregator | Delivery confirmation |
| 5 | Reward | Token distribution | Proof of contribution |
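A toy Python sketch of this five-step lifecycle. The field names, the attestation stubs, and the reward formula are assumptions for illustration, not any specific DePIN protocol's schema.

```python
import hashlib
import time
from dataclasses import dataclass, field

@dataclass
class Device:
    device_id: str
    location: tuple[float, float]        # step 1: deploy, with location proof
    calibrated: bool = False
    readings: list = field(default_factory=list)

    def calibrate(self):
        self.calibrated = True           # step 2: in practice, a signed quality attestation

    def collect(self, value: float):
        assert self.calibrated, "calibrate before collecting"
        # step 3: each reading carries a timestamp and a content digest
        reading = {"value": value, "ts": time.time()}
        reading["digest"] = hashlib.sha256(str(reading).encode()).hexdigest()
        self.readings.append(reading)

    def transmit(self) -> list:
        # step 4: push the batch to an aggregator, which confirms delivery
        batch, self.readings = self.readings, []
        return batch

def reward(batch: list, rate: float = 0.01) -> float:
    # step 5: tokens proportional to verified contributions (toy formula)
    return rate * len(batch)

device = Device("sensor-001", (37.77, -122.42))
device.calibrate()
device.collect(21.4)
device.collect(21.7)
print(reward(device.transmit()))   # -> 0.02
```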
Collection Categories
| Category | What's Collected | DePIN Protocol | Precision |
|---|---|---|---|
| Positioning | RTK corrections | GEODNET | Centimeter |
| Mapping | Street-level imagery | Hivemapper | Visual |
| Weather | Temperature, humidity, pressure | WeatherXM | Station-grade |
| Wireless | Coverage attestations | Helium | Signal strength |
| Web data | Internet content | Grass | Page-level |
| Energy | Grid measurements | Daylight Energy | Meter-grade |
Proof of Collection
Every data point needs provenance. The protocol pattern:
    Device Identity → Timestamp → Location → Measurement → Signature → Chain
Why it matters: Unverified data is a commodity; cryptographically attested data commands a premium. The attestation layer is where DePIN creates defensible value.
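A minimal sketch of this attestation chain in Python, using Ed25519 signatures from the `cryptography` package. The payload schema is an assumed example; each protocol defines its own.

```python
import json
import time
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# The device key is provisioned once, e.g. at manufacture.
device_key = Ed25519PrivateKey.generate()

def attest(device_id: str, lat: float, lon: float, measurement: dict) -> dict:
    """Build and sign one data point: identity -> timestamp -> location -> measurement."""
    payload = {
        "device": device_id,     # device identity
        "ts": time.time(),       # timestamp
        "loc": [lat, lon],       # location
        "data": measurement,     # measurement
    }
    message = json.dumps(payload, sort_keys=True).encode()
    return {"payload": payload, "sig": device_key.sign(message).hex()}

def verify(record: dict, public_key) -> bool:
    """Anyone holding the device's public key can check provenance."""
    message = json.dumps(record["payload"], sort_keys=True).encode()
    try:
        public_key.verify(bytes.fromhex(record["sig"]), message)
        return True
    except InvalidSignature:
        return False

record = attest("sensor-001", 37.77, -122.42, {"temp_c": 21.4})
print(verify(record, device_key.public_key()))   # -> True
```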
Connectivity Protocols
How data moves from collection point to processing.
| Protocol | Use Case | Throughput | Range |
|---|---|---|---|
| LoRaWAN | IoT sensors, low bandwidth | Low | Long (km) |
| Helium 5G | Mobile, high bandwidth | High | Medium |
| WiFi/CBRS | Dense urban coverage | High | Short |
| Satellite | Remote, global coverage | Medium | Global |
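A toy selector over this table. The numeric throughput and range scores are illustrative orderings, not measured values.

```python
# name: (throughput score, range score) -- higher is more capable
TRANSPORTS = {
    "LoRaWAN":   (1, 3),
    "Helium 5G": (3, 2),
    "WiFi/CBRS": (3, 1),
    "Satellite": (2, 4),
}

def pick_transport(min_throughput: int, min_range: int) -> str:
    candidates = [
        name for name, (tp, rng) in TRANSPORTS.items()
        if tp >= min_throughput and rng >= min_range
    ]
    if not candidates:
        raise ValueError("no transport meets the requirements")
    # prefer the cheapest adequate option: lowest combined capability
    return min(candidates, key=lambda n: sum(TRANSPORTS[n]))

print(pick_transport(min_throughput=1, min_range=3))  # -> "LoRaWAN"
print(pick_transport(min_throughput=3, min_range=1))  # -> "WiFi/CBRS"
```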
The Three Flows
Same architecture as telecom:
    DATA INTENT → ROUTE → INFRASTRUCTURE → SETTLE → FEEDBACK
         ↓          ↓            ↓             ↓         ↓
      Request    Path AI    Network link    Payment   Quality
| Flow Stage | Data Implementation | Provider |
|---|---|---|
| Intent | Data request (query, stream) | Consumer application |
| Route | Optimal path selection | Network AI |
| Infrastructure | Physical connectivity | DePIN operators |
| Settle | Micropayment for delivery | Blockchain |
| Feedback | Quality metrics, latency | Protocol oracle |
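A sketch of one request traversing the five stages. The operator names, prices, and quality-update rule are invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class Link:
    operator: str
    latency_ms: float
    price: float           # micropayment per request
    quality: float = 1.0   # feedback-adjusted score

LINKS = [Link("operator-a", 40, 0.002), Link("operator-b", 25, 0.003)]

def route(links):
    # Route: pick the best quality-adjusted latency
    return min(links, key=lambda l: l.latency_ms / l.quality)

def settle(link):
    # Settle: micropayment to the infrastructure operator
    return {"to": link.operator, "amount": link.price}

def feedback(link, delivered_ms):
    # Feedback: an oracle adjusts quality based on delivered latency
    link.quality *= 0.9 if delivered_ms > link.latency_ms else 1.05

request = {"intent": "stream", "bytes": 4096}   # Intent
link = route(LINKS)                             # Route (over DePIN Infrastructure)
payment = settle(link)                          # Settle
feedback(link, delivered_ms=30)                 # Feedback
print(link.operator, payment)
```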
Storage Protocols
How data persists for training and retrieval.
| Protocol | Model | Best For | Trade-off |
|---|---|---|---|
| Filecoin | Incentivized IPFS | Large datasets, cold storage | Retrieval speed |
| Arweave | Permanent storage | Immutable records, proofs | Cost per MB |
| Ceramic | Mutable data streams | User profiles, session data | Complexity |
| IPFS | Content-addressed | Deduplication, sharing | No incentive layer |
Storage Workflow
    Raw Data → Preprocess → Deduplicate → Store → Index → Serve
                   ↓                        ↓       ↓
              Quality gate              CID/proof Discovery
The cost curve: Decentralized storage is already price-competitive with AWS S3 for cold storage. Hot storage and fast retrieval remain advantages of centralized providers.
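The deduplicate, store, and index steps follow naturally from content addressing, as this sketch shows: identical bytes hash to the same ID, so duplicates collapse for free. It mimics the idea behind IPFS CIDs, not the actual CID format.

```python
import hashlib

STORE: dict[str, bytes] = {}    # content-addressed blob store
INDEX: dict[str, str] = {}      # name -> content id, for discovery

def put(name: str, blob: bytes) -> str:
    cid = hashlib.sha256(blob).hexdigest()   # derive the ID from the content
    STORE[cid] = blob                        # duplicate blobs overwrite themselves
    INDEX[name] = cid
    return cid

def get(name: str) -> bytes:
    return STORE[INDEX[name]]

a = put("train/shard-0", b"sensor readings ...")
b = put("backup/shard-0", b"sensor readings ...")   # same bytes, same CID
print(a == b, len(STORE))   # -> True 1
```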
Compute Protocols
How data becomes intelligence through training and inference.
Distributed GPU Networks
| Protocol | Focus | GPU Count | Revenue Model |
|---|---|---|---|
| io.net | General GPU compute | 500K+ | Marketplace fees |
| Render | Graphics + AI rendering | 100K+ | Burn-mint equilibrium |
| Akash | General cloud compute | Growing | Reverse auction |
| Gensyn | ML training verification | Early | Proof of training |
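A toy reverse auction in the style the table attributes to Akash: the consumer posts requirements, providers bid, and the lowest qualifying bid wins. The bids and hardware specs are made up.

```python
BIDS = [
    {"provider": "gpu-farm-a", "gpus": 8,  "vram_gb": 80, "price": 1.90},
    {"provider": "gpu-farm-b", "gpus": 8,  "vram_gb": 40, "price": 1.20},
    {"provider": "gpu-farm-c", "gpus": 16, "vram_gb": 80, "price": 2.10},
]

def reverse_auction(bids, min_gpus, min_vram_gb):
    qualifying = [b for b in bids
                  if b["gpus"] >= min_gpus and b["vram_gb"] >= min_vram_gb]
    return min(qualifying, key=lambda b: b["price"]) if qualifying else None

print(reverse_auction(BIDS, min_gpus=8, min_vram_gb=80))  # -> gpu-farm-a
```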
Training Workflow
    Dataset → Preprocess → Distribute → Train → Verify → Aggregate → Model
                  ↓            ↓           ↓        ↓
            Quality check  GPU selection Epochs  Proof of training
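A stub of the distribute, train, verify, and aggregate steps. "Training" is reduced to a toy weight update and "proof of training" to a digest, so this only illustrates the shape of the workflow, not a real scheme like Gensyn's.

```python
import hashlib

def local_train(weights: list[float], shard: list[float]) -> list[float]:
    # toy update: nudge each weight toward the shard mean
    mean = sum(shard) / len(shard)
    return [w + 0.1 * (mean - w) for w in weights]

def proof(weights: list[float]) -> str:
    # stand-in for a real proof of training: a digest of the result
    return hashlib.sha256(str(weights).encode()).hexdigest()

def aggregate(results: list[list[float]]) -> list[float]:
    # average the workers' weights, dimension by dimension
    return [sum(col) / len(col) for col in zip(*results)]

global_weights = [0.0, 0.0]
shards = [[1.0, 3.0], [2.0, 4.0]]                   # distribute
results, proofs = [], []
for shard in shards:
    updated = local_train(global_weights, shard)    # train
    proofs.append(proof(updated))                   # verify (toy)
    results.append(updated)
global_weights = aggregate(results)                 # aggregate
print(global_weights, proofs[0][:8])
```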
Inference Workflow
    Query → Route → Edge/Cloud → Process → Return → Settle
              ↓          ↓                             ↓
      Latency optimize  Model select             Micropayment
The thesis: Training is batch, so price matters most; inference is real-time, so latency matters most. Distributed GPU networks therefore compete on price for training workloads, while edge networks compete on latency for inference.
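A sketch of that routing trade-off: meet the latency budget first, then minimize price. The endpoint numbers are illustrative.

```python
ENDPOINTS = {
    "edge":  {"latency_ms": 15,  "price_per_query": 0.004},
    "cloud": {"latency_ms": 120, "price_per_query": 0.001},
}

def route_inference(latency_budget_ms: float) -> str:
    affordable = {name: ep for name, ep in ENDPOINTS.items()
                  if ep["latency_ms"] <= latency_budget_ms}
    if not affordable:
        raise ValueError("no endpoint meets the latency budget")
    # among endpoints that meet the budget, minimize price
    return min(affordable, key=lambda n: affordable[n]["price_per_query"])

print(route_inference(50))    # -> "edge"  (only edge is fast enough)
print(route_inference(500))   # -> "cloud" (both qualify; cloud is cheaper)
```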
Application Protocols
How trained models reach users and generate value.
| Pattern | Description | Example |
|---|---|---|
| API marketplace | Models served via API, pay-per-query | Replicate, Together |
| Edge deployment | Models run on device, zero latency | On-device inference |
| Agent protocols | AI agents discover and use models | MCP, A2A |
| Data marketplace | Raw and processed data traded | Ocean Protocol |
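A minimal pay-per-query marketplace sketch. The catalog, prices, and ledger are invented; real marketplaces like Replicate or Together expose richer APIs.

```python
CATALOG = {
    "summarizer-v1": {"price_per_query": 0.002, "owner": "lab-a"},
    "classifier-v2": {"price_per_query": 0.001, "owner": "lab-b"},
}
LEDGER: list[dict] = []

def query(model: str, payload: str) -> str:
    listing = CATALOG[model]
    LEDGER.append({"to": listing["owner"],            # pay-per-query settlement
                   "amount": listing["price_per_query"]})
    return f"{model} output for {payload!r}"          # stubbed inference

print(query("summarizer-v1", "long article text"))
print(LEDGER)
```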
The Full Loop
    Collection → Storage → Compute → Application
         ↑                                ↓
         └───── Application generates ────┘
                new data for collection
This is the VVFL in protocol form. Each layer feeds the next. The loop accelerates with scale.
Protocol Economics
| Layer | Revenue Capture | Token Mechanism |
|---|---|---|
| Collection | Data sale fees | Proof of contribution rewards |
| Connectivity | Transfer fees | Data credit burn |
| Storage | Storage fees | Capacity staking |
| Compute | Processing fees | GPU staking + burn |
| Application | Query/API fees | Usage-based burn |
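A toy burn-mint loop in the spirit of the table's token mechanisms: usage burns tokens on the demand side while contribution mints them on the supply side, so supply contracts whenever usage outpaces emissions. All quantities are made up.

```python
def step(supply: float, usage_fees: float, emissions: float) -> float:
    burned = usage_fees      # fees are paid by burning tokens
    minted = emissions       # rewards minted to contributors
    return supply - burned + minted

supply = 1_000_000.0
for day in range(3):
    supply = step(supply, usage_fees=5_000, emissions=4_000)
print(supply)   # -> 997000.0 (net deflation while usage > emissions)
```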
The integration thesis: Protocols that span multiple layers capture more value. A network that collects AND stores AND computes has structural advantages over single-layer plays.
Context
- AI Data Overview — The transformation thesis
- Platform — ABCD stack for each layer
- Players — Who operates at each layer
- Telecom Protocols — Parallel protocol architecture