AI Compute Industry Players
Who participates in the AI compute community — and what positions does each player fill?
Players are the community of participants in the AI compute ecosystem — the WHO. Positions are the roles those players fill — the WHAT. The hat changes; the player remains. (Doctrinal anchor: Ecosystem — every industry has a community of participants.)
AI compute is the bottleneck layer of the AI economy: it determines who can train, who can run inference, and at what cost. The data layer that feeds it is covered in AI Data Industry Players.
The Ecosystem
The AI compute community has four sides:
- Buyers — AI labs, enterprises, researchers, and inference-API consumers that purchase compute to train models or run applications
- Providers — GPU/TPU/ASIC makers, cloud hyperscalers, and specialised AI cloud operators that supply the compute layer
- Infrastructure — data centres, power grids, networking fabric, and cooling systems that host the compute
- Boundary — export control authorities, energy regulators, competition commissions, and AI governance bodies that set the rules
Every player wears multiple hats. A hyperscaler is simultaneously provider (selling cloud GPU instances), buyer (purchasing Nvidia H100s), and infrastructure operator (building and running the data centre). The position changes per transaction; the player remains.
The five-counterparty model from Ecosystem maps to this industry as follows:
| Counterparty (canonical) | AI-compute-industry expression |
|---|---|
| Customers | AI labs training foundation models, enterprise ML teams fine-tuning and running inference, researchers running experiments, inference-API consumers |
| Suppliers | GPU/TPU/ASIC designers (Nvidia, AMD, Google), fab capacity (TSMC), DRAM/HBM makers (SK Hynix, Micron, Samsung), liquid cooling providers |
| Employees | GPU cluster engineers, AI infrastructure specialists, MLOps engineers, data centre operators, power/cooling engineers, procurement specialists |
| Owners | Hyperscaler shareholders, AI cloud VC investors, sovereign wealth funds investing in data centre real estate, colocation REIT holders |
| Regulators | US BIS (GPU export controls), energy regulators (data centre power draw + carbon), competition authorities reviewing AI infrastructure concentration, EU AI Act compliance bodies |
Buyer side — players
The buyers of AI compute. The value-generators the industry exists to serve. Player = the WHO. Position filled = what they buy.
| Player (WHO) | Position filled — what they buy | Asymmetry they need closed | Archetype |
|---|---|---|---|
| Frontier AI lab (OpenAI, Anthropic, DeepMind, Meta AI) | Massive training clusters for runs of 10k–100k+ GPUs | Securing GPU allocation ahead of competitors; power and cooling at scale | Dreamer / Engineer |
| Hyperscaler AI team (Google, Microsoft, Amazon) | Proprietary TPU/AI ASIC clusters + Nvidia for general workloads | Vertical integration to reduce per-token cost; ASIC amortisation horizon | Engineer |
| Enterprise ML team | Cloud GPU instances + managed fine-tuning platforms | Cost per experiment; latency vs throughput trade-off for inference | Realist |
| AI startup / vertical model builder | Spot GPU capacity + inference APIs + training runs | Budget constraints; access to the latest hardware without waiting out allocation queues | Dreamer |
| Research institution / university lab | HPC cluster + cloud credits | Funding cycles misaligned with compute availability; open-weights models reduce the need to train from scratch | Philosopher |
| Inference API consumer (product company) | Tokens per second + cost per million tokens + uptime SLA | Provider lock-in; model capability curve vs cost curve | Engineer |
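
The inference-API row prices compute in cost per million tokens. As a rough sketch of how that figure relates to a GPU-hour price and sustained throughput, the arithmetic looks like the following. The function name, the `utilisation` parameter, and the sample numbers are illustrative assumptions, not quoted prices or measured throughput.

```python
def cost_per_million_tokens(gpu_hour_price_usd: float,
                            tokens_per_second: float,
                            utilisation: float = 1.0) -> float:
    """Compute cost in USD of generating one million tokens on one GPU.

    utilisation is the fraction of each paid hour spent serving tokens
    (hypothetical parameter; idle time raises the effective cost).
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    if not 0 < utilisation <= 1:
        raise ValueError("utilisation must be in (0, 1]")
    tokens_per_hour = tokens_per_second * 3600 * utilisation
    return gpu_hour_price_usd / tokens_per_hour * 1_000_000


# Hypothetical inputs: $2.50 per GPU-hour, 1,000 tokens/s, GPU busy 60% of the time.
example = cost_per_million_tokens(2.50, 1_000, utilisation=0.6)
print(f"${example:.2f} per million tokens")
```

The sketch covers compute cost only; a provider's listed price would also carry margin, networking, and storage, none of which are modelled here.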
Provider side — players
The organisations that supply AI compute. Player = the WHO. Position filled = what they provide.
| Player (WHO) | Position filled — what they provide | Where they compete | Archetype |
|---|---|---|---|
| Nvidia | AI GPU + NVLink fabric + CUDA software ecosystem | Architecture leadership (H100 → B200) plus CUDA lock-in, widely regarded as the deepest moat in AI compute | Engineer |
| AMD | AI GPU + ROCm open software stack | Price/performance against Nvidia; open-source ROCm stack as the differentiating wedge | Engineer |
| Google (TPU) | Custom AI ASIC optimised for Transformer workloads + TensorFlow/JAX stack | Captive use + Google Cloud rental; TPU v5 competes on cost-per-token at scale | Engineer |
| Hyperscaler AI cloud (AWS, Azure, GCP) | GPU clusters + managed training/inference platforms + on-demand scaling | Existing enterprise relationships + data-gravity lock-in; packaging the GPU as a managed service | Realist |
| Specialised AI cloud (CoreWeave, Lambda, Together AI) | Bare-metal GPU clusters with AI-optimised networking and storage | Cheaper than hyperscaler for pure training workloads; faster GPU allocation during shortage | Engineer |
| Custom ASIC / novel architecture (Cerebras, Groq, Tenstorrent) | Wafer-scale or novel-architecture AI accelerators, mostly positioned for inference | Lower latency / higher token throughput on fixed workloads; less general programmability than GPUs | Engineer / Dreamer |
Infrastructure side — players
The physical and digital layer AI compute runs on. Player = the WHO. Position filled = what they provide.
| Player (WHO) | Position filled — what they provide | Disruption vector | Archetype |
|---|---|---|---|
| Data centre REIT / colocation (Equinix, Digital Realty) | Physical space + power + cooling for AI clusters | AI power density (40–100 kW/rack) strains existing data centre design; new builds required | Realist |
| Grid-scale power provider / utility | Electricity supply + grid interconnection for large loads | AI demand is among the fastest-growing new loads in decades; permitting and grid connection are the bottleneck | Realist |
| HBM / DRAM memory supplier (SK Hynix, Micron, Samsung) | High-bandwidth memory: stacked DRAM dies packaged alongside the GPU die | HBM is a critical co-constraint with GPU supply; SK Hynix is reported to hold the dominant share of H100 HBM | Engineer |
| High-speed networking (InfiniBand / RoCE: Nvidia (formerly Mellanox), Arista) | Low-latency GPU-to-GPU interconnect across nodes | NVLink dominates intra-node; InfiniBand dominates inter-node; RoCE is the lower-cost Ethernet alternative | Engineer |
| Liquid cooling systems (Vertiv, Schneider Electric) | Direct liquid cooling for high-density GPU racks | Air cooling becomes impractical above roughly 30 kW/rack; liquid is the practical path to the densest H100/B200 racks | Engineer |
| MLOps platform (Weights & Biases, Ray, Determined AI) | Experiment tracking + distributed training orchestration + model registry | AI-native platforms reduce the infrastructure-operations burden for ML teams | Engineer |
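
The density figures in this table (40–100 kW/rack for AI racks, about 30 kW/rack as the air-cooling limit) can be turned into a quick sizing check. This is a minimal sketch: the per-server power draw and rack layouts are hypothetical, and a real design would depend on the facility and the hardware vendor's specifications.

```python
AIR_COOLING_LIMIT_KW = 30.0  # per-rack air-cooling limit quoted in the table


def rack_power_kw(servers_per_rack: int, kw_per_server: float) -> float:
    """Total IT load of one rack in kW."""
    if servers_per_rack < 0 or kw_per_server < 0:
        raise ValueError("inputs must be non-negative")
    return servers_per_rack * kw_per_server


def cooling_class(rack_kw: float,
                  air_limit_kw: float = AIR_COOLING_LIMIT_KW) -> str:
    """Classify a rack as 'air' or 'liquid' against the assumed limit."""
    return "air" if rack_kw <= air_limit_kw else "liquid"


# Hypothetical layout: multi-GPU servers drawing about 10 kW each.
for servers in (2, 4, 8):
    kw = rack_power_kw(servers, 10.0)
    print(f"{servers} servers/rack -> {kw:.0f} kW/rack -> {cooling_class(kw)}")
```

Under these assumed numbers, packing more than three such servers into a rack crosses the quoted limit, which is the pressure behind the new-build requirement noted in the colocation row.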
Boundary side — players
Sets the rules the other three sides operate inside. Player = the WHO. Position filled = function held in the system.
| Player (WHO) | Position filled — function held | Repeat-player advantage |
|---|---|---|
| US BIS (export controls on AI chips) | Restricts export of advanced AI GPUs (A100/H100/H200/B200) to China and to other restricted destinations and listed entities | Entity list and licensing tiers restructure global AI capability asymmetry in weeks |
| Energy regulator / grid planning body (FERC, Ofgem, ENTSO-E) | Grid interconnection approval + power-purchase contract oversight for large data centre loads | AI data centres are now large enough to affect regional grid planning |
| Competition authority (DOJ, EC, FTC) | Antitrust review of AI infrastructure concentration + hyperscaler AI acquisitions | Nvidia's GPU moat + hyperscaler packaging of AI compute under review in multiple jurisdictions |
| EU AI Act authority | High-risk AI system compliance + transparency obligations for general-purpose (foundation) models + compute-threshold obligations | General-purpose models trained with more than 10^25 FLOPs are presumed to carry systemic risk and face additional obligations |
| National AI strategy and standards bodies (NIST, UK DSIT, Singapore MAS) | Standards such as the NIST AI RMF + incident reporting + voluntary commitments + evaluation frameworks | Governments are early buyers of AI compute; their standards shape enterprise adoption |
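
The 10^25 FLOPs figure in the EU AI Act row is a measure of cumulative training compute. To show how a planned run might be compared against it, the sketch below uses a common rule of thumb for dense transformer training of about 6 FLOPs per parameter per training token. That rule is an assumption made here for illustration, not a method the Act prescribes, and the model sizes are hypothetical.

```python
EU_AI_ACT_THRESHOLD_FLOPS = 1e25  # threshold cited in the table


def estimate_training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate training compute: ~6 FLOPs per parameter per token.

    Rule-of-thumb estimate for dense transformers (assumption); it ignores
    attention overhead, restarts, and any fine-tuning compute.
    """
    return 6.0 * n_params * n_tokens


def exceeds_threshold(n_params: float, n_tokens: float,
                      threshold: float = EU_AI_ACT_THRESHOLD_FLOPS) -> bool:
    """True if the estimated training compute is above the threshold."""
    return estimate_training_flops(n_params, n_tokens) > threshold


# Hypothetical runs: (parameters, training tokens)
runs = {"mid-size": (70e9, 2e12), "frontier-scale": (1e12, 1e13)}
for name, (params, tokens) in runs.items():
    flops = estimate_training_flops(params, tokens)
    print(f"{name}: {flops:.1e} FLOPs, above threshold: "
          f"{exceeds_threshold(params, tokens)}")
```

An estimate like this is only a planning aid; an actual compliance assessment would rest on the Act's own definitions and measured compute.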
The Five Archetypes Across the Community
The fractal pattern names five archetypes that appear at every layer of every system. AI compute is no exception.
- Dreamer — The frontier lab founder who believes the next training run unlocks emergent capability nobody predicted. The startup building the wafer-scale chip that makes Nvidia unnecessary. The DePIN protocol that turns distributed edge GPUs into a training cluster.
- Realist — The hyperscaler CFO who models the GPU capex payback against five scenarios. The enterprise ML lead who says "we can fine-tune on 8 GPUs — we don't need the cluster." The procurement team that diversified chip suppliers before the export control changed.
- Engineer — The GPU cluster network engineer who hits 90% MFU on a 10k-node training run. The MLOps lead who cuts training cost 40% by optimising data pipelines. The ASIC architect who closes the cost-per-token against Nvidia at production scale.
- Coach — The ML platform lead who makes the GPU cluster accessible to the 50-person product team that can't hire a cluster engineer. The AI education creator who teaches practitioners to use compute efficiently. The community builder who turns the open-weights ecosystem into a shared training capability.
- Philosopher — The researcher asking whether scaling laws hold at 10^28 FLOPs — or whether the next capability jump requires an architectural break. The energy researcher auditing whether the AI compute buildout is compatible with national decarbonisation commitments. The ethicist asking whether access to frontier AI compute should be governed like nuclear capability.
A healthy AI compute community has all five archetypes present. When the Dreamer and Engineer dominate and the Philosopher disappears, the compute buildout concentrates in ways that a grid constraint, a regulatory change, or a competitor's move can break overnight.
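
MFU, cited under the Engineer archetype, stands for model FLOPs utilisation: the share of a cluster's theoretical peak throughput that goes into the model's own arithmetic. The sketch below is one common way to estimate it, assuming roughly 6 FLOPs per parameter per training token for a dense transformer. The function name, the peak FLOPs value, and the run parameters are hypothetical placeholders, not vendor specifications or measurements.

```python
def model_flops_utilisation(n_params: float,
                            tokens_per_second: float,
                            n_gpus: int,
                            peak_flops_per_gpu: float) -> float:
    """Estimate MFU as achieved model FLOPs/s over cluster peak FLOPs/s.

    Achieved FLOPs/s uses the ~6 * parameters FLOPs-per-token rule of thumb
    (assumption; attention FLOPs and recomputation are not counted).
    """
    if n_gpus <= 0 or peak_flops_per_gpu <= 0:
        raise ValueError("n_gpus and peak_flops_per_gpu must be positive")
    achieved_flops_per_s = 6.0 * n_params * tokens_per_second
    peak_flops_per_s = n_gpus * peak_flops_per_gpu
    return achieved_flops_per_s / peak_flops_per_s


# Hypothetical run: 70B-parameter model, 10M tokens/s across 10,000 GPUs,
# each with an assumed peak of 1e15 FLOPs/s.
mfu = model_flops_utilisation(70e9, 1e7, 10_000, 1e15)
print(f"MFU: {mfu:.0%}")
```

Because the numerator counts only useful model arithmetic, time lost to communication, pipeline bubbles, and data stalls shows up directly as lower MFU, which is why it works as a single headline metric for cluster engineering.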
Positions Matrix — Human vs AI Split
Players hold positions. Each position has a human-vs-AI split that is shifting. The hat changes; the player remains — but AI does an increasing share of the work inside the hat.
| Position | Human today | AI today | Direction (3–5 years) |
|---|---|---|---|
| GPU cluster operator | Human runbook + incident response | AI-automated failure detection + predictive maintenance | Human for novel failure modes and capacity planning decisions |
| MLOps / training infrastructure engineer | Human job orchestration + cost optimisation | AI optimises job scheduling and resource allocation | Human focus shifts to architecture and cost model; AI handles run-time |
| Data centre power engineer | Human load forecasting + UPS/cooling management | AI predicts power demand spikes + pre-stages cooling | Fewer humans per MW; residual is emergency response and novel load profiles |
| AI procurement specialist | Human vendor relationship + contract negotiation | AI models should-cost + tracks allocation availability | Human for strategic vendor relationships; AI for commodity GPU spot buys |
| ML researcher (scaling experiments) | Human hypothesis + experimental design | AI runs parameter sweeps + surfaces anomalies | Human irreplaceable for hypothesis formation; AI runs the experiments |
| AI compute policy analyst | Human regulatory interpretation + lobbying | AI tracks rule changes + models compliance scenarios | Human for regulatory strategy; AI for monitoring and reporting |
Archetype Asymmetries — Industry Level
| Archetype | What they bring | Where they win in AI compute |
|---|---|---|
| Dreamer | Conviction that the next architecture break makes today's GPU stack obsolete | The wafer-scale startup; the DePIN training network; the algorithm innovation that makes a 10x smaller model competitive |
| Engineer | Cluster-level MFU optimisation; memory-bandwidth-bounded workload design; ASIC tape-out at cost | Nvidia's CUDA moat; the cluster network engineer who hits 90% MFU; the hyperscaler TPU that closes cost-per-token |
| Realist | Capex payback modelling; allocation risk diversification; export-control scenario planning | The procurement strategy that pre-committed H100 allocation; the enterprise team that right-sized compute before costs scaled |
| Coach | Compute access democratisation; ML infrastructure education; open-weights community enablement | The MLOps platform that makes clusters accessible; the Hugging Face community that amortises training across the ecosystem |
| Philosopher | Energy governance; AI capability proliferation risk; open vs closed model access | Asking whether data centre power demand is compatible with the grid; stress-testing whether export controls are achieving their geopolitical goal |
Context
- depends-on Community → Ecosystem — Five-counterparty model; the hat changes, the player remains
- applies-to Community → Archetypes — The five archetypes mapped across this community
- pairs-with AI Compute Industry Index — Disruption scoring, friction map, sub-vertical entry ranking
- pairs-with AI Data Industry — The data layer that determines what the compute trains
- pairs-with Technology Industry — The semiconductor supply chain that produces the hardware
- pairs-with Energy Industry — The power supply that is now AI compute's binding constraint
- instance-of Standard Templates → Players — Written from the players template
Questions
- Which counterparty's perspective is most invisible in this industry — and what routing signal gets missed as a result?
- If energy becomes the binding constraint before silicon does, which players gain disproportionate leverage — and which lose theirs?
- When inference cost falls to near-zero, does the value in AI compute shift entirely to training — or to the data layer?
- Which archetype is underrepresented in the boundary layer — and what does that explain about how the export-control regime was designed?