AI Compute Industry Players

Who participates in the AI compute community — and what positions does each player fill?

Players are the community of participants in the AI compute ecosystem — the WHO. Positions are the roles those players fill — the WHAT. The hat changes; the player remains. (Doctrinal anchor: Ecosystem — every industry has a community of participants.)

AI compute is the bottleneck layer of the AI economy: it determines who can train, who can run inference, and at what cost. The data layer that feeds it is covered in AI Data Industry Players.

The Ecosystem

The AI compute community has four sides:

  • Buyers — AI labs, enterprises, researchers, and inference-API consumers that purchase compute to train models or run applications
  • Providers — GPU/TPU/ASIC makers, cloud hyperscalers, and specialised AI cloud operators that supply the compute layer
  • Infrastructure — data centres, power grids, networking fabric, and cooling systems that host the compute
  • Boundary — export control authorities, energy regulators, competition commissions, and AI governance bodies that set the rules

Every player wears multiple hats. A hyperscaler is simultaneously provider (selling cloud GPU instances), buyer (purchasing Nvidia H100s), and infrastructure operator (building and running the data centre). The position changes per transaction; the player remains.
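The player/position distinction can be sketched as a small data model. This is a minimal illustration, not part of the source framework: the `Position` enum, the `Transaction` record, and the `positions_held` helper are hypothetical names chosen here, and the three transactions are the hyperscaler examples from the paragraph above.

```python
from dataclasses import dataclass
from enum import Enum


class Position(Enum):
    """The four sides of the community (hypothetical encoding)."""
    BUYER = "buyer"
    PROVIDER = "provider"
    INFRASTRUCTURE = "infrastructure"
    BOUNDARY = "boundary"


@dataclass(frozen=True)
class Transaction:
    player: str          # the WHO: constant across transactions
    position: Position   # the hat worn in this one transaction
    description: str


def positions_held(transactions, player):
    """Return the set of positions a player fills across its transactions."""
    return {t.position for t in transactions if t.player == player}


transactions = [
    Transaction("Hyperscaler", Position.PROVIDER, "sells cloud GPU instances"),
    Transaction("Hyperscaler", Position.BUYER, "purchases Nvidia H100s"),
    Transaction("Hyperscaler", Position.INFRASTRUCTURE, "builds and runs the data centre"),
]

hats = positions_held(transactions, "Hyperscaler")
print(sorted(p.value for p in hats))  # ['buyer', 'infrastructure', 'provider']
```

Keeping the position on the transaction rather than on the player is the design choice that mirrors the text: the same player record never changes, only the hat attached to each deal.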

The five-counterparty model from Ecosystem maps to this industry as follows:

| Counterparty (canonical) | AI-compute-industry expression |
| --- | --- |
| Customers | AI labs training foundation models, enterprise ML teams fine-tuning and running inference, researchers running experiments, inference-API consumers |
| Suppliers | GPU/TPU/ASIC designers (Nvidia, AMD, Google TPU), fab capacity (TSMC), DRAM/HBM makers (SK Hynix, Micron), liquid cooling providers |
| Employees | GPU cluster engineers, AI infrastructure specialists, MLOps engineers, data centre operators, power/cooling engineers, procurement specialists |
| Owners | Hyperscaler shareholders, AI cloud VC investors, sovereign wealth funds investing in data centre real estate, colocation REIT holders |
| Regulators | US BIS (GPU export controls), energy regulators (data centre power draw + carbon), competition authorities reviewing AI infrastructure concentration, EU AI Act compliance bodies |

Buyer side — players

The buyers of AI compute. The value-generators the industry exists to serve. Player = the WHO. Position filled = what they buy.

| Player (WHO) | Position filled (what they buy) | Asymmetry they need closed | Archetype |
| --- | --- | --- | --- |
| Frontier AI lab (OpenAI, Anthropic, DeepMind, Meta AI) | Massive training clusters (10k–100k+ GPU runs) | Allocation access vs competitor; power and cooling at scale | Dreamer / Engineer |
| Hyperscaler AI team (Google, Microsoft, Amazon) | Proprietary TPU/AI ASIC clusters + Nvidia for general workloads | Vertical integration to reduce per-token cost; ASIC amortisation horizon | Engineer |
| Enterprise ML team | Cloud GPU instances + managed fine-tuning platforms | Cost per experiment; latency vs throughput trade-off for inference | Realist |
| AI startup / vertical model builder | Spot GPU capacity + inference APIs + training runs | Budget constraints; access to latest hardware before queue clears | Dreamer |
| Research institution / university lab | HPC cluster + cloud credits | Funding cycles vs compute availability; open-weights models reduce own-train cost | Philosopher |
| Inference API consumer (product company) | Tokens per second + cost per million tokens + uptime SLA | Provider lock-in; model capability curve vs cost curve | Engineer |
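The "cost per million tokens" unit in the last row follows from simple arithmetic when a buyer rents GPUs and serves its own model. A rough sketch, assuming a flat hourly GPU price and a steady aggregate throughput; the function name and every number below are illustrative assumptions, not market data.

```python
def cost_per_million_tokens(gpu_hourly_usd, n_gpus, tokens_per_second):
    """Serving cost in USD per one million generated tokens.

    Assumes the GPUs are billed by the hour whether busy or idle and
    that tokens_per_second is the sustained total across all GPUs.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    hourly_cost = gpu_hourly_usd * n_gpus
    tokens_per_hour = tokens_per_second * 3600
    return hourly_cost / tokens_per_hour * 1_000_000


# Hypothetical deployment: 8 GPUs at 2.00 USD/hour serving 4,000 tokens/s.
cost = cost_per_million_tokens(gpu_hourly_usd=2.00, n_gpus=8, tokens_per_second=4000)
print(round(cost, 2))  # 1.11
```

Comparing this figure with an API provider's listed price per million tokens is one way to frame the lock-in versus cost trade-off named in the table.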

Provider side — players

The organisations that supply AI compute. Player = the WHO. Position filled = what they provide.

| Player (WHO) | Position filled (what they provide) | Where they compete | Archetype |
| --- | --- | --- | --- |
| Nvidia | AI GPU + NVLink fabric + CUDA software ecosystem | Architecture leadership (H100 → B200) plus CUDA lock-in, the deepest moat in AI | Engineer |
| AMD | AI GPU + ROCm open software stack | Price/performance parity with CUDA; open-source stack as a differentiated wedge | Engineer |
| Google (TPU) | Custom AI ASIC optimised for Transformer workloads + TensorFlow/JAX stack | Captive use + Google Cloud rental; TPU v5 competes on cost-per-token at scale | Engineer |
| Hyperscaler AI cloud (AWS, Azure, GCP) | GPU clusters + managed training/inference platforms + on-demand scaling | Existing enterprise relationships + data-gravity lock-in; packaging the GPU as a managed service | Realist |
| Specialised AI cloud (CoreWeave, Lambda, Together AI) | Bare-metal GPU clusters with AI-optimised networking and storage | Cheaper than hyperscaler for pure training workloads; faster GPU allocation during shortage | Engineer |
| Custom ASIC / novel architecture (Cerebras, Groq, Tenstorrent) | Wafer-scale or novel-architecture inference chips | Lower latency / higher token throughput on fixed workloads; limited general programmability | Engineer / Dreamer |

Infrastructure side — players

The physical and digital layer AI compute runs on. Player = the WHO. Position filled = what they provide.

| Player (WHO) | Position filled (what they provide) | Disruption vector | Archetype |
| --- | --- | --- | --- |
| Data centre REIT / colocation (Equinix, Digital Realty) | Physical space + power + cooling for AI clusters | AI power density (40–100 kW/rack) strains existing data centre design; new builds required | Realist |
| Grid-scale power provider / utility | Electricity supply + grid interconnection for large loads | AI demand is the fastest-growing new load in decades; permitting and grid connection are the bottleneck | Realist |
| HBM / DRAM memory supplier (SK Hynix, Micron, Samsung) | High-bandwidth memory stacked on GPU dies | HBM is a critical co-constraint with GPU supply; SK Hynix has a dominant share of H100 HBM | Engineer |
| High-speed networking (InfiniBand / RoCE: Mellanox/Nvidia, Arista) | Low-latency GPU-to-GPU interconnect across nodes | NVLink dominates intra-node; InfiniBand dominates inter-node; RoCE as a lower-cost alternative | Engineer |
| Liquid cooling systems (Vertiv, Schneider Electric) | Direct liquid cooling for high-density GPU racks | Air cooling becomes impractical above roughly 30 kW/rack; liquid cooling is the practical path to H100/B200 densities | Engineer |
| MLOps platform (Weights & Biases, Ray, Determined AI) | Experiment tracking + distributed training orchestration + model registry | AI-native platforms reduce the infrastructure-operations burden for ML teams | Engineer |
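The rack-density figures in the table (40–100 kW/rack, and the 30 kW/rack air-cooling threshold) can be related to server counts with back-of-envelope arithmetic. The sketch below is only illustrative: the 700 W per accelerator, 8 accelerators per server, and the 1.5 overhead factor for CPUs, memory, networking, and fans are placeholder assumptions, and the function names are hypothetical.

```python
AIR_COOLING_LIMIT_KW = 30.0  # threshold quoted in the table above


def rack_power_kw(accelerators_per_server, watts_per_accelerator,
                  servers_per_rack, overhead_factor=1.5):
    """Estimate rack power draw in kW.

    overhead_factor scales accelerator power up to whole-server power;
    1.5 is an assumed placeholder, not a measured value.
    """
    server_watts = accelerators_per_server * watts_per_accelerator * overhead_factor
    return server_watts * servers_per_rack / 1000.0


def needs_liquid_cooling(rack_kw, limit_kw=AIR_COOLING_LIMIT_KW):
    """True when the estimated rack draw exceeds the air-cooling threshold."""
    return rack_kw > limit_kw


# Hypothetical racks: 8 accelerators per server at 700 W each.
for servers in (2, 4, 8):
    kw = rack_power_kw(8, 700, servers)
    print(servers, round(kw, 1), needs_liquid_cooling(kw))
```

Under these assumptions a rack crosses the air-cooling threshold at four servers, which is why the density question reaches the facility operator and not only the chip buyer.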

Boundary side — players

Sets the rules the other three sides operate inside. Player = the WHO. Position filled = function held in the system.

| Player (WHO) | Position filled (function held) | Repeat-player advantage |
| --- | --- | --- |
| US BIS (export controls on AI chips) | Restricts export of advanced AI GPUs (A100/H100/H200/B200) to China and other entities | Entity list and licensing tiers restructure global AI capability asymmetry in weeks |
| National energy regulator or grid body (FERC, Ofgem, ENTSO-E) | Grid interconnection approval + power-purchase contract oversight for large data centre loads | AI data centres are now large enough to affect regional grid planning |
| Competition authority (DOJ, EC, FTC) | Antitrust review of AI infrastructure concentration + hyperscaler AI acquisitions | Nvidia's GPU moat + hyperscaler packaging of AI compute under review in multiple jurisdictions |
| EU AI Act authority | High-risk AI system compliance + general-purpose (foundation) model transparency and compute-threshold obligations | Models trained with more than 10^25 FLOPs are presumed to carry systemic risk, which triggers notification and additional obligations |
| National AI strategy bodies (NIST with its AI RMF, UK DSIT, Singapore MAS) | Standards + incident reporting + voluntary commitments + evaluation frameworks | Governments are early buyers of AI compute; their standards shape enterprise adoption |
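To see roughly where a training run sits relative to the 10^25 FLOPs threshold in the table, a common rule of thumb estimates dense-transformer training compute as 6 × parameters × training tokens. The sketch below uses that approximation; it is not the method any regulator prescribes, and the model sizes and token counts are hypothetical.

```python
EU_AI_ACT_THRESHOLD_FLOPS = 1e25  # threshold quoted in the table above


def estimated_training_flops(n_params, n_tokens):
    """Approximate training compute with the 6 * N * D rule of thumb.

    This is an estimate for dense transformers only; real accounting
    would need the actual operation counts of the run.
    """
    return 6.0 * n_params * n_tokens


def exceeds_threshold(n_params, n_tokens, threshold=EU_AI_ACT_THRESHOLD_FLOPS):
    """True when the estimated compute is above the threshold."""
    return estimated_training_flops(n_params, n_tokens) > threshold


# Hypothetical runs, both on 15 trillion tokens.
print(exceeds_threshold(n_params=70e9, n_tokens=15e12))   # False (about 6.3e24)
print(exceeds_threshold(n_params=400e9, n_tokens=15e12))  # True (about 3.6e25)
```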

The Five Archetypes Across the Community

The fractal pattern names five archetypes that appear at every layer of every system. AI compute is no exception.

  • Dreamer: The frontier lab founder who believes the next training run unlocks emergent capability nobody predicted. The startup building the wafer-scale chip that makes Nvidia unnecessary. The DePIN (decentralised physical infrastructure network) protocol that turns distributed edge GPUs into a training cluster.
  • Realist: The hyperscaler CFO who models the GPU capex payback against five scenarios. The enterprise ML lead who says "we can fine-tune on 8 GPUs; we don't need the cluster." The procurement team that diversified chip suppliers before the export controls changed.
  • Engineer: The GPU cluster network engineer who hits 90% MFU (model FLOPs utilisation) on a 10k-node training run. The MLOps lead who cuts training cost 40% by optimising data pipelines. The ASIC architect who closes the cost-per-token gap against Nvidia at production scale.
  • Coach: The ML platform lead who makes the GPU cluster accessible to the 50-person product team that can't hire a cluster engineer. The AI education creator who teaches practitioners to use compute efficiently. The community builder who turns the open-weights ecosystem into a shared training capability.
  • Philosopher: The researcher asking whether scaling laws hold at 10^28 FLOPs, or whether the next capability jump requires an architectural break. The energy researcher auditing whether the AI compute buildout is compatible with national decarbonisation commitments. The ethicist asking whether access to frontier AI compute should be governed like nuclear capability.

A healthy AI compute community has all five archetypes present. When the Dreamer and Engineer dominate and the Philosopher disappears, the compute buildout concentrates in ways the grid, the regulator, and the competitor can break overnight.
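The Engineer bullet above is stated in terms of MFU. MFU (model FLOPs utilisation) is commonly defined as the model FLOPs a run actually executes per second divided by the theoretical peak of the hardware. A minimal sketch, assuming roughly 6 × parameters FLOPs per trained token for a dense transformer; the function name, the cluster size, and the round peak figure of 1e15 FLOP/s per GPU are illustrative assumptions, not vendor specifications.

```python
def model_flops_utilisation(n_params, tokens_per_second, n_gpus, peak_flops_per_gpu):
    """Fraction of theoretical peak compute spent on model FLOPs.

    Uses the approximation of 6 * n_params FLOPs per training token,
    which ignores attention-specific terms and any recomputation.
    """
    achieved_flops_per_second = 6.0 * n_params * tokens_per_second
    peak_flops_per_second = n_gpus * peak_flops_per_gpu
    return achieved_flops_per_second / peak_flops_per_second


# Hypothetical run: 70B parameters, 1M tokens/s across 1,024 GPUs.
mfu = model_flops_utilisation(
    n_params=70e9,
    tokens_per_second=1.0e6,
    n_gpus=1024,
    peak_flops_per_gpu=1e15,
)
print(f"{mfu:.1%}")  # 41.0%
```

Because the denominator is fixed by the purchased hardware, every point of MFU gained is compute the buyer has already paid for, which is why the metric matters to the Realist as much as to the Engineer.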

Positions Matrix — Human vs AI Split

Players hold positions. Each position has a human-vs-AI split that is shifting. The hat changes; the player remains — but AI does an increasing share of the work inside the hat.

| Position | Human today | AI today | Direction (3–5 years) |
| --- | --- | --- | --- |
| GPU cluster operator | Human runbook + incident response | AI-automated failure detection + predictive maintenance | Human for novel failure modes and capacity planning decisions |
| MLOps / training infrastructure engineer | Human job orchestration + cost optimisation | AI optimises job scheduling and resource allocation | Human focus shifts to architecture and cost model; AI handles run-time |
| Data centre power engineer | Human load forecasting + UPS/cooling management | AI predicts power demand spikes + pre-stages cooling | Fewer humans per MW; residual is emergency response and novel load profiles |
| AI procurement specialist | Human vendor relationship + contract negotiation | AI models should-cost + tracks allocation availability | Human for strategic vendor relationships; AI for commodity GPU spot buys |
| ML researcher (scaling experiments) | Human hypothesis + experimental design | AI runs parameter sweeps + surfaces anomalies | Human irreplaceable for hypothesis formation; AI runs the experiments |
| AI compute policy analyst | Human regulatory interpretation + lobbying | AI tracks rule changes + models compliance scenarios | Human for regulatory strategy; AI for monitoring and reporting |

Archetype Asymmetries — Industry Level

| Archetype | What they bring | Where they win in AI compute |
| --- | --- | --- |
| Dreamer | Conviction that the next architecture break makes today's GPU stack obsolete | The wafer-scale startup; the DePIN training network; the algorithm innovation that makes a 10x smaller model competitive |
| Engineer | Cluster-level MFU optimisation; memory-bandwidth-bounded workload design; ASIC tape-out at cost | Nvidia's CUDA moat; the cluster network engineer who hits 90% MFU; the hyperscaler TPU that closes the cost-per-token gap |
| Realist | Capex payback modelling; allocation risk diversification; export-control scenario planning | The procurement strategy that pre-committed H100 allocation; the enterprise team that right-sized compute before costs scaled |
| Coach | Compute access democratisation; ML infrastructure education; open-weights community enablement | The MLOps platform that makes clusters accessible; the Hugging Face community that amortises training across the ecosystem |
| Philosopher | Energy governance; AI capability proliferation risk; open vs closed model access | Asking whether data centre power demand is compatible with the grid; stress-testing whether export controls are achieving their geopolitical goal |
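The Realist's capex payback modelling can be illustrated with a simple per-GPU payback calculation run across several scenarios. This is a sketch under loud assumptions: the function name, the five scenario names, and every price, utilisation, and cost figure are hypothetical placeholders, and the model ignores depreciation, financing, and resale value.

```python
HOURS_PER_YEAR = 8760


def payback_years(capex_usd, hourly_price_usd, utilisation, annual_opex_usd):
    """Simple payback period in years, or None if the GPU never pays back.

    utilisation is the fraction of hours that are actually billed.
    """
    annual_revenue = hourly_price_usd * utilisation * HOURS_PER_YEAR
    annual_margin = annual_revenue - annual_opex_usd
    if annual_margin <= 0:
        return None
    return capex_usd / annual_margin


CAPEX_PER_GPU = 30_000  # hypothetical all-in cost per installed GPU

# (hourly price, utilisation, annual opex) per scenario, all hypothetical.
scenarios = {
    "base": (2.0, 0.8, 4_000),
    "price_war": (1.0, 0.8, 4_000),
    "low_utilisation": (2.0, 0.4, 4_000),
    "energy_spike": (2.0, 0.8, 8_000),
    "glut": (0.5, 0.5, 4_000),
}

for name, (price, utilisation, opex) in scenarios.items():
    years = payback_years(CAPEX_PER_GPU, price, utilisation, opex)
    label = "never" if years is None else f"{years:.1f} years"
    print(f"{name}: {label}")
```

Returning `None` for a non-positive margin keeps the "never pays back" case explicit instead of producing a negative or infinite payback period.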

Questions

  • Which counterparty's perspective is most invisible in this industry — and what routing signal gets missed as a result?
  • If energy becomes the binding constraint before silicon does, which players gain disproportionate leverage — and which lose theirs?
  • When inference cost falls to near-zero, does the value in AI compute shift entirely to training — or to the data layer?
  • Which archetype is underrepresented in the boundary layer — and what does that explain about how the export-control regime was designed?