
Browser APIs

When does a browser tab stop being a page and start being an instrument?

A browser tab controls five layers: pixels, sound, input, devices, and state. Once you think that way, a tab becomes a runtime for tools, games, media systems, collaborative apps, and instruments. The same matrix thinking that maps AI modalities applies here — empty cells are experiments waiting to happen.

Five Layers

| Layer | What It Controls | Key APIs |
| --- | --- | --- |
| Pixels | Rendering, 3D, video, screen capture | Canvas, WebGL, WebGPU, WebXR, Screen Capture, MediaRecorder |
| Sound | Synthesis, analysis, spatial audio, music | Web Audio, WebMIDI, MediaStream |
| Input | Voice, gesture, orientation, gamepad | Web Speech, Pointer Events, DeviceOrientation, DeviceMotion, Gamepad |
| Devices | Hardware, Bluetooth, USB, haptics | WebHID, WebUSB, Web Bluetooth, Vibration, BarcodeDetector |
| State | Files, storage, offline, realtime sync | File System Access, IndexedDB, Service Workers, WebRTC, BroadcastChannel |

Pixel Experiments

  • Canvas 2D — Drawing, animation, pixel manipulation
  • WebGL — GPU-accelerated 3D rendering, shaders, data visualisation
  • WebGPU — Next-gen GPU compute and rendering (successor to WebGL)
  • WebXR — VR/AR device access, spatial tracking, hand input
  • Screen Capture — getDisplayMedia() to capture a screen, window, or tab as a live stream
  • MediaRecorder — Record canvas, screen capture, or camera to video file
  • Picture-in-Picture — Float video over other content
  • OffscreenCanvas — Render in a Web Worker without blocking UI
  • SVG + Animation — Declarative vector graphics with SMIL or CSS animation
  • EyeDropper — Pick colour from any pixel on screen
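
As a small illustration of the "pixel manipulation" that Canvas 2D allows, the sketch below converts an RGBA buffer to grayscale. It assumes the buffer has the same layout as `ImageData.data` (four bytes per pixel); the function name `toGrayscale` and the choice of Rec. 601 luma weights are illustrative assumptions, not something this page prescribes.

```typescript
// Convert an RGBA pixel buffer to grayscale in place.
// Layout matches ImageData.data: 4 bytes per pixel in R, G, B, A order.
function toGrayscale(pixels: Uint8ClampedArray): Uint8ClampedArray {
  for (let i = 0; i + 3 < pixels.length; i += 4) {
    // Rec. 601 luma weights (an assumption; other weightings are equally valid).
    const luma = 0.299 * pixels[i] + 0.587 * pixels[i + 1] + 0.114 * pixels[i + 2];
    // Uint8ClampedArray rounds and clamps the value to 0..255 on assignment.
    pixels[i] = luma;
    pixels[i + 1] = luma;
    pixels[i + 2] = luma;
    // pixels[i + 3] (alpha) is left untouched.
  }
  return pixels;
}

// Browser wiring (not executed here; `canvas` is a hypothetical <canvas> element):
//   const ctx = canvas.getContext("2d")!;
//   const image = ctx.getImageData(0, 0, canvas.width, canvas.height);
//   toGrayscale(image.data);
//   ctx.putImageData(image, 0, 0);
```

Keeping the pixel maths in a plain function means the same code could run on the main thread or inside a worker that owns an OffscreenCanvas.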

Sound Experiments

  • Web Audio (oscillators) — Generate waveforms: sine, square, sawtooth, triangle
  • Web Audio (filters) — Lowpass, highpass, bandpass, notch, peaking
  • Web Audio (analyser) — FFT frequency data and time-domain waveforms
  • Web Audio (spatial) — 3D positioned sound sources with HRTF panning
  • Web Audio (convolver) — Reverb and impulse response processing
  • WebMIDI — Connect MIDI keyboards, controllers, drum pads
  • MediaStream (mic) — Capture microphone input for processing or recording
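
To show how WebMIDI input might drive a Web Audio oscillator, here is a minimal sketch of the two pure steps in between: decoding a note message and converting a MIDI note number to a frequency. The names `parseNoteMessage` and `midiToFrequency` are hypothetical helpers, and the tuning (A4 = 440 Hz, equal temperament) is an assumption.

```typescript
type NoteEvent = { kind: "on" | "off"; note: number; velocity: number };

// Equal-temperament conversion, assuming MIDI note 69 (A4) is 440 Hz.
function midiToFrequency(note: number): number {
  return 440 * Math.pow(2, (note - 69) / 12);
}

// Decode a 3-byte MIDI channel message into a note event, or null if it is
// some other kind of message (control change, pitch bend, ...).
function parseNoteMessage(data: ArrayLike<number>): NoteEvent | null {
  if (data.length < 3) return null;
  const command = data[0] & 0xf0; // high nibble = message type, low nibble = channel
  const note = data[1];
  const velocity = data[2];
  // By MIDI convention, note-on with velocity 0 is treated as note-off.
  if (command === 0x90 && velocity > 0) return { kind: "on", note, velocity };
  if (command === 0x80 || command === 0x90) return { kind: "off", note, velocity };
  return null;
}

// Browser wiring (not executed here):
//   const audio = new AudioContext();
//   const access = await navigator.requestMIDIAccess();
//   access.inputs.forEach((input) => {
//     input.onmidimessage = (event) => {
//       const ev = event.data && parseNoteMessage(event.data);
//       if (ev && ev.kind === "on") {
//         const osc = audio.createOscillator();
//         osc.type = "sawtooth";
//         osc.frequency.value = midiToFrequency(ev.note);
//         osc.connect(audio.destination);
//         osc.start();
//         osc.stop(audio.currentTime + 0.3); // fixed-length blip, for brevity
//       }
//     };
//   });
```

A real instrument would track active notes and stop each oscillator on the matching note-off; the fixed 0.3 s blip only keeps the sketch short.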

Input Experiments

  • Speech Recognition — Voice commands to text, continuous listening
  • Speech Synthesis — Text to spoken voice with pitch, rate, voice selection
  • Pointer Events — Unified mouse, touch, pen input with pressure and tilt
  • DeviceOrientation — Compass heading, tilt angle (alpha, beta, gamma)
  • DeviceMotion — Acceleration, rotation rate, gravity vector
  • Gamepad — Console controllers, racing wheels, flight sticks
  • Pointer Lock — Capture mouse for FPS-style relative movement
  • Fullscreen — Immersive mode, hide browser chrome
  • Drag and Drop — Native drag between elements or from desktop
  • Clipboard — Read/write text, images, rich content programmatically
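
Raw Gamepad axis values rarely rest at exactly zero, so input code usually filters them before use. The sketch below shows one common approach, a radial dead zone with rescaling; the function name `applyDeadZone` and the default threshold of 0.15 are assumptions chosen for illustration.

```typescript
// Radial dead zone for a two-axis stick.
// Gamepad.axes values range from -1 to 1 on each axis.
function applyDeadZone(x: number, y: number, deadZone = 0.15): { x: number; y: number } {
  const magnitude = Math.sqrt(x * x + y * y);
  // Inside the dead zone the stick is treated as centred.
  if (magnitude <= deadZone) return { x: 0, y: 0 };
  // Rescale so output ramps from 0 at the dead-zone edge to 1 at full deflection.
  const scaled = Math.min(1, (magnitude - deadZone) / (1 - deadZone));
  return { x: (x / magnitude) * scaled, y: (y / magnitude) * scaled };
}

// Browser wiring (not executed here). The Gamepad API is polled, typically
// once per animation frame; `moveCamera` is a hypothetical consumer.
//   function frame() {
//     const pad = navigator.getGamepads()[0];
//     if (pad) {
//       const stick = applyDeadZone(pad.axes[0], pad.axes[1]);
//       moveCamera(stick.x, stick.y);
//     }
//     requestAnimationFrame(frame);
//   }
//   requestAnimationFrame(frame);
```

The radial form preserves the stick's direction, which a per-axis threshold would distort near the diagonals.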

Device Experiments

  • WebHID — Raw HID device communication (custom controllers, barcode scanners)
  • WebUSB — Direct USB device access (Arduino, microcontrollers, printers)
  • Web Bluetooth — BLE device scanning, connecting, reading characteristics
  • Vibration — Haptic patterns on mobile (single pulse, sequences, rhythms)
  • BarcodeDetector — Camera-based barcode/QR scanning without libraries
  • Geolocation — GPS coordinates, heading, speed, altitude
  • Wake Lock — Prevent screen from sleeping during active use
  • Screen Orientation — Lock to portrait/landscape, detect changes
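
The Vibration API takes a list of alternating vibrate and pause durations in milliseconds, which is what makes "sequences" and "rhythms" possible. As a rough sketch, the helper below builds such a list from Morse-style symbols; `morseToPattern`, `tryVibrate`, and the timing unit are hypothetical choices, and the call is guarded because many browsers and all desktops lack `navigator.vibrate`.

```typescript
// Build a vibration pattern from Morse-style symbols.
// "." is one unit long, "-" is three units, with one unit of pause between
// pulses. Any other character is ignored.
function morseToPattern(symbols: string, unitMs = 100): number[] {
  const pattern: number[] = [];
  for (let i = 0; i < symbols.length; i++) {
    const symbol = symbols.charAt(i);
    if (symbol !== "." && symbol !== "-") continue;
    // Patterns alternate vibrate, pause, vibrate, ... so insert a pause
    // before every pulse except the first.
    if (pattern.length > 0) pattern.push(unitMs);
    pattern.push(symbol === "." ? unitMs : 3 * unitMs);
  }
  return pattern;
}

// Guarded call: returns false where the Vibration API is unavailable.
function tryVibrate(pattern: number[]): boolean {
  const nav = (globalThis as any).navigator;
  if (!nav || typeof nav.vibrate !== "function") return false;
  return nav.vibrate(pattern);
}

// Usage on a supporting mobile browser, after a user gesture:
//   tryVibrate(morseToPattern("...---..."));
```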

State Experiments

  • File System Access — Open, read, write local files and directories
  • IndexedDB — Structured client-side database with indexes and transactions
  • Service Workers — Intercept network, cache assets, enable offline
  • Background Sync — Defer actions until connectivity returns
  • WebRTC (data) — Peer-to-peer data channels; payloads bypass the server, though connection setup still needs a signalling channel
  • WebRTC (media) — Peer-to-peer audio/video streaming
  • BroadcastChannel — Message between tabs/windows of same origin
  • Notifications — System-level notifications shown by the page or its service worker (server-sent push also needs the Push API)
  • Web Locks — Coordinate access to shared resources across tabs
  • Storage (Cache API) — Programmatic HTTP cache for offline-first apps
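
BroadcastChannel delivers whatever another tab posts, so receiving code benefits from validating messages and ignoring stale ones. The following is a minimal sketch of that idea; the message shape `SyncMessage`, the version-counter rule, and the channel name are assumptions made for the example, not a prescribed protocol.

```typescript
type SyncMessage = { type: "set"; key: string; value: string; version: number };
type SharedState = { [key: string]: { value: string; version: number } };

// Runtime check: event.data is untyped, so verify the shape before use.
function isSyncMessage(data: unknown): data is SyncMessage {
  if (typeof data !== "object" || data === null) return false;
  const m = data as { [k: string]: unknown };
  return (
    m.type === "set" &&
    typeof m.key === "string" &&
    typeof m.value === "string" &&
    typeof m.version === "number"
  );
}

// Apply a message to the state. Invalid or stale messages return the same
// state object, so callers can detect "no change" by reference equality.
function applyMessage(state: SharedState, data: unknown): SharedState {
  if (!isSyncMessage(data)) return state;
  const current = state[data.key];
  if (current && current.version >= data.version) return state;
  return { ...state, [data.key]: { value: data.value, version: data.version } };
}

// Browser wiring (not executed here). A channel does not deliver a message
// to the same BroadcastChannel object that posted it, so the sender applies
// its own update locally as well.
//   const channel = new BroadcastChannel("app-state");
//   let state: SharedState = {};
//   channel.onmessage = (event) => { state = applyMessage(state, event.data); };
//   const msg: SyncMessage = { type: "set", key: "theme", value: "dark", version: 1 };
//   state = applyMessage(state, msg);
//   channel.postMessage(msg);
```

A per-key version counter is the simplest conflict rule; concurrent writes from two tabs with the same version would need a tie-breaker, which this sketch leaves out.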

Combination Patterns

The most interesting work combines layers. Each row is a product pattern, not a single API.

| Pattern | Layers | APIs Combined | What It Enables |
| --- | --- | --- | --- |
| Multiplayer instrument | Sound + State | Web Audio + WebMIDI + WebRTC | Live jam sessions in a browser tab |
| Sovereign workspace | State + Pixels | File System Access + IndexedDB + Service Workers | Figma-like local-first power tools |
| Hardware dashboard | Devices + Pixels | WebHID + Canvas + Web Audio | Physical knobs controlling browser visuals |
| Spatial explorer | Pixels + Input | WebGPU + Pointer Lock + Gamepad | Walk through data as a 3D environment |
| Voice interface | Input + Sound | Speech Recognition + Speech Synthesis + Web Audio | Speak commands, hear responses |
| Mobile sensorium | Input + Devices | DeviceMotion + Vibration + Fullscreen + Geolocation | Phone as physical object, not just screen |
| Live collaboration | State + Pixels | WebRTC + Screen Capture + Canvas | Shared presence, drawing, annotation |
| Smart importer | State + Input | Clipboard + File System Access + Workers | Paste anything, convert to structured data |
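
Because each pattern depends on several APIs at once, a product built on one needs to know which pieces are missing before it starts. The sketch below shows one way to check, using two rows from the table; the names `capabilities`, `patterns`, and `missingApis` are hypothetical, and the detection checks are the usual "does this global or property exist" tests rather than a guarantee that permission will be granted.

```typescript
// Each check receives a global-like object so it can be exercised with a
// fake environment as well as the real `globalThis`.
const capabilities: { [api: string]: (env: any) => boolean } = {
  "Web Audio": (env) => typeof env.AudioContext === "function",
  "WebMIDI": (env) => !!env.navigator && typeof env.navigator.requestMIDIAccess === "function",
  "WebRTC": (env) => typeof env.RTCPeerConnection === "function",
  "WebHID": (env) => !!env.navigator && "hid" in env.navigator,
  "Canvas": (env) => typeof env.HTMLCanvasElement === "function",
};

// Two rows from the combination table, expressed as required APIs.
const patterns: { [pattern: string]: string[] } = {
  "Multiplayer instrument": ["Web Audio", "WebMIDI", "WebRTC"],
  "Hardware dashboard": ["WebHID", "Canvas", "Web Audio"],
};

// Return the APIs a pattern needs that the environment does not expose.
// An unknown pattern name yields an empty list.
function missingApis(pattern: string, env: any = globalThis): string[] {
  const required = patterns[pattern] || [];
  return required.filter((api) => !capabilities[api](env));
}

// Usage in a page (`showFallback` is a hypothetical UI helper):
//   const missing = missingApis("Hardware dashboard");
//   if (missing.length > 0) showFallback(missing);
```

Presence of an API says nothing about hardware or user consent, so a real app would pair a check like this with handling for rejected permission prompts and absent devices.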

Browser Edge

Why browser over native:

  • Installable (PWA) without app store gatekeepers
  • Cross-platform from a single codebase
  • Instant updates, no release cycle
  • Searchable and shareable by URL
  • Secure by default (sandboxed, permissioned)
  • Smaller disk footprint
  • Service Workers enable offline reliability

Lab Prompts

Eight copy-paste experiment prompts for building with these APIs: Browser API Labs.

Context

  • AI Modalities — Same matrix thinking applied to AI input/output transformations
  • Maps — Geolocation and mapping as browser-native capabilities
  • Components — React layer that consumes browser APIs
  • Performance — Load speed impact of heavy browser API usage
  • Matrix Thinking — Empty cells are experiments waiting to happen

Questions

When the browser becomes a sensorium — hearing, seeing, touching, connecting — what remains that only native apps can do?

  • Which combination pattern from the table above has the highest leverage for the ventures?
  • If File System Access + IndexedDB makes the browser a sovereign workspace, what stops every SaaS from becoming a local-first tool?
  • Which device API (HID, USB, Bluetooth) opens the most unexpected product category?
  • How do you build graceful degradation when half the APIs need user permission and the other half need specific hardware?