Browser APIs
When does a browser tab stop being a page and start being an instrument?
A browser tab reaches five layers: pixels, sound, input, devices, and state. Seen through those layers, a tab becomes a runtime for tools, games, media systems, collaborative apps, and instruments. The same matrix thinking that maps AI modalities applies here: empty cells are experiments waiting to happen.
Five Layers
| Layer | What It Controls | Key APIs |
|---|---|---|
| Pixels | Rendering, 3D, video, screen capture | Canvas, WebGL, WebGPU, WebXR, Screen Capture, MediaRecorder |
| Sound | Synthesis, analysis, spatial audio, music | Web Audio, WebMIDI, MediaStream |
| Input | Voice, gesture, orientation, gamepad | Web Speech, Pointer Events, DeviceOrientation, DeviceMotion, Gamepad |
| Devices | Hardware, Bluetooth, USB, haptics | WebHID, WebUSB, Web Bluetooth, Vibration, BarcodeDetector |
| State | Files, storage, offline, realtime sync | File System Access, IndexedDB, Service Workers, WebRTC, BroadcastChannel |
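The table can double as a feature-detection checklist. The sketch below probes a few globals per layer. The probe list is an illustrative selection rather than an exhaustive one, and the sketch assumes that a property's presence is a reasonable first signal of support. Presence says nothing about permission prompts, attached hardware, or whether the page runs in a secure context.

```typescript
// Capability probe per layer. Assumption: a global or navigator property
// existing is a first signal of support, not a guarantee of permission or hardware.
type Layer = "pixels" | "sound" | "input" | "devices" | "state";

// Illustrative probe paths, resolved against a root object such as globalThis.
const PROBES: Record<Layer, string[]> = {
  pixels: ["HTMLCanvasElement", "WebGL2RenderingContext", "navigator.gpu", "MediaRecorder"],
  sound: ["AudioContext", "navigator.requestMIDIAccess"],
  input: ["PointerEvent", "DeviceOrientationEvent", "navigator.getGamepads"],
  devices: ["navigator.hid", "navigator.usb", "navigator.bluetooth", "navigator.vibrate"],
  state: ["indexedDB", "navigator.serviceWorker", "RTCPeerConnection", "BroadcastChannel"],
};

// Walk a dotted path; true only if every segment exists and the final value is defined.
function hasPath(root: unknown, path: string): boolean {
  let current: unknown = root;
  for (const key of path.split(".")) {
    if (current === null || current === undefined) return false;
    if (!(key in Object(current))) return false;
    current = (current as Record<string, unknown>)[key];
  }
  return current !== undefined;
}

// Report, per layer, which probes are present on the given root object.
function probeLayers(root: unknown): Record<Layer, Record<string, boolean>> {
  const report = {} as Record<Layer, Record<string, boolean>>;
  for (const layer of Object.keys(PROBES) as Layer[]) {
    report[layer] = {};
    for (const path of PROBES[layer]) {
      report[layer][path] = hasPath(root, path);
    }
  }
  return report;
}
```

In a page, `console.table(probeLayers(globalThis))` prints one row per layer; any probe reporting `false` marks a feature that needs a fallback path.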
Pixel Experiments
- Canvas 2D — Drawing, animation, pixel manipulation
- WebGL — GPU-accelerated 3D rendering, shaders, data visualisation
- WebGPU — Next-gen GPU compute and rendering (successor to WebGL)
- WebXR — VR/AR device access, spatial tracking, hand input
- Screen Capture — getDisplayMedia() to capture a screen, window, or tab as a live stream
- MediaRecorder — Record canvas, screen capture, or camera to a video file
- Picture-in-Picture — Float video over other content
- OffscreenCanvas — Render in a Web Worker without blocking UI
- SVG + Animation — Declarative vector graphics with SMIL or CSS animation
- EyeDropper — Pick colour from any pixel on screen
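MediaRecorder and `canvas.captureStream()` combine into a recorder for any canvas animation. This is a minimal sketch that assumes the candidate MIME types below (support varies by browser) and a fixed recording length. `pickMimeType` and `recordCanvas` are hypothetical helpers; the MIME choice is kept separate so it can be tested without a browser.

```typescript
// Pick the first MIME type the recorder claims to support.
function pickMimeType(candidates: string[], isSupported: (t: string) => boolean): string | undefined {
  return candidates.find((t) => isSupported(t));
}

// Record a canvas for a fixed duration and resolve with the encoded video.
function recordCanvas(canvas: HTMLCanvasElement, durationMs: number, fps = 30): Promise<Blob> {
  // Assumed candidate list; browsers differ in which containers and codecs they encode.
  const mimeType = pickMimeType(
    ["video/webm;codecs=vp9", "video/webm;codecs=vp8", "video/webm", "video/mp4"],
    (t) => MediaRecorder.isTypeSupported(t),
  );
  const stream = canvas.captureStream(fps);
  const recorder = new MediaRecorder(stream, mimeType ? { mimeType } : undefined);
  const chunks: Blob[] = [];

  return new Promise((resolve, reject) => {
    // Collect encoded chunks as the recorder emits them.
    recorder.ondataavailable = (e) => {
      if (e.data.size > 0) chunks.push(e.data);
    };
    // On stop, release the capture tracks and assemble one Blob.
    recorder.onstop = () => {
      stream.getTracks().forEach((t) => t.stop());
      resolve(new Blob(chunks, { type: recorder.mimeType }));
    };
    recorder.onerror = () => reject(new Error("MediaRecorder failed"));
    recorder.start();
    setTimeout(() => recorder.stop(), durationMs);
  });
}
```

The resulting Blob can be passed to `URL.createObjectURL()` for playback or download. The same pattern applies to a `getDisplayMedia()` stream in place of `captureStream()`.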
Sound Experiments
- Web Audio (oscillators) — Generate waveforms: sine, square, sawtooth, triangle
- Web Audio (filters) — Lowpass, highpass, bandpass, notch, peaking
- Web Audio (analyser) — FFT frequency data and time-domain waveforms
- Web Audio (spatial) — 3D positioned sound sources with HRTF panning
- Web Audio (convolver) — Reverb and impulse response processing
- WebMIDI — Connect MIDI keyboards, controllers, drum pads
- MediaStream (mic) — Capture microphone input for processing or recording
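The oscillator, filter, and analyser entries chain into a single audio graph. This sketch routes a sawtooth oscillator through a resonant lowpass filter into an AnalyserNode, then reads back the loudest FFT bin. The frequency, cutoff, Q, and fftSize values are arbitrary choices, and `binToHz`, `buildSynth`, and `peakFrequency` are illustrative names.

```typescript
// Frequency of an FFT bin: AnalyserNode exposes fftSize / 2 bins from 0 Hz up to Nyquist.
function binToHz(bin: number, sampleRate: number, fftSize: number): number {
  return (bin * sampleRate) / fftSize;
}

// A tiny synth graph: sawtooth -> lowpass -> analyser -> speakers.
function buildSynth(ctx: AudioContext, frequency = 220, cutoff = 800) {
  const osc = new OscillatorNode(ctx, { type: "sawtooth", frequency });
  const filter = new BiquadFilterNode(ctx, { type: "lowpass", frequency: cutoff, Q: 5 });
  const analyser = new AnalyserNode(ctx, { fftSize: 2048 });
  // connect() returns its destination node, so the chain reads left to right.
  osc.connect(filter).connect(analyser).connect(ctx.destination);
  return { osc, filter, analyser };
}

// Scan the current spectrum and return the frequency of the loudest bin.
function peakFrequency(analyser: AnalyserNode, sampleRate: number): number {
  const bins = new Uint8Array(analyser.frequencyBinCount);
  analyser.getByteFrequencyData(bins);
  let peak = 0;
  for (let i = 1; i < bins.length; i++) {
    if (bins[i] > bins[peak]) peak = i;
  }
  return binToHz(peak, sampleRate, analyser.fftSize);
}
```

Browser autoplay policies generally require creating or resuming the AudioContext inside a user gesture, for example `button.onclick = () => { const ctx = new AudioContext(); buildSynth(ctx).osc.start(); }`. Sweeping `filter.frequency` while watching `peakFrequency(analyser, ctx.sampleRate)` is a quick way to see the filter shaping the spectrum.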
Input Experiments
- Speech Recognition — Speech to text for voice commands, with continuous listening
- Speech Synthesis — Text to spoken voice with pitch, rate, voice selection
- Pointer Events — Unified mouse, touch, pen input with pressure and tilt
- DeviceOrientation — Compass heading, tilt angle (alpha, beta, gamma)
- DeviceMotion — Acceleration, rotation rate, gravity vector
- Gamepad — Console controllers, racing wheels, flight sticks
- Pointer Lock — Capture mouse for FPS-style relative movement
- Fullscreen — Immersive mode, hide browser chrome
- Drag and Drop — Native drag between elements or from desktop
- Clipboard — Read/write text, images, rich content programmatically
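The Gamepad API is poll-based: the page reads `navigator.getGamepads()` every frame instead of waiting for per-axis events. This sketch polls the first connected pad and applies a radial deadzone to the left stick. The 0.15 threshold is an illustrative choice, and the sketch assumes axes 0 and 1 are the left stick, as they are in the "standard" mapping.

```typescript
// Radial deadzone: ignore small stick drift, then rescale so full deflection still reaches 1.
function applyDeadzone(x: number, y: number, deadzone = 0.15): [number, number] {
  const magnitude = Math.hypot(x, y);
  if (magnitude < deadzone) return [0, 0];
  const scaled = Math.min(1, (magnitude - deadzone) / (1 - deadzone));
  return [(x / magnitude) * scaled, (y / magnitude) * scaled];
}

// Poll the first connected gamepad once per animation frame.
// Returns a function that stops the loop.
function pollGamepad(onStick: (x: number, y: number) => void): () => void {
  let frame = 0;
  const tick = () => {
    const pad = navigator.getGamepads().find((p) => p !== null);
    if (pad && pad.axes.length >= 2) {
      const [x, y] = applyDeadzone(pad.axes[0], pad.axes[1]);
      onStick(x, y);
    }
    frame = requestAnimationFrame(tick);
  };
  frame = requestAnimationFrame(tick);
  return () => cancelAnimationFrame(frame);
}
```

Browsers typically expose a gamepad only after the user presses a button while the page has focus, so a "press any button" prompt is a common first screen.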
Device Experiments
- WebHID — Raw HID device communication (custom controllers, barcode scanners)
- WebUSB — Direct USB device access (Arduino, microcontrollers, printers)
- Web Bluetooth — BLE device scanning, connecting, reading characteristics
- Vibration — Haptic patterns on mobile (single pulse, sequences, rhythms)
- BarcodeDetector — Detect barcodes and QR codes in images or camera frames without libraries
- Geolocation — GPS coordinates, heading, speed, altitude
- Wake Lock — Prevent screen from sleeping during active use
- Screen Orientation — Lock to portrait/landscape, detect changes
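Vibration patterns are arrays of alternating vibrate and pause durations in milliseconds, starting with a vibration. This sketch converts a step-sequencer rhythm into such a pattern. The BPM-to-milliseconds mapping, the 40 ms default pulse, and the helper names are assumptions for illustration, and not every mobile browser implements `navigator.vibrate()`.

```typescript
// Convert a step-sequencer rhythm into a navigator.vibrate() pattern.
// Even indices are vibration lengths, odd indices are pauses (both in ms).
function rhythmToPattern(steps: boolean[], bpm: number, stepsPerBeat = 4, pulseMs = 40): number[] {
  const stepMs = Math.round(60000 / bpm / stepsPerBeat);
  const pulse = Math.min(pulseMs, stepMs);
  const pattern: number[] = [];
  for (const hit of steps) {
    if (hit) {
      // A hit is a short buzz followed by silence for the rest of the step.
      pattern.push(pulse, stepMs - pulse);
    } else if (pattern.length === 0) {
      // A leading rest needs a zero-length vibration so the pause lands in an odd slot.
      pattern.push(0, stepMs);
    } else {
      // Entries are pushed in pairs, so the last one is always a pause; a rest extends it.
      pattern[pattern.length - 1] += stepMs;
    }
  }
  return pattern;
}

// Play the rhythm once; false where the Vibration API is missing or the call is rejected.
function playRhythm(steps: boolean[], bpm: number): boolean {
  if (typeof navigator === "undefined" || !("vibrate" in navigator)) return false;
  return navigator.vibrate(rhythmToPattern(steps, bpm));
}
```

For example, `playRhythm([true, false, true, true, false, false, true, false], 100)` plays a short syncopated figure. Browsers usually require a prior user gesture before vibration is allowed.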
State Experiments
- File System Access — Open, read, write local files and directories
- IndexedDB — Structured client-side database with indexes and transactions
- Service Workers — Intercept network, cache assets, enable offline
- Background Sync — Defer actions until connectivity returns
- WebRTC (data) — Peer-to-peer data channels; data flows directly between peers, though a signalling server is still needed to connect them
- WebRTC (media) — Peer-to-peer audio/video streaming
- BroadcastChannel — Message between tabs/windows of same origin
- Notifications — System-level notifications from a page or service worker (server-sent push uses the Push API)
- Web Locks — Coordinate access to shared resources across tabs
- Storage (Cache API) — Programmatic HTTP cache for offline-first apps
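BroadcastChannel is the lightest of these state tools: same-origin tabs that open a channel with the same name receive each other's messages, and the posting object does not receive its own. This sketch assumes a hypothetical `SyncMessage` shape and validates incoming data before acting on it, since any same-origin script can post to the channel.

```typescript
// Messages this app sends between same-origin tabs (hypothetical shape).
type SyncMessage =
  | { kind: "doc-saved"; docId: string; savedAt: number }
  | { kind: "logout" };

// Runtime guard: only act on messages that match the expected shape.
function isSyncMessage(data: unknown): data is SyncMessage {
  if (typeof data !== "object" || data === null) return false;
  const m = data as Record<string, unknown>;
  if (m.kind === "logout") return true;
  return m.kind === "doc-saved" && typeof m.docId === "string" && typeof m.savedAt === "number";
}

// Open a named channel; returns a send function and a close function.
function openTabSync(name: string, onMessage: (m: SyncMessage) => void) {
  const channel = new BroadcastChannel(name);
  channel.onmessage = (event: MessageEvent) => {
    if (isSyncMessage(event.data)) onMessage(event.data);
  };
  return {
    send: (m: SyncMessage) => channel.postMessage(m),
    close: () => channel.close(),
  };
}
```

Usage might look like `const sync = openTabSync("app-sync", (m) => console.log(m)); sync.send({ kind: "logout" });`, which logs the message in every other open tab on the same origin.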
Combination Patterns
The most interesting work combines layers. Each row is a product pattern, not a single API.
| Pattern | Layers | APIs Combined | What It Enables |
|---|---|---|---|
| Multiplayer instrument | Sound + State | Web Audio + WebMIDI + WebRTC | Live jam sessions in a browser tab |
| Sovereign workspace | State + Pixels | File System Access + IndexedDB + Service Workers | Figma-like local-first power tools |
| Hardware dashboard | Devices + Pixels + Sound | WebHID + Canvas + Web Audio | Physical knobs controlling browser visuals |
| Spatial explorer | Pixels + Input | WebGPU + Pointer Lock + Gamepad | Walk through data as a 3D environment |
| Voice interface | Input + Sound | Speech Recognition + Speech Synthesis + Web Audio | Speak commands, hear responses |
| Mobile sensorium | Input + Devices | DeviceMotion + Vibration + Fullscreen + Geolocation | Phone as physical object, not just screen |
| Live collaboration | State + Pixels | WebRTC + Screen Capture + Canvas | Shared presence, drawing, annotation |
| Smart importer | State + Input | Clipboard + File System Access + Workers | Paste anything, convert to structured data |
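The multiplayer instrument row starts with a local MIDI-to-audio path. This sketch covers only the WebMIDI + Web Audio half, with no WebRTC. It decodes note-on and note-off messages and keeps one oscillator per held note. It assumes equal temperament with A4 (MIDI note 69) at 440 Hz, ignores velocity and MIDI channel when playing, and uses hypothetical function names.

```typescript
// MIDI note number to frequency in Hz (equal temperament, A4 = note 69 = 440 Hz).
function midiToFrequency(note: number): number {
  return 440 * Math.pow(2, (note - 69) / 12);
}

type NoteEvent = { type: "on"; note: number; velocity: number } | { type: "off"; note: number };

// Decode a raw 3-byte channel message; null for anything other than note on/off.
function parseNoteMessage(data: ArrayLike<number>): NoteEvent | null {
  if (data.length < 3) return null;
  const command = data[0] & 0xf0; // upper nibble = command, lower nibble = channel
  const note = data[1];
  const velocity = data[2];
  if (command === 0x90 && velocity > 0) return { type: "on", note, velocity };
  // Note-off, or note-on with velocity 0, which MIDI treats as note-off.
  if (command === 0x80 || command === 0x90) return { type: "off", note };
  return null;
}

// Wire every MIDI input to one oscillator voice per held note.
async function connectMidiSynth(ctx: AudioContext): Promise<void> {
  const access = await navigator.requestMIDIAccess();
  const voices = new Map<number, OscillatorNode>();
  const handle = (event: MIDIMessageEvent) => {
    if (!event.data) return;
    const msg = parseNoteMessage(event.data);
    if (!msg) return;
    if (msg.type === "on" && !voices.has(msg.note)) {
      const osc = new OscillatorNode(ctx, { type: "triangle", frequency: midiToFrequency(msg.note) });
      osc.connect(ctx.destination);
      osc.start();
      voices.set(msg.note, osc);
    } else if (msg.type === "off") {
      voices.get(msg.note)?.stop();
      voices.delete(msg.note);
    }
  };
  access.inputs.forEach((input) => {
    input.onmidimessage = handle;
  });
}
```

WebMIDI sits behind a permission prompt and is not available in every browser. To make the instrument multiplayer, one option is to send the parsed `NoteEvent` objects over a WebRTC data channel and feed them into the same voice map on each peer.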
Browser Edge
Why browser over native:
- Installable (PWA) without app store gatekeepers
- Cross-platform from a single codebase
- Instant updates without app store review
- Searchable and shareable by URL
- Secure by default (sandboxed, permissioned)
- Smaller disk footprint
- Service Workers enable offline reliability
Lab Prompts
Eight copy-paste experiment prompts for building with these APIs: Browser API Labs.
Context
- AI Modalities — Same matrix thinking applied to AI input/output transformations
- Maps — Geolocation and mapping as browser-native capabilities
- Components — React layer that consumes browser APIs
- Performance — Load speed impact of heavy browser API usage
- Matrix Thinking — Empty cells are experiments waiting to happen
Links
- MDN Web APIs — Complete reference for all browser APIs
- Can I Use — Browser support tables for every API
- Chrome Status — Shipping and experimental API tracker
- Project Fugu — Chromium project closing the native app gap
Questions
When the browser becomes a sensorium — hearing, seeing, touching, connecting — what remains that only native apps can do?
- Which combination pattern from the table above has the highest leverage for the ventures?
- If File System Access + IndexedDB makes the browser a sovereign workspace, what stops every SaaS from becoming a local-first tool?
- Which device API (HID, USB, Bluetooth) opens the most unexpected product category?
- How do you build graceful degradation when half the APIs need user permission and the other half need specific hardware?