AI Products
What changes when your product thinks?
Traditional products are deterministic. Same input, same output. Test it once, ship it. AI products produce a distribution of outcomes: the same input can generate a different output on each run. This changes how you build, test, and ship.
The gap between "AI demo" and "AI product" is an evaluation gap. Demos impress with best-case outputs. Products must handle the full distribution — including the tail where things go wrong.
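A minimal sketch of what "handling the full distribution" can look like in practice: call the model many times on the same input and summarize the spread, not just the best case. The `toy_model` function, its 10% bad-output rate, and the 0.6 pass threshold are all hypothetical stand-ins chosen for illustration; a real product would replace them with an actual model call and its own scoring.

```python
import random
import statistics


def toy_model(prompt: str, rng: random.Random) -> float:
    """Hypothetical stand-in for a model call; returns a quality score in [0, 1]."""
    # Assumption for illustration: about 1 in 10 outputs lands in the bad tail.
    if rng.random() < 0.1:
        return rng.uniform(0.0, 0.4)
    return rng.uniform(0.7, 1.0)


def sample_distribution(prompt: str, n: int, seed: int = 0) -> list[float]:
    """Score n independent runs of the same prompt (seeded so the run is repeatable)."""
    rng = random.Random(seed)
    return [toy_model(prompt, rng) for _ in range(n)]


def summarize(scores: list[float], threshold: float = 0.6) -> dict[str, float]:
    """Summarize the whole distribution, including the tail a demo never shows."""
    ordered = sorted(scores)
    return {
        "best": ordered[-1],  # what a demo tends to show
        "mean": statistics.fmean(scores),
        "worst": ordered[0],  # what a product must survive
        "pass_rate": sum(s >= threshold for s in scores) / len(scores),
    }


scores = sample_distribution("Summarize this contract", n=200)
summary = summarize(scores)
print(summary)
```

The point of the sketch is the shape of the report: "best" alone describes a demo, while "worst" and "pass_rate" describe a product.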
The Shift
| Deterministic Product | AI Product |
|---|---|
| Bug = broken code | Bad output = expected variance |
| Test once, ship | Evaluate continuously |
| Binary: works or doesn't | Spectrum: how often, how good |
| Reproduce every issue | Some failures are statistical |
| 100% quality possible | Quality = acceptable distribution |
| Spec defines behavior | Spec defines boundaries |
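Because "some failures are statistical", a pass rate measured on a finite sample carries uncertainty. One common way to express that is a Wilson score interval on the observed pass rate; the sketch below uses it to decide whether a quality bar is met. The 0.85 bar and the function names are hypothetical, and the Wilson interval is one reasonable choice here, not a method the original text prescribes.

```python
import math


def wilson_interval(passes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a pass rate (z=1.96 gives roughly 95% coverage)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = passes / trials
    denom = 1 + z**2 / trials
    center = (p + z**2 / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2)) / denom
    return max(0.0, center - half), min(1.0, center + half)


def meets_bar(passes: int, trials: int, required_pass_rate: float) -> bool:
    """Conservative check: the lower bound, not the point estimate, must clear the bar."""
    low, _ = wilson_interval(passes, trials)
    return low >= required_pass_rate


# Same observed 90% pass rate, different sample sizes (0.85 is a hypothetical bar).
print(wilson_interval(90, 100), meets_bar(90, 100, 0.85))
print(wilson_interval(900, 1000), meets_bar(900, 1000, 0.85))
```

With 100 trials the lower bound sits below 0.85, so the bar is not yet demonstrated; with 1,000 trials at the same observed rate it is. This is the "how often, how good" spectrum made concrete.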
AI Product Tight Five
The same five questions applied to building with AI:
| # | Question | AI Product Translation | Where |
|---|---|---|---|
| 1 | Why does this matter? | What job does the AI do that wasn't possible before? | Principles |
| 2 | What truths guide you? | What does "good" mean for this output? | Principles |
| 3 | What do you control? | What can you measure, test, and improve? | Evaluation |
| 4 | What do you see? | Where is the model failing that users haven't reported yet? | Observability |
| 5 | How do you know? | Are eval scores improving AND users happier? | Observability |
The Loop
The VVFL applied to AI products:
DEFINE "GOOD" → BUILD EVALS → SHIP → MEASURE → LEARN → REDEFINE "GOOD"
      ↑                                                        |
      └────────────────────────────────────────────────────────┘
Every cycle tightens the distribution. Quality isn't a destination — it's a feedback loop.
| Stage | Activity | Output |
|---|---|---|
| Define | Set quality principles | Dimensions, rubrics, failure budgets |
| Build | Write requirements | AI PRD with eval criteria |
| Measure | Run evaluations | Scores across golden dataset |
| See | Analyze traces | Where it fails, why, how often |
| Learn | Close the gap | Tighter prompts, better data, updated evals |
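The Define, Measure, and See stages can be sketched as a small evaluation harness: a golden dataset, a rubric, a failure budget, and a breakdown of where failures cluster. Everything named here is hypothetical (`GoldenExample`, `fake_model`, the substring rubric, the 25% budget); a real rubric would usually be richer than a required substring.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class GoldenExample:
    prompt: str
    must_include: str  # deliberately simplistic rubric: a required substring
    category: str      # used to see where failures cluster


def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for a real model call."""
    canned = {
        "Refund policy?": "Refunds are available within 30 days.",
        "Shipping time?": "Orders ship soon.",
        "Cancel order?": "You can cancel from the Orders page.",
    }
    return canned.get(prompt, "")


def run_evals(dataset: list[GoldenExample], model: Callable[[str], str],
              failure_budget: float) -> dict:
    """Score every golden example and compare the failure rate to the budget."""
    if not dataset:
        raise ValueError("golden dataset is empty")
    failures: Counter = Counter()
    for example in dataset:
        output = model(example.prompt)
        if example.must_include.lower() not in output.lower():
            failures[example.category] += 1
    failure_rate = sum(failures.values()) / len(dataset)
    return {
        "failure_rate": failure_rate,
        "failures_by_category": dict(failures),
        "within_budget": failure_rate <= failure_budget,
    }


golden = [
    GoldenExample("Refund policy?", "30 days", "policy"),
    GoldenExample("Shipping time?", "business days", "logistics"),
    GoldenExample("Cancel order?", "Orders page", "account"),
    GoldenExample("Warranty length?", "12 months", "policy"),
]
report = run_evals(golden, fake_model, failure_budget=0.25)
print(report)
```

The per-category counts are what feed the Learn stage: they say which prompts, data, or evals to tighten next.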
The Business Loop: Agency to VSaaS
Building an AI product isn't just an engineering task—it's an economic one. We use the AI-Native Agency model to validate the product before scaling it as Vertical SaaS (VSaaS).
- Agency Phase: Use the AI product as an internal tool. Humans handle 50-60% of the work. Validate the "Good" definition with paying clients.
- Productized Phase: Automate the workflows. Target 90% AI production. Human QA handles the remaining 10% (the "Left Tail" of the distribution).
- VSaaS Phase: Ship the tool to the industry. Shift from outcome-based pricing to subscription-based recurring revenue.
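One way to picture the Productized Phase is as a routing rule: rank outputs by a quality score and send the lowest-scoring share to human QA. This is a sketch under assumptions, not a description of a specific system: the scores are presumed to come from some evaluator, and `split_left_tail` and the sample data are hypothetical.

```python
import math


def split_left_tail(scored_outputs: list[tuple[str, float]],
                    human_share: float = 0.10):
    """Return (for_human_qa, auto_approved), sending the lowest-scoring share to humans."""
    if not 0.0 <= human_share <= 1.0:
        raise ValueError("human_share must be between 0 and 1")
    ranked = sorted(scored_outputs, key=lambda item: item[1])
    # round() guards against float noise such as 30 * 0.1 == 3.0000000000000004.
    n_human = math.ceil(round(len(ranked) * human_share, 9))
    return ranked[:n_human], ranked[n_human:]


# Hypothetical drafts with evaluator scores in [0, 1].
drafts = [
    ("draft-a", 0.91), ("draft-b", 0.35), ("draft-c", 0.88), ("draft-d", 0.79),
    ("draft-e", 0.95), ("draft-f", 0.83), ("draft-g", 0.90), ("draft-h", 0.86),
    ("draft-i", 0.97), ("draft-j", 0.81),
]
for_human_qa, auto_approved = split_left_tail(drafts, human_share=0.10)
print(for_human_qa)
print(len(auto_approved))
```

Raising `human_share` toward 0.5 or 0.6 approximates the Agency Phase split described above; lowering it is only safe when evaluation shows the tail is shrinking.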
Work Chart
Who does what in AI product development vs. an AI-Native Agency?
| Activity | Human Role | AI Role | AI % (Dev) | AI % (Agency) |
|---|---|---|---|---|
| Define quality | Sets dimensions, judges edge cases | Generates rubric variations | 25% | 10% |
| Build golden datasets | Curates, validates, tags | Generates synthetic examples | 50% | 90% |
| Write eval rubrics | Defines scoring criteria | Scores outputs against rubric | 60% | 95% |
| Trace analysis | Pattern recognition, root cause | Surfaces anomalies, clusters failures | 45% | 85% |
| Production | Final judgment, "Taste" | Drafts, researches, formats | N/A | 90% |
Aggregate AI %: 42% (Dev) / 80%+ (Agency). The goal of an AI product is to shift the production burden from humans to the model, which enables software-like margins in a services business.
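For reference, the sketch below computes a plain unweighted mean of each column in the Work Chart. It yields 45% (Dev) and 74% (Agency), which differs from the stated 42% and 80%+ aggregates; those figures presumably weight activities by effort or volume, using weights the table does not list. The `work_chart` structure and `unweighted_mean` helper are illustrative names only.

```python
from statistics import fmean

# (AI % in Dev, AI % in Agency) per activity, copied from the Work Chart.
# None marks the N/A cell, which is excluded from the Dev average.
work_chart = {
    "Define quality": (25, 10),
    "Build golden datasets": (50, 90),
    "Write eval rubrics": (60, 95),
    "Trace analysis": (45, 85),
    "Production": (None, 90),
}


def unweighted_mean(column: int) -> float:
    """Average one column, treating every activity as equally weighted."""
    values = [row[column] for row in work_chart.values() if row[column] is not None]
    return fmean(values)


dev_mean = unweighted_mean(0)
agency_mean = unweighted_mean(1)
print(f"Dev: {dev_mean:.0f}%  Agency: {agency_mean:.0f}%")  # Dev: 45%  Agency: 74%
```

If the aggregate is meant to be effort-weighted, listing the weights next to the table would let readers reproduce the headline numbers.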
Subjects
- Principles: What does "good" mean when the same input produces different outputs?
- Evaluation: How do you know your AI product is getting better?
- Observability: When a user reports a bad experience, can you even reproduce it?
Context
- VVFL Loop — The feedback loop everything builds on
- Prediction Evaluation — The SMART-BF pattern this section extends
- Work Charts — Human/AI capability mapping
- Jobs To Be Done — What job is the AI hired for?
- AI Frameworks — The infrastructure layer beneath products
- Product Design — Design principles that still apply