AI Products

What changes when your product thinks?

Traditional products are deterministic. Same input, same output. Test it once, ship it. AI products produce a distribution of outcomes — the same input generates different outputs every time. This changes everything about how you build, test, and ship.

The gap between "AI demo" and "AI product" is an evaluation gap. Demos impress with best-case outputs. Products must handle the full distribution — including the tail where things go wrong.
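Handling the full distribution means quality is measured over many samples, not one. A minimal sketch of what that looks like in practice, assuming a hypothetical `generate`/`score` pair (the toy model and identity rubric below are stand-ins, not a real implementation):

```python
import random

def evaluate_distribution(generate, score, n_samples=100, pass_threshold=0.7):
    """Sample the model n_samples times and score each output.

    Returns the pass rate: the fraction of outputs whose score meets
    the threshold. Quality is a property of this distribution, not of
    any single output.
    """
    scores = [score(generate()) for _ in range(n_samples)]
    passes = sum(1 for s in scores if s >= pass_threshold)
    return passes / n_samples

# Toy stand-ins: a "model" whose output quality varies run to run.
random.seed(0)
generate = lambda: random.gauss(0.8, 0.15)  # hypothetical quality signal
score = lambda output: output               # identity rubric, demo only

pass_rate = evaluate_distribution(generate, score)
failure_budget = 0.10                       # tolerate up to 10% bad outputs
ship = (1 - pass_rate) <= failure_budget
```

The ship decision compares the observed failure rate to a failure budget, which is exactly the "acceptable distribution" framing from the table below rather than a binary works/doesn't check.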

The Shift

| Deterministic Product | AI Product |
|---|---|
| Bug = broken code | Bad output = expected variance |
| Test once, ship | Evaluate continuously |
| Binary: works or doesn't | Spectrum: how often, how good |
| Reproduce every issue | Some failures are statistical |
| 100% quality possible | Quality = acceptable distribution |
| Spec defines behavior | Spec defines boundaries |

AI Product Tight Five

The same five questions applied to building with AI:

| # | Question | AI Product Translation | Where |
|---|---|---|---|
| 1 | Why does this matter? | What job does the AI do that wasn't possible before? | Principles |
| 2 | What truths guide you? | What does "good" mean for this output? | Principles |
| 3 | What do you control? | What can you measure, test, and improve? | Evaluation |
| 4 | What do you see? | Where is the model failing that users haven't reported yet? | Observability |
| 5 | How do you know? | Are eval scores improving AND users happier? | Observability |

The Loop

The VVFL applied to AI products:

DEFINE "GOOD" → BUILD EVALS → SHIP → MEASURE → LEARN → REDEFINE "GOOD"
  ↑                                                        |
  └────────────────────────────────────────────────────────┘

Every cycle tightens the distribution. Quality isn't a destination — it's a feedback loop.

| Stage | Activity | Output |
|---|---|---|
| Define | Set quality principles | Dimensions, rubrics, failure budgets |
| Build | Write requirements | AI PRD with eval criteria |
| Measure | Run evaluations | Scores across golden dataset |
| See | Analyze traces | Where it fails, why, how often |
| Learn | Close the gap | Tighter prompts, better data, updated evals |
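The Measure and See stages above can be sketched as a small eval harness: score every case in a golden dataset, keep per-case results so failures can be analyzed. All names here (`run_evals`, the toy dataset, model, and rubric) are illustrative assumptions, not a real framework:

```python
from dataclasses import dataclass, field

@dataclass
class EvalResult:
    case_id: str
    score: float
    passed: bool

@dataclass
class EvalRun:
    results: list = field(default_factory=list)

    @property
    def pass_rate(self):
        return sum(r.passed for r in self.results) / len(self.results)

def run_evals(golden_dataset, model, rubric, threshold=0.7):
    """Measure: score the model's output for every golden case.

    Per-case results are kept so the See stage can ask where it
    fails, why, and how often -- not just the aggregate score.
    """
    run = EvalRun()
    for case in golden_dataset:
        output = model(case["input"])
        score = rubric(output, case["expected"])
        run.results.append(EvalResult(case["id"], score, score >= threshold))
    return run

# Toy golden dataset, model, and exact-match rubric (all hypothetical).
golden = [
    {"id": "g1", "input": "2+2", "expected": "4"},
    {"id": "g2", "input": "capital of France", "expected": "Paris"},
]
model = lambda prompt: {"2+2": "4", "capital of France": "Lyon"}.get(prompt, "")
rubric = lambda output, expected: 1.0 if output == expected else 0.0

run = run_evals(golden, model, rubric)
failures = [r.case_id for r in run.results if not r.passed]
```

The `failures` list is the entry point for the See stage: cluster these cases, find the pattern, and feed the fix back into Define.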

The Business Loop: Agency to VSaaS

Building an AI product isn't just an engineering task — it's an economic one. We use the AI-Native Agency model to validate the product before scaling it as Vertical SaaS (VSaaS).

  1. Agency Phase: Use the AI product as an internal tool. Humans handle 50-60% of the work. Validate the "Good" definition with paying clients.
  2. Productized Phase: Automate the workflows. Target 90% AI production. Human QA handles the remaining 10% (the "Left Tail" of the distribution).
  3. VSaaS Phase: Ship the tool to the industry. Shift from outcome-based pricing to subscription-based recurring revenue.
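The economic logic of the three phases can be made concrete with a rough unit-economics sketch. The cost rates below are illustrative assumptions (not benchmarks), and `gross_margin` is a hypothetical helper:

```python
def gross_margin(revenue, human_share, human_cost_rate, ai_cost_rate=0.02):
    """Rough unit economics: cost scales with the share of work humans do.

    human_share: fraction of production done by humans (0.0-1.0).
    human_cost_rate / ai_cost_rate: cost per revenue dollar for each.
    All rates are illustrative assumptions.
    """
    cost = revenue * (human_share * human_cost_rate
                      + (1 - human_share) * ai_cost_rate)
    return (revenue - cost) / revenue

# The first two phases from the text, with hypothetical cost rates:
# agency humans handle ~55% of the work, productized humans ~10%.
agency      = gross_margin(100_000, human_share=0.55, human_cost_rate=0.60)
productized = gross_margin(100_000, human_share=0.10, human_cost_rate=0.60)
```

Under these assumptions the margin jump from the Agency phase to the Productized phase comes entirely from shifting the production share from humans to the model, which is the point of the progression.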

Work Chart

Who does what in AI product development vs. an AI-Native Agency?

| Activity | Human Role | AI Role | AI % (Dev) | AI % (Agency) |
|---|---|---|---|---|
| Define quality | Sets dimensions, judges edge cases | Generates rubric variations | 25% | 10% |
| Build golden datasets | Curates, validates, tags | Generates synthetic examples | 50% | 90% |
| Write eval rubrics | Defines scoring criteria | Scores outputs against rubric | 60% | 95% |
| Trace analysis | Pattern recognition, root cause | Surfaces anomalies, clusters failures | 45% | 85% |
| Production | Final judgment, "Taste" | Drafts, researches, formats | N/A | 90% |

Aggregate AI %: 42% (Dev) / 80%+ (Agency) — the goal of an AI product is to shift the production burden from humans to the model, enabling software-like margins in a service world.
