Pipeline Nowcast Spec
How do you know the factory is on track before the weekly meeting tells you it isn't?
Build Contract
| # | Feature | Function | Outcome | Job | State |
|---|---|---|---|---|---|
| 1 | Signal Collectors (5x) | Extract raw measurements from each source table | All 5 sources feeding one algorithm | Collect | Gap |
| 2 | Signal Normalization | Normalize each signal to 0-1 scale per type | Apples-to-apples comparison | Normalize | Gap |
| 3 | Exponential Decay | Weight signals by recency with configurable half-life | Recent signals matter more | Weight | Gap |
| 4 | Variance Computation | Per-signal (actual - forecast) / forecast | Know which signal is drifting | Measure | Gap |
| 5 | Composite Scoring | Weighted sum with configurable signal weights | Single number: are we on track? | Compose | Gap |
| 6 | Classification | Map composite to on_track / warning / critical | Status matches varianceStatus enum | Classify | Gap |
| 7 | Confidence Calculation | Score based on signal coverage and freshness | Trust the number or gather more data | Calibrate | Gap |
| 8 | Recommendations Engine | Top risk, top momentum, action list | Know what to do next, not just status | Advise | Gap |
| 9 | Constants Registration | PIPELINE_NOWCAST block in algorithm-constants.ts | All thresholds tunable without code | Configure | Gap |
| 10 | Forecast Baselines | Target values per signal type (config) | Variance has something to compare to | Baseline | Gap |
| 11 | Prediction Evidence Ledger | Store signal observations against predictions | Predictions have data, not just opinions | Evidence | Gap |
| 12 | State Transition Log | Timestamp every L0-L4 commissioning state change | Commissioning velocity measurable | Velocity | Gap |
| 13 | Internal Signal Collectors | Collect prediction evidence from own platform | Receipt accuracy, boot time, trust scores feed predictions | Introspect | Gap |
| 14 | Market Signal Collectors | Automated external scans (RWA TVL, dev counts, agent commerce) | Predictions validated against market reality | Scan | Gap |
| 15 | Bayesian Update Triggers | Threshold-based conviction reassessment protocol | Stale predictions auto-flag for review | Update | Gap |
| 16 | Resolution Protocol | Confirm or falsify predictions with timestamped evidence | Prediction database has L4 entries | Resolve | Gap |
Principles
What truths constrain the design?
The Job
| Element | Detail |
|---|---|
| Situation | Five signal systems built independently. Each has a dashboard. None connect. |
| Intention | A single composite score answering "are we on track, drifting, or in trouble?" |
| Obstacle | No algorithm normalizes, weights, and composes these signals with recency decay. |
Why Now
All five signal sources exist in production:
- CRM (pipeline + activity signals): 5 deals, 10 activities, $1.2M pipeline (commissioning 2026-03-02)
- Agent comms: Convex event stream with typed messages
- Commissioning: 74 features at L0-L3 (no velocity tracking — state transitions have no timestamps)
- Predictions: 76 predictions in markdown (database tables empty), conviction-scored, zero evidence feeds
The data exists. The synthesis doesn't. Every commissioning session (45 min manual) could be a 2-second algorithm call.
Prediction validation gap: The prediction database has 76 entries with conviction scores but no live data updating them. The Bayesian protocol exists on paper (prompt-predictions.md Step 4) but has no automated triggers. Predictions without evidence feeds are opinions with timestamps. The nowcast closes this gap by treating prediction evidence as a first-class signal source — same normalization, same decay, same composite scoring.
Design Constraints
| Constraint | Rationale |
|---|---|
| Pure function, no side effects | Testable, composable, matches algorithm framework |
| All thresholds in constants file | Tunable without code changes |
| Minimum 3 signals for confidence > 0.5 | Prevents false confidence from sparse data |
| Match existing varianceStatus enum | Interop with prediction schemas |
| Exponential decay, not linear | Recent signals should dominate, old signals fade smoothly |
Refusal Spec
| Category | Action | Response |
|---|---|---|
| Insufficient signals | Zero signals available | Return confidence: 0, status: critical, reasoning: "No signals" |
| Stale data | All signals older than 2x half-life | Return confidence below 0.3, flag staleness |
| Invalid forecasts | Forecast value is 0 or negative | Skip signal, reduce confidence, log warning |
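The refusal rules above can be sketched as guard clauses. This is a hedged sketch: the function names and the minimal `SignalObs` shape are illustrative, not from the spec.

```typescript
// Illustrative guards for the refusal spec. SignalObs is a hypothetical
// minimal shape; the real signal inputs are richer (see Algorithm Interface).
interface SignalObs {
  ageDays: number;  // age of the newest measurement
  forecast: number; // baseline value to compare against
}

// Cold start: no signals at all -> confidence 0, status critical.
function isColdStart(signals: SignalObs[]): boolean {
  return signals.length === 0;
}

// Staleness: every signal older than 2x the decay half-life.
function isAllStale(signals: SignalObs[], halfLifeDays = 14): boolean {
  return signals.length > 0 && signals.every((s) => s.ageDays > 2 * halfLifeDays);
}

// Invalid forecast: zero or negative baselines are skipped, never divided by.
function hasValidForecast(s: SignalObs): boolean {
  return s.forecast > 0;
}
```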
Performance
How do we know it's working?
Priority Score
| Dimension | Score | Evidence |
|---|---|---|
| Pain | 4 | 45 min/session manual synthesis across 5 dashboards. 76 predictions with zero evidence feeds. Drift invisible between meetings. |
| Demand | 4 | Internal demand (commissioning + prediction validation). Prediction evidence validates the thesis the whole platform depends on. |
| Edge | 4 | Proprietary signal combination (CRM + agent comms + commissioning state + prediction evidence). Nobody else has this data shape. |
| Trend | 5 | Nowcasting + prediction markets is the dominant pattern. Superforecasting discipline going mainstream. AI operations demand real-time variance. |
| Conversion | 2 | Internal tool first. Path to customer-facing via BOaaS later. |
| Composite | 640 | 4 x 4 x 4 x 5 x 2. Demand up (prediction validation is existential). Trend up (prediction markets + nowcasting convergence). |
Quality Targets
| Metric | Target | Method |
|---|---|---|
| Execution time | <500ms for 5 signals | Benchmark test |
| Classification accuracy | Matches manual assessment 5 consecutive days | Human comparison |
| Signal coverage confidence | Degrades gracefully below 3 signals | Unit test with partial inputs |
Eval Strategy
| What | How | When |
|---|---|---|
| Classification accuracy | Compare nowcast status vs dream team manual assessment | Daily for first 2 weeks |
| Signal freshness | Check timestamp of newest signal per type | Every run |
| Threshold calibration | Review false positive/negative rate | Weekly for first month |
Kill signal: Nowcast status disagrees with manual assessment for 5 consecutive days after 2-week calibration period. Algorithm is wrong or signals are wrong.
Platform
What do we control?
Current State
| Component | Built | Wired | Working | Notes |
|---|---|---|---|---|
| Algorithm framework | Yes | Yes | Yes | libs/agency/src/lib/algorithms/ |
| AlgorithmMetadata pattern | Yes | Yes | Yes | Standard export |
| algorithm-constants.ts | Yes | Yes | Yes | Extensible |
| CRM data (Drizzle) | Yes | Yes | Yes | 5 deals, 10 activities |
| Convex agent messages | Yes | Yes | Yes | HTTP client working |
| Commissioning state | Yes | Partial | Partial | Manual markdown, needs parser. No timestamps on state transitions. |
| Prediction schemas | Yes | No | No | Tables empty. 76 predictions in markdown, zero in DB. |
| Prediction evidence ledger | No | No | No | Build — junction between predictions and data |
| State transition log | No | No | No | Build — timestamps on L0-L4 changes |
| Internal signal collectors | No | No | No | Build — receipt accuracy, sprout boot time, trust scores |
| Market signal collectors | No | No | No | Build — RWA TVL, dev counts, agent commerce volume |
| Signal normalization | No | No | No | Build |
| Exponential decay | No | No | No | Build |
| Composite scoring | No | No | No | Build |
| Insights UI | Yes | No | No | Components exist, no data |
Build Ratio
~60% composition, ~40% new code. The prediction evidence system is the net-new domain.
Algorithm Interface
Input
```typescript
interface NowcastInput {
  pipeline: {
    deals: Array<{ amount: number; probability: number; stage: string; closeDate: string }>;
    targetCoverage: number; // default 3.0x
  };
  activity: {
    activities: Array<{ type: string; outcome: string; startDate: string }>;
    targetPerDealPerWeek: number; // default 2
    dealCount: number;
  };
  agentVelocity: {
    messages: Array<{ type: string; createdAt: number }>;
    plansCompleted: number;
    blockersOpen: number;
  };
  commissioning: {
    features: Array<{ name: string; currentLevel: number; forecastLevel: number }>;
  };
  predictions: {
    entries: Array<{ confidenceScore: number; accuracyScore: number | null; status: string }>;
    evidence: Array<PredictionEvidence>; // evidence ledger entries
    maturity: Array<PredictionMaturity>; // per-prediction maturity state
  };
  config?: {
    weights?: Partial<SignalWeights>;
    decayHalfLifeDays?: number;
    thresholds?: { onTrack?: number; warning?: number };
  };
}
```
Output
```typescript
interface NowcastResult {
  result: {
    composite: number; // 0-1
    status: "on_track" | "warning" | "critical";
    signals: NowcastSignal[]; // per-signal breakdown
    topRisk: string; // highest-variance signal name
    topMomentum: string; // most-improving signal name
  };
  metadata: {
    algorithm: "pipeline-nowcast";
    version: string;
    executionTimeMs: number;
    signalCount: number;
  };
  reasoning: string[];
  confidence: number; // 0-1 based on signal coverage
  recommendations: {
    status: "on_track" | "warning" | "critical";
    actions: string[];
    signalsNeedingAttention: string[];
  };
}

interface NowcastSignal {
  name: string;
  score: number; // 0-1 normalized
  weight: number; // configured weight
  variance: number; // (actual - forecast) / forecast
  trend: "improving" | "stable" | "declining";
  freshness: number; // 0-1 decay factor
}
```
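The `variance` field follows the formula from feature #4. A one-line helper with the zero/negative-forecast guard from the refusal spec (the helper name is illustrative):

```typescript
// Relative variance as defined in the spec: (actual - forecast) / forecast.
// Throws on zero/negative forecasts, which the refusal spec says to skip.
function signalVariance(actual: number, forecast: number): number {
  if (forecast <= 0) {
    throw new Error("Invalid forecast: must be positive");
  }
  return (actual - forecast) / forecast;
}
```

A signal 20% over forecast yields 0.2; 10% under yields -0.1.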
Constants
```typescript
export const PIPELINE_NOWCAST = {
  PIPELINE_WEIGHT: 0.35,
  ACTIVITY_WEIGHT: 0.25,
  AGENT_VELOCITY_WEIGHT: 0.15,
  COMMISSIONING_WEIGHT: 0.15,
  PREDICTION_WEIGHT: 0.1,
  DECAY_HALF_LIFE_DAYS: 14,
  ON_TRACK_THRESHOLD: 0.7,
  WARNING_THRESHOLD: 0.4,
  MIN_SIGNALS_FOR_CONFIDENCE: 3,
  TARGET_PIPELINE_COVERAGE: 3.0,
  TARGET_ACTIVITY_PER_DEAL_WEEK: 2,
} as const;
```
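Classification (feature #6) is then a pair of threshold checks against these constants. A minimal sketch, with the relevant constants redeclared so it stands alone:

```typescript
// Subset of PIPELINE_NOWCAST, redeclared to keep the sketch self-contained.
const THRESHOLDS = {
  ON_TRACK: 0.7, // PIPELINE_NOWCAST.ON_TRACK_THRESHOLD
  WARNING: 0.4,  // PIPELINE_NOWCAST.WARNING_THRESHOLD
} as const;

type NowcastStatus = "on_track" | "warning" | "critical";

// Maps a 0-1 composite score to the varianceStatus-compatible enum.
function classifyComposite(composite: number): NowcastStatus {
  if (composite >= THRESHOLDS.ON_TRACK) return "on_track";
  if (composite >= THRESHOLDS.WARNING) return "warning";
  return "critical";
}
```

Boundary behavior matches the test contract: exactly 0.7 classifies as on_track and exactly 0.4 as warning.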
Prediction Evidence Ledger
The junction table that connects predictions to data. Without this, predictions are opinions with timestamps.
```typescript
interface PredictionEvidence {
  predictionId: string; // links to prediction-database.md row
  signalType: "internal" | "market" | "resolution";
  dataPoint: string; // what was measured
  value: number; // the measurement
  source: string; // where it came from
  timestamp: string; // ISO 8601
  direction: "strengthens" | "weakens" | "neutral";
  convictionDelta: number; // how much this moved the score (-1 to +1)
}

interface PredictionMaturity {
  level: "L0_stated" | "L1_instrumented" | "L2_tracked" | "L3_tested" | "L4_resolved";
  evidenceCount: number;
  lastUpdated: string;
  convictionHistory: Array<{ score: number; date: string; reason: string }>;
}
```
Prediction maturity model (mirrors commissioning):
| State | Meaning | Criteria |
|---|---|---|
| L0 Stated | Conviction score assigned, no evidence collected | Default for all 76 current predictions |
| L1 Instrumented | Data sources identified, collection method defined | Evidence ledger row exists with source field populated |
| L2 Tracked | Evidence flowing, conviction updating quarterly | >= 2 evidence entries, conviction updated at least once |
| L3 Tested | Prediction survived 2+ Bayesian updates with evidence | >= 2 conviction changes backed by data |
| L4 Resolved | Confirmed or falsified with timestamped evidence | Resolution entry with pass/fail and evidence chain |
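One way to read the Criteria column is as a derivation function from ledger stats to a level. This is an interpretation sketch, not confirmed logic; the `LedgerStats` input shape is hypothetical:

```typescript
type MaturityLevel =
  | "L0_stated"
  | "L1_instrumented"
  | "L2_tracked"
  | "L3_tested"
  | "L4_resolved";

// Hypothetical per-prediction stats rolled up from the evidence ledger.
interface LedgerStats {
  evidenceCount: number;     // rows in the evidence ledger
  dataBackedUpdates: number; // conviction changes backed by data
  instrumented: boolean;     // evidence row exists with source populated
  resolved: boolean;         // resolution entry with evidence chain
}

// Returns the highest level whose criteria are met, per the table above.
function deriveMaturity(s: LedgerStats): MaturityLevel {
  if (s.resolved) return "L4_resolved";
  if (s.dataBackedUpdates >= 2) return "L3_tested";
  if (s.evidenceCount >= 2 && s.dataBackedUpdates >= 1) return "L2_tracked";
  if (s.instrumented) return "L1_instrumented";
  return "L0_stated";
}
```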
State Transition Log
Timestamps on commissioning state changes enable velocity tracking — rate of L0 to L4 progression per month.
```typescript
interface StateTransition {
  featureId: string; // commissioning dashboard row
  fromLevel: number; // 0-4
  toLevel: number; // 0-4
  timestamp: string; // ISO 8601
  evidence: string; // what proved the transition
  agent: string; // who/what triggered it
}
```
This feeds two consumers: the commissioning signal (signal #4) gains velocity data, and prediction #5 (Small Teams + Agents) gains its own validation metric.
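Given the log, velocity is a fold over recent transitions: net levels gained within a window. A hedged sketch; the function name and the 30-day default window are assumptions, and only the fields the calculation needs are included.

```typescript
// Subset of the StateTransition interface (only the fields used here).
interface StateTransition {
  featureId: string;
  fromLevel: number; // 0-4
  toLevel: number;   // 0-4
  timestamp: string; // ISO 8601
}

// Net commissioning levels gained across all features in the window.
function commissioningVelocity(
  transitions: StateTransition[],
  windowDays = 30,
): number {
  const cutoffMs = Date.now() - windowDays * 24 * 60 * 60 * 1000;
  return transitions
    .filter((t) => Date.parse(t.timestamp) >= cutoffMs)
    .reduce((sum, t) => sum + (t.toLevel - t.fromLevel), 0);
}
```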
Signal Source Expansion
Internal signals (cost: $0, already collectible):
| Signal | Source | Validates Prediction | Collection |
|---|---|---|---|
| Agent receipt intent-match rate | .ai/receipts/ manual_interventions field | Intent Verification | Parse receipts directory |
| Sprout boot time | A2A acceptance test | Composable Primitives | Run and time test |
| ETL trust score completion | Trust scoring pipeline output | Trust Quantified | Count scored entities |
| Commissioning velocity | State transition log (new) | Small Teams + Agents | L0 to L4 transitions/month |
| Agent profile capability evidence | agent_profiles table | Identity Inversion | Count verified capabilities |
Market signals (cost: $0, automated scan):
| Signal | Source | Validates Prediction | Collection |
|---|---|---|---|
| Tokenized RWA total value | RWA.xyz | RWA Boring | Monthly scrape |
| On-chain identity adoption | Dune Analytics, Sui explorer | Identity Inversion, Trust Quantified | Monthly query |
| Agent commerce volume | a16z State of Crypto, Stripe reports | Intent Verification | Quarterly Perplexity scan |
| AI-native company revenue/employee | Crunchbase, public filings | Small Teams + Agents | Quarterly scan |
| Composable platform adoption | ProductHunt, YC batch analysis | Composable Primitives | Quarterly scan |
| NZ FMA tokenization guidance | FMA publications | RWA Boring | Quarterly check |
Bayesian Update Protocol
Automated trigger schedule per prediction category:
| Category | Weekly Check | Reassessment Trigger (>10% conviction shift) | Abandon Trigger |
|---|---|---|---|
| Intent Verification | Agent liability news | >$100M agent insurance products launched | Major jurisdiction bans agent transactions |
| Composable Primitives | YC batch composition | A2A/MCP reaches stable spec | Hyperscaler ships "Business in a Box," captures 80% |
| Trust Quantified | On-chain reputation TVL | Portable reputation standard proposed at W3C/SIP | Privacy backlash legislation in 3+ jurisdictions |
| Identity Inversion | "Portfolio hire" postings | On-chain professional identity >10M users | Deepfake receipts become trivial |
| Small Teams + Agents | Revenue-per-employee reports | >50 companies at <10 people, >$10M revenue | Agent reliability plateaus 18+ months |
| RWA Tokenization | RWA.xyz total value | NZ FMA issues formal guidance | Major jurisdiction criminalizes tokenized securities |
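A minimal trigger check, reading ">10% conviction shift" as an absolute move of more than 0.1 on a 0-1 conviction scale. The spec may intend a relative shift instead; names and shapes here are illustrative.

```typescript
// Subset of PredictionEvidence: only the field the trigger needs.
interface EvidenceDelta {
  convictionDelta: number; // -1 to +1, per the evidence ledger
}

// Flags a prediction for reassessment when evidence accumulated since the
// last review shifts conviction past the threshold in either direction.
function needsReassessment(
  evidenceSinceReview: EvidenceDelta[],
  threshold = 0.1,
): boolean {
  const shift = evidenceSinceReview.reduce((s, e) => s + e.convictionDelta, 0);
  return Math.abs(shift) > threshold;
}
```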
Core Algorithm
```
1. For each signal type with data:
   a. Normalize raw measurement to 0-1 scale
   b. Apply exponential decay: factor = exp(-0.693 * ageDays / halfLife)
   c. Compute variance: (actual - forecast) / forecast
   d. Record trend from last 3 measurements
2. Compute confidence:
   confidence = signalsPresent / totalSignals * avgFreshness
3. Compute composite:
   composite = sum(signal.score * signal.weight * signal.freshness)
             / sum(signal.weight * signal.freshness)
4. Classify:
   >= 0.7 → on_track
   >= 0.4 → warning
   <  0.4 → critical
5. Generate recommendations:
   topRisk = signal with lowest score
   topMomentum = signal with best trend
   actions = per-signal actionable suggestions
```
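Steps 1b and 3 above can be sketched directly. The 0.693 constant is ln 2, so a signal exactly one half-life old carries 50% freshness. Shapes and names below are illustrative, not the production implementation.

```typescript
// Minimal signal shape for the sketch; the real NowcastSignal is richer.
interface WeightedSignal {
  score: number;   // 0-1 normalized
  weight: number;  // configured weight
  ageDays: number; // age of newest measurement
}

const LN2 = Math.log(2); // ≈ 0.693, as in the decay formula above

// Step 1b: exponential decay by age, default half-life 14 days.
function freshness(ageDays: number, halfLifeDays = 14): number {
  return Math.exp((-LN2 * ageDays) / halfLifeDays);
}

// Step 3: freshness-weighted average of signal scores.
function compositeScore(signals: WeightedSignal[], halfLifeDays = 14): number {
  let numerator = 0;
  let denominator = 0;
  for (const s of signals) {
    const f = freshness(s.ageDays, halfLifeDays);
    numerator += s.score * s.weight * f;
    denominator += s.weight * f;
  }
  return denominator === 0 ? 0 : numerator / denominator;
}
```

Note that because freshness appears in both numerator and denominator, uniformly stale signals compose to the same weighted average; staleness lowers confidence (step 2), not the composite itself.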
Protocols
How does the system coordinate?
Build Order
| Sprint | Features | What | Effort | Acceptance |
|---|---|---|---|---|
| S0 | #9, #10 | Constants + forecast baselines | 0.5d | Constants in file, baselines documented |
| S1 | #1, #2, #3 | Signal collectors + normalization + decay | 2d | Each collector returns normalized 0-1 with decay |
| S2 | #4, #5, #6, #7 | Variance + composite + classification + confidence | 2d | calculateNowcast() returns valid NowcastResult with 5 mock signals |
| S3 | #8 | Recommendations engine | 1d | topRisk, topMomentum, actions populated from signal breakdown |
| S4 | — | Wire to production data sources | 1d | Real signals flowing, composite rendered in Insights |
| S5 | #11, #12 | Prediction evidence ledger + state transition log | 2d | Evidence table accepts entries, transitions timestamped |
| S6 | #13 | Internal signal collectors | 1d | Receipt accuracy, sprout time, trust scores flowing to ledger |
| S7 | #14 | Market signal collectors (automated scans) | 2d | Monthly scan pipeline produces evidence entries for 6 predictions |
| S8 | #15, #16 | Bayesian triggers + resolution protocol | 1d | Stale predictions flagged, resolved predictions at L4 |
Commissioning
| # | Feature | Install | Test | Operational | Optimize |
|---|---|---|---|---|---|
| 1 | Signal Collectors (5x) | --- | --- | --- | --- |
| 2 | Signal Normalization | --- | --- | --- | --- |
| 3 | Exponential Decay | --- | --- | --- | --- |
| 4 | Variance Computation | --- | --- | --- | --- |
| 5 | Composite Scoring | --- | --- | --- | --- |
| 6 | Classification | --- | --- | --- | --- |
| 7 | Confidence Calculation | --- | --- | --- | --- |
| 8 | Recommendations Engine | --- | --- | --- | --- |
| 9 | Constants Registration | --- | --- | --- | --- |
| 10 | Forecast Baselines | --- | --- | --- | --- |
| 11 | Prediction Evidence Ledger | --- | --- | --- | --- |
| 12 | State Transition Log | --- | --- | --- | --- |
| 13 | Internal Signal Collectors | --- | --- | --- | --- |
| 14 | Market Signal Collectors | --- | --- | --- | --- |
| 15 | Bayesian Update Triggers | --- | --- | --- | --- |
| 16 | Resolution Protocol | --- | --- | --- | --- |
Agent-Facing Spec
Commands: pnpm test -- --filter=pipeline-nowcast, pnpm tc
Boundaries:
- Always: pure function, no DB writes, no side effects
- Ask first: threshold changes, weight adjustments
- Never: modify signal source data, bypass confidence check
Test Contract:
| # | Feature | Test File | Assertion |
|---|---|---|---|
| 1 | Signal collectors | pipeline-nowcast.test.ts | Each collector returns 0-1 from valid input |
| 2 | Normalization | pipeline-nowcast.test.ts | Edge cases: zero, negative, very large values |
| 3 | Exponential decay | pipeline-nowcast.test.ts | 14-day-old signal at 50% weight |
| 4 | Composite scoring | pipeline-nowcast.test.ts | 5 signals produce weighted sum |
| 5 | Classification | pipeline-nowcast.test.ts | Boundary: 0.7 on_track, 0.4 warning |
| 6 | Confidence | pipeline-nowcast.test.ts | 2 of 5 signals = confidence < 0.5 |
| 7 | Recommendations | pipeline-nowcast.test.ts | topRisk = lowest-scoring signal |
| 8 | Cold start | pipeline-nowcast.test.ts | 0 signals = confidence 0, status critical |
| 9 | Evidence ledger | prediction-evidence.test.ts | Evidence entry updates prediction maturity |
| 10 | State transitions | prediction-evidence.test.ts | Transition timestamps enable velocity calc |
| 11 | Bayesian triggers | prediction-evidence.test.ts | >10% conviction shift flags reassessment |
| 12 | Resolution | prediction-evidence.test.ts | Resolved prediction reaches L4 with evidence |
| 13 | Market collector | prediction-evidence.test.ts | Scan returns structured evidence from source |
| 14 | Internal collector | prediction-evidence.test.ts | Receipt parse returns intent-match rate |
Players
Who creates harmony?
Job 1: Know If We're On Track
| Element | Detail |
|---|---|
| Struggling moment | Weekly commissioning session: 45 min checking 5 dashboards, forming a mental picture |
| Workaround | Manual synthesis, gut feel, "seems fine" until it isn't |
| Progress | Glance at one composite score, see which signal is drifting, act on the recommendation |
| Hidden objection | "A single number can't capture this complexity" |
| Switch trigger | Missed a regression that was visible in the data 3 days earlier |
Features that serve this job: #5, #6, #8
Job 2: Detect Drift Early
| Element | Detail |
|---|---|
| Struggling moment | Problem compounds silently between reviews. Activity drops, nobody notices for a week. |
| Workaround | Hope someone checks. Rely on agent posting to #meta. |
| Progress | Nowcast fires warning when activity velocity drops below threshold, before the weekly review |
| Hidden objection | "False alarms are worse than no alarms" |
| Switch trigger | A deal went cold because nobody noticed zero activity for 10 days |
Features that serve this job: #3, #4, #7
Job 3: Validate Predictions With Evidence
| Element | Detail |
|---|---|
| Struggling moment | 76 predictions in a markdown table. Conviction scores assigned once, never updated. No evidence. |
| Workaround | Quarterly manual review, gut-feel conviction updates, no data to support or falsify. |
| Progress | Evidence ledger collects signals. Bayesian triggers flag stale predictions. Maturity model tracks. |
| Hidden objection | "Predictions are inherently uncertain — instrumenting them is false precision" |
| Switch trigger | A prediction you scored 4/5 turns out wrong and you never collected the evidence that would've told you 6 months earlier |
Features that serve this job: #11, #13, #14, #15, #16
Relationship to Other PRDs
| PRD | Relationship | Data Flow |
|---|---|---|
| Sales CRM & RFP | Peer | Pipeline + activity signals flow IN to nowcast |
| Agent Platform | Peer | Agent velocity signals flow IN from Convex. Receipt accuracy feeds prediction evidence. |
| ETL Data Tool | Peer (upstream) | Trust scoring completion feeds prediction evidence. Market signal collection reuses ETL patterns. |
| Sales Process Optimisation | Peer (downstream) | Nowcast output could feed SPO decision engine |
| Prediction Database | Data source | 76 predictions flow IN. Evidence flows back. Maturity state tracked. |
| Sui Real Estate Tokenization | Evidence subject | RWA TVL market signal validates tokenization prediction |
Context
- Pictures — Pre-flight maps that feed this spec
- Prompt Deck — Sales compression of this spec
- Commissioning Dashboard — The scoreboard this algorithm reads
- AI Product Requirements — Section definitions