Multimodal Agent Interface Spec

How do we make conversation the default and forms the escape hatch?

Intent Contract

The agent autonomy boundary. Not parsed by engineering — a governance instrument for humans and agents.

| Dimension | Statement |
| --- | --- |
| Objective | Conversational AI as primary drmg-sales interface — users talk to the agent (voice, text, document drop) instead of navigating forms. Every WorkChart accessible through conversation. |
| Outcomes | 1. Agent processes an RFP from PDF drop to proposal draft in <5 minutes. 2. Pipeline summary available via "What's my pipeline?" in <3 seconds. 3. 50%+ of completed tasks originate from conversation within 60 days. |
| Health Metrics | Existing form-based workflows remain functional. CRM data integrity (no orphaned records from conversational mutations). Page load <2s on all routes. |
| Constraints | Hard: Multi-tenant isolation (conversation context never leaks across orgs). Hard: Agent composes, human sends (no autonomous external communication). Steering: Text-first MVP, voice in T1. |
| Autonomy | Allowed: UI layout, streaming strategy, model selection per modality. Escalate: Schema changes, new entity types, external API integrations. Never: Delete user data, send external communications, modify billing. |
| Stop Rules | Complete when: all Build Contract rows at Live + 1 real RFP processed end-to-end via conversation. Halt when: <10% conversational task rate after 30 days, or agent hallucinates CRM data in 3+ incidents. |
| Counter-metrics | Form-based task completion rate must not drop below current baseline. CRM query accuracy must stay >95%. WorkChart execution time must not increase >20% vs direct invocation. |
| Blast Radius | All drmg-sales users. All WorkChart orchestrators. Agent registry. Session storage. Streaming infrastructure. |
| Rollback | Feature flag per org. Disable conversational interface, revert to form-only navigation. Session data retained but inaccessible until re-enabled. |

Story Contract

Stories are test contracts. Each row = 1+ test file. Tests must be RED before implementation starts. Tests going GREEN = value delivered.

| # | WHEN (Trigger + Precondition) | THEN (Exact Assertion) | ARTIFACT (Test File) | Test Type | FORBIDDEN (Must not happen) | OUTCOME (Value Proven) |
| --- | --- | --- | --- | --- | --- | --- |
| S1 | User drops PDF into chat AND venture exists in org with 0 RFP questions | rfp_questions table has >=5 rows WHERE venture_id = target AND source = 'pdf_extraction' within 60s | BLOCKER: apps/drmg-sales-e2e/src/e2e/chat-rfp.spec.ts | e2e | Agent fabricates questions not present in the PDF. rfp_questions.source = 'hallucinated' appears in DB. | RFP extraction in <60s vs 30min manual copy-paste. Zero missed appendix questions. |
| S1b | S1 complete AND answer library has >=3 approved answers | rfp_answers table has >=1 row WHERE source = 'library_match' AND confidence > 0.8 | BLOCKER: apps/drmg-sales-e2e/src/e2e/chat-rfp.spec.ts | e2e | Agent auto-fills with answers from a different org's library. org_id mismatch between answer and venture. | Auto-fill compounds — second RFP is faster than first. |
| S2 | User sends "What's my pipeline?" AND org has >=3 deals across 2+ stages | Response contains JSON with deals[] where each has name, value, stage AND response_time_ms < 3000 | BLOCKER: apps/drmg-sales-e2e/src/e2e/chat-crm-query.spec.ts | e2e | Response includes deals WHERE org_id != user's org. Deal values differ from agent_profile_deals.value source of truth. | Pipeline summary in <3s vs 4-module navigation (~5min). |
| S3 | User sends "Draft outreach for Acme Corp" AND contact exists with company_name = 'Acme Corp' | Agent streams draft with contact.name and company.industry present in output AND draft is NOT sent externally | BLOCKER: apps/drmg-sales-e2e/src/e2e/chat-outreach.spec.ts | e2e | Agent sends email/message to external recipient without human clicking "Send". Any outbound_messages row created without status = 'draft'. | Personalised outreach drafted in <30s vs 10min manual research + compose. |
| S4 | User sends follow-up "Make it more formal" AND session has >=2 prior messages including S3 output | Agent returns modified draft referencing prior context. session.messages.length >= 4. No re-invocation of outreach WorkChart. | BLOCKER: libs/agency/src/__tests__/session-memory.spec.ts | integration | Agent responds with "I don't have context" or re-asks for the company name. Session context array is empty on follow-up. | Multi-turn context preserved across 10+ messages. Zero repeated questions. |
| S5 | User uploads image (PNG/JPG) AND image contains ambiguous content (not clearly an RFP) | Agent responds with classification question before routing. modality_router.classification = 'ambiguous'. No WorkChart invoked. | BLOCKER: libs/agency/src/__tests__/modality-router.spec.ts | integration | Agent routes to sales-rfp-workflow and produces irrelevant proposal from a whiteboard photo. WorkChart invoked without user confirmation. | Ambiguous inputs classified in <2s. Zero misrouted WorkCharts on ambiguous input. |
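The S1 FORBIDDEN clause requires that every extracted question be traceable to the source document. A minimal sketch of one way a test could assert that, assuming hypothetical helper names (`findUngroundedQuestions`, `normalise`) rather than the real test utilities:

```typescript
// Hedged sketch: flag extracted RFP questions that cannot be found in the
// source document text (S1 FORBIDDEN: no fabricated questions).
// Whitespace/case normalisation keeps the substring check robust.

function normalise(s: string): string {
  return s.toLowerCase().replace(/\s+/g, " ").trim();
}

/** Returns the questions that CANNOT be located in the source text. */
function findUngroundedQuestions(questions: string[], sourceText: string): string[] {
  const haystack = normalise(sourceText);
  return questions.filter((q) => !haystack.includes(normalise(q)));
}

const pdfText =
  "Q1: What is your uptime SLA?\nQ2: Describe your data retention policy.";
const extracted = [
  "What is your uptime SLA?",
  "Describe your data retention policy.",
  "What is your pricing model?", // not in the PDF — should be flagged
];

console.log(findUngroundedQuestions(extracted, pdfText));
// → ["What is your pricing model?"]
```

A substring check is deliberately strict; a real implementation might allow fuzzy matching for OCR noise, but the failure mode it guards against is the same.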

Build Contract

The deliverable. Engineering builds from this. Commissioning reads this. Every row has an acceptance test. FAVV v2.1 format.

Job 1: Talk to the Agent

| # | FeatureID | Function | Artifact | Success Test | Safety Test | Regression Test | Value | State |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | AI-021 | Send text message, receive streaming response | Chat UI with useChat hook + streaming endpoint | Message sends, response streams token-by-token, complete response in <5s | Response never includes data from another org (S2 FORBIDDEN) | Existing API routes respond in <500ms | Converse with the agent like a colleague | Gap |
| 2 | AI-020 | Upload file (PDF, DOCX, image) via chat | File upload component + multimodal input handler | File uploads, progress shown, content extracted and summarised in <10s | File content never persisted in plain text outside org tenant (S1 FORBIDDEN) | File upload on venture pages still works | Drop a document and get answers | Gap |
| 3 | AI-020 | Voice input via browser microphone | STT integration (Whisper/Deepgram) + mic button | Tap mic, speak, transcription appears as message, agent responds | Audio never stored beyond transcription session | — (new capability) | Talk instead of type | Gap |
| 4 | AI-021 | Multi-turn session memory within conversation | Session context store (per-org, per-user) | Follow-up references prior turns correctly across 10+ messages (S4) | Session data never leaks between users or orgs (S4 FORBIDDEN) | — (new capability) | Agent remembers what you said five minutes ago | Gap |
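Build row #4 calls for a per-org, per-user session context store. A minimal in-memory sketch of the isolation contract, with illustrative names (`SessionStore`, `Turn`); the production store would live in the Agent Platform memory layer, backed by a database rather than a local Map:

```typescript
// Hedged sketch of the session memory contract: turns are keyed by
// (orgId, userId), so one tenant's context can never surface in another's
// conversation (S4 FORBIDDEN).

type Turn = { role: "user" | "agent"; content: string };

class SessionStore {
  private sessions = new Map<string, Turn[]>();

  // Key composition is the isolation mechanism: reads and writes under
  // different org/user pairs touch different entries by construction.
  private key(orgId: string, userId: string): string {
    return `${orgId}::${userId}`;
  }

  append(orgId: string, userId: string, turn: Turn): void {
    const k = this.key(orgId, userId);
    const turns = this.sessions.get(k) ?? [];
    turns.push(turn);
    this.sessions.set(k, turns);
  }

  history(orgId: string, userId: string): Turn[] {
    return this.sessions.get(this.key(orgId, userId)) ?? [];
  }
}

const store = new SessionStore();
store.append("org-a", "u1", { role: "user", content: "Draft outreach for Acme Corp" });
store.append("org-a", "u1", { role: "agent", content: "Here is a draft." });
store.append("org-a", "u1", { role: "user", content: "Make it more formal" });

console.log(store.history("org-a", "u1").length); // → 3
console.log(store.history("org-b", "u1").length); // → 0 (no cross-org leakage)
```

The S4 follow-up ("Make it more formal") works because the agent replays `history()` into the model context instead of re-invoking the WorkChart.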

Job 2: Agent Routes to the Right WorkChart

| # | FeatureID | Function | Artifact | Success Test | Safety Test | Regression Test | Value | State |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 5 | AI-022 | Detect input modality (text, file, voice) | Modality classifier module | Correctly classifies 95%+ of inputs across text, PDF, image, audio | Never routes ambiguous input without asking (S5 FORBIDDEN) | — (new capability) | Agent knows what you gave it | Gap |
| 6 | AI-022 | Route classified input to correct WorkChart | Skill router extension + WorkChart adapter | "Process this RFP" routes to sales-rfp-workflow. "Draft outreach" routes to outreach. | Never executes a WorkChart the user didn't intend (S5 FORBIDDEN) | Existing skill router matches still resolve correctly | Say what you need, agent picks the right tool | Gap |
| 7 | AI-020 | Normalise multimodal input to WorkChart shape | Input normaliser (PDF-to-text, image-to-text, STT) | PDF extracted to structured text matching WorkChart input schema | Normalisation never truncates content silently (S1 FORBIDDEN) | — (new capability) | Any input format works with any WorkChart | Gap |
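Build row #5's classifier can be sketched as a pure function over MIME type plus an image label. The names (`classifyModality`, `visionLabel`) are illustrative assumptions; the real module would call a vision model for images rather than reading a pre-computed label:

```typescript
// Hedged sketch of the modality classifier. The key behaviour is the S5
// guard: an image that is not clearly a document comes back 'ambiguous',
// and an ambiguous result must never trigger a WorkChart without asking.

type Modality = "text" | "pdf" | "image" | "audio" | "ambiguous";

interface ChatInput {
  mimeType?: string;    // absent for plain text messages
  visionLabel?: string; // hypothetical vision-model label for images
}

function classifyModality(input: ChatInput): Modality {
  if (!input.mimeType) return "text";
  if (input.mimeType === "application/pdf") return "pdf";
  if (input.mimeType.startsWith("audio/")) return "audio";
  if (input.mimeType.startsWith("image/")) {
    // Only route images whose content is clearly identifiable; otherwise
    // the agent asks a classification question first (S5).
    return input.visionLabel === "document_scan" ? "image" : "ambiguous";
  }
  return "ambiguous";
}

console.log(classifyModality({})); // → "text"
console.log(classifyModality({ mimeType: "application/pdf" })); // → "pdf"
console.log(classifyModality({ mimeType: "image/png", visionLabel: "whiteboard" })); // → "ambiguous"
```

Defaulting unknown inputs to `ambiguous` (rather than guessing) is what keeps the "zero misrouted WorkCharts" target reachable.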

Job 3: See Progress and Results

| # | FeatureID | Function | Artifact | Success Test | Safety Test | Regression Test | Value | State |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 8 | AI-008 | Stream WorkChart execution progress | Streaming UI for long-running orchestrations | User sees step-by-step progress during RFP processing (not just a spinner) | Progress updates never expose internal system state | — (new capability) | Know what the agent is doing right now | Gap |
| 9 | AI-021 | Query CRM data through conversation | Natural language to CRM query adapter | "Show deals closing this month" returns correct filtered deal list (S2) | Query never returns data outside user's org (S2 FORBIDDEN) | CRM list pages still load in <1s | Ask a question, get the answer | Gap |
| 10 | AI-021 | Display structured results inline in chat | Rich message components (tables, cards, charts) | Pipeline summary renders as table with amounts and stages, not plain text | Rendered data matches source of truth exactly (S2 FORBIDDEN) | — (new capability) | Results look like results, not walls of text | Gap |
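Build row #9's safety test hinges on where the org filter is applied. A sketch of the intended shape, with illustrative types and an in-memory deal list standing in for the database: the adapter composes org scoping on top of whatever predicate the natural-language step produced, so a model-generated query can never widen the result set (S2 FORBIDDEN):

```typescript
// Hedged sketch: org_id scoping is enforced by the adapter, never delegated
// to the model-generated filter.

interface Deal { orgId: string; name: string; value: number; stage: string }

const allDeals: Deal[] = [
  { orgId: "org-a", name: "Acme Corp", value: 50_000, stage: "proposal" },
  { orgId: "org-a", name: "Globex",    value: 20_000, stage: "discovery" },
  { orgId: "org-b", name: "Initech",   value: 90_000, stage: "closing" },
];

// `filter` is whatever predicate the NL-to-query step produced; the org
// scope is AND-ed in unconditionally, so it cannot be bypassed.
function queryDeals(orgId: string, filter: (d: Deal) => boolean): Deal[] {
  return allDeals.filter((d) => d.orgId === orgId && filter(d));
}

// "What's my pipeline?" for a user in org-a:
const pipeline = queryDeals("org-a", () => true);
console.log(pipeline.map((d) => d.name)); // → ["Acme Corp", "Globex"]
```

In production the same composition would happen at the SQL layer (a mandatory `WHERE org_id = $1` the adapter appends), but the invariant is identical.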

Cross-Cutting

| # | FeatureID | Function | Artifact | Success Test | Safety Test | Regression Test | Value | State |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 11 | WORK-001 | Feature flag per org for conversational mode | Feature flag in org settings + middleware | Toggle on: chat widget appears. Toggle off: forms-only experience. | Flag change never drops in-flight conversations | All form-based flows work identically when flag off | Ship incrementally, roll back safely | Gap |
| 12 | AI-021 | Conversation history and audit trail | Conversation log store (per-org, searchable) | User can review past conversations. Admin can audit agent actions. | Conversation logs never accessible cross-org (S2 FORBIDDEN) | — (new capability) | Know what the agent did and why | Gap |
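Build row #11's flag gate can be sketched in a few lines. Names (`orgFeatureFlags`, `isConversationalModeOn`) are illustrative; in the app this would read the org_feature_flags table from middleware rather than an in-memory map:

```typescript
// Hedged sketch of the per-org feature flag gate. Unknown orgs default to
// OFF, matching the incremental-rollout constraint: some orgs get
// conversation first, all orgs keep forms.

const orgFeatureFlags = new Map<string, Set<string>>([
  ["org-a", new Set(["conversational_mode"])],
  ["org-b", new Set<string>()],
]);

function isConversationalModeOn(orgId: string): boolean {
  return orgFeatureFlags.get(orgId)?.has("conversational_mode") ?? false;
}

// Toggle on: chat widget renders. Toggle off: forms-only experience.
console.log(isConversationalModeOn("org-a")); // → true
console.log(isConversationalModeOn("org-b")); // → false
```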

Screen Contracts

Screen: Chat Interface (/chat)

Serves: S1-S5, Build Contract #1-4, #8-10

Flow and States

| Dimension | Spec |
| --- | --- |
| Route | /chat |
| Entry from | Sidebar "Chat" link, or floating widget button on any page |
| Success | Message appears in thread, agent response streams below |
| Error | Error toast "Agent unavailable — try again" + message stays in input |
| Auth denied | Redirect to /sign-in |
| Loading | Skeleton message bubbles (3 lines) while conversation history loads |
| Empty | "Start a conversation. Drop a file, ask a question, or say what you need." + 3 suggested prompts |
| Disabled | Send button disabled during agent response streaming. Mic button disabled during transcription. |

Elements

| Element | Selector | States |
| --- | --- | --- |
| Page heading | role="heading", name="Chat" | — |
| Message input | label="Message" | empty, filled, disabled |
| Send button | role="button", name="Send" | enabled, disabled, loading |
| Mic button | role="button", name="Voice input" | idle, recording, transcribing |
| File upload | role="button", name="Upload file" | idle, uploading, processing |
| Message thread | testid="message-thread" | empty, loading, populated |
| Agent response | testid="agent-response-{id}" | streaming, complete, error |
| Progress display | role="status" | hidden, streaming-steps, complete |
| Feedback | role="status" | success toast, error toast |

How does the user move from pain to effortless performance?

The Screen Contract above defines WHAT /chat contains. This section defines WHERE it lives in the app, HOW users discover it, and WHY the information architecture changes.

Pain-to-Perform Journey

Every navigation decision maps to a stage in the user's journey from friction to flow. The UI team builds the navigation that carries the user through each stage — not just the destination screen.

| Stage | User State | Navigation Must Do | Current State | Target State |
| --- | --- | --- | --- | --- |
| Pain | "I navigate 4 modules to assemble one answer" | Show the cost of the current path | 6 sidebar sections, no unifying entry | Same — this is what the user sees before chat |
| Awareness | "There's a faster way" | Surface the conversational entry without disrupting existing flow | No chat entry point exists | Chat icon in sidebar + floating widget pulse on first login |
| First Value | "That was faster than clicking through forms" | Deliver one complete answer in <5 seconds from first message | N/A | Suggested prompts on empty state: "What's my pipeline?", "Draft outreach for...", "Upload an RFP" |
| Habit | "I go to chat first now" | Make chat the natural starting point | Dashboard shows 6 section cards | Chat becomes first sidebar item. Dashboard shows recent conversations above section cards |
| Mastery | "I do everything from chat" | Support power-user shortcuts without removing form access | Forms are the only path | Chat responses include deep-links to form views. Form pages include "Ask about this" chat triggers |
| Effortless | "The agent anticipates what I need" | Proactive suggestions based on context | N/A (parked) | Agent suggests next action on login. Context-aware prompts per section |
Navigation by Flag State

| State | Sidebar | Dashboard | Section Pages | Widget |
| --- | --- | --- | --- | --- |
| Flag OFF | Proposals, Pipeline, Insights, Plans, Agents, Settings | 6 section cards + 4 quick actions | No chat triggers | Hidden |
| Flag ON (T0) | Chat (first item), Proposals, Pipeline, Insights, Plans, Agents, Settings | Recent conversations card above section cards. "Ask anything" prompt | No cross-links yet | Floating chat button (bottom-right), collapsed by default |
| Flag ON (T2+) | Same as T0 | Same as T0 | Each section gets "Ask about this" contextual chat trigger | Same, opens with section-aware suggested prompts |

Cross-Cutting Navigation

Chat is not a module — it is a layer across all modules. The navigation must reflect this.

| From | To | Trigger | What Happens |
| --- | --- | --- | --- |
| Any page | Chat (full) | Click sidebar "Chat" or floating widget | Navigate to /chat. If widget was open with context, carry conversation forward |
| Any page | Chat (widget) | Click floating button | Widget expands in-place. User stays on current page. Agent has page context |
| Chat | Pipeline | Agent shows deal data, user clicks "View in Pipeline" | Navigate to /pipeline/deals/{id}. Chat widget stays accessible |
| Chat | Proposals | Agent shows RFP draft, user clicks "Open in Proposals" | Navigate to /proposals/ventures/{id}. Draft pre-populated |
| Pipeline | Chat | "Ask about this deal" button on deal card | Widget opens with deal context pre-loaded: "Tell me about deal.name" |
| Proposals | Chat | "Process with AI" button on RFP page | Widget opens with venture context: "I have questions about venture.name" |
[Chat]          ← NEW: first item, conversation icon
─────────────── ← visual separator
Proposals       ← existing
Pipeline        ← existing
Insights        ← existing
Plans           ← existing
Agents          ← existing
───────────────
Settings        ← existing, bottom-anchored

Chat goes first because the PRD's own principle states: "The agent IS the product. Forms are the escape hatch." If chat is buried below existing sections, the navigation contradicts the intent.

Wiring Coordinates

| Element | File | Action |
| --- | --- | --- |
| Sidebar nav items | BLOCKER: locate sidebar component in app/(app)/ layout | Add "Chat" as first NavItem with conversation icon |
| Floating widget | BLOCKER: libs/ui/src/components/chat/ChatWidget.tsx | New component. Renders on all (app) pages when feature flag ON |
| Dashboard card | BLOCKER: app/(app)/dashboard/page.tsx | Add "Recent Conversations" card above section grid |
| Section chat triggers | BLOCKER: per-section page files | Add "Ask about this" button to Pipeline deal cards and Proposals venture pages |
| Feature flag middleware | BLOCKER: depends on org_feature_flags table (PR #407 schema) | Conditionally render chat sidebar item + widget |

Principles

What truths constrain the design?

The Job

| Element | Detail |
| --- | --- |
| Situation | A sales rep's day: check pipeline, respond to RFPs, research prospects, update deals, draft outreach. Each task means navigating to a different module, filling forms. |
| Intention | The rep opens drmg-sales and talks to the agent. Every workflow accessible through conversation. The agent serves the user, not the database. |
| Obstacle | No conversational interface exists. WorkCharts accept string-only inputs. No multimodal input normalisation. No session memory. No streaming UI for WorkCharts. |
| Hardest Thing | Making the agent feel like a colleague who knows your pipeline, your style, your history — not a chatbot that asks you to rephrase. |

The hidden objection: "I'll spend more time explaining what I want to the AI than it would take to just click through the forms." The answer: context + initiative. The agent remembers your pipeline, suggests the next action, and only asks when genuinely ambiguous.

Why Now

  • 8 WorkCharts x 0 conversational entry = 0 agent value to users
  • 1x1 modality coverage (text-to-text) out of 49 possible (7x7)
  • Vercel AI SDK (@ai-sdk/react, useChat) in stack but unused in UI
  • GPT-4o voice, Claude voice, Gemini Live = multimodal input feasible now
  • HubSpot AI, Salesforce Einstein, Clay = competitors shipping conversational CRM
  • Without this, drmg-sales is a form-based SaaS in an agent-first world

Design Constraints

| Constraint | Rationale |
| --- | --- |
| Conversation-first, forms-second | The agent IS the product. Forms are the escape hatch for power users who want direct manipulation. |
| Text MVP, voice in T1 | Text is the simplest input to normalise and test. Voice adds STT complexity. Ship text, prove value, add voice. |
| Agent composes, human sends | No autonomous external communication. Agent drafts outreach, user reviews and sends. |
| Multi-tenant isolation | Conversation context, session memory, and CRM queries scoped to org. Never cross boundaries. |
| Feature-flagged per org | Incremental rollout. Some orgs get conversation first. All orgs keep forms. |

Performance

How do we know it's working?

Priority Score

PRIORITY = Pain x Demand x Edge x Trend x Conversion

| Dimension | Score (1-5) | Evidence |
| --- | --- | --- |
| Pain | 5 | Every workflow requires navigation + form-fill. 8 WorkCharts with no user-facing conversational entry. |
| Demand | 4 | HubSpot AI, Salesforce Einstein, Clay shipping conversational CRM. Not validated with our users. |
| Edge | 3 | Agency lib WorkCharts + modalities knowledge + skill routing. No proprietary data or network effect. |
| Trend | 5 | Omnimodal models make multimodal input table stakes. Every SaaS tool gets a conversational layer. |
| Conversion | 2 | AI feature pricing not validated. Bundled or premium tier? No pilot customer. |
| Composite | 600 | 5 x 4 x 3 x 5 x 2 |

North Star: Conversational task completion > 50% (tasks completed via conversation / total tasks).

Quality Targets

| Metric | Target | Now |
| --- | --- | --- |
| Conversational task rate | >50% | 0% |
| Agent response time (text) | <3s | N/A |
| Modality classification accuracy | >95% | N/A |
| Session context retention | 10+ turns | N/A |
| RFP processing time (PDF drop) | <5 minutes | N/A |

Failure Budget

| Failure Type | Budget | Response |
| --- | --- | --- |
| Cross-org data leak | 0% | Immediate halt — trust destroyed |
| Agent hallucination (CRM) | <2% | Flag, log, investigate — wrong data costs deals |
| Misrouted WorkChart | <5% | Improve classifier — user wastes time, not trust |
| Dropped session context | <5% | Fix session store — annoying but recoverable |

Kill signal: Chat widget ships but <10% of tasks go through conversation after 30 days. Users bypass agent and navigate directly to forms.

Platform

What do we control?

Current State

| Component | Built | Wired | Working | Notes |
| --- | --- | --- | --- | --- |
| WorkChart orchestrators (8) | Yes | Yes | Yes | No conversational entry point |
| Skill router | Yes | Yes | Yes | Text-only, no modality detection |
| Vercel AI SDK (useChat) | Yes | No | No | In package.json, unused in UI |
| Agent registry | Yes | Partial | Partial | Agent Platform L2 |
| Session/memory store | No | No | No | Agent Platform dependency |
| STT integration | No | No | No | Whisper/Deepgram not yet integrated |
| Multimodal input normaliser | No | No | No | PDF/image/voice-to-text pipeline missing |
| Streaming UI | No | No | No | No progress display for WorkChart execution |
| Chat UI component | No | No | No | No conversational interface exists |

Build Ratio

~40% composition (Vercel AI SDK, WorkCharts, skill router, agent registry), ~60% new code (chat UI, modality router, input normaliser, session memory, streaming progress).

Protocols

How do we coordinate?

Build Order

| Sprint | Features | What | Effort | Acceptance |
| --- | --- | --- | --- | --- |
| T0 | #1, #4, #11 | Text chat UI + session memory + feature flag | 5 days | User sends text, agent responds with streaming. Context across turns. |
| T1 | #5, #6, #7 | Modality router + WorkChart adapter + normaliser | 5 days | "Process this RFP" routes to sales-rfp-workflow correctly. |
| T2 | #2, #8, #10 | File upload + streaming progress + rich results | 5 days | PDF drop extracts content, streams progress, renders structured output. |
| T3 | #9, #12 | CRM query adapter + conversation history | 4 days | "What's my pipeline?" returns correct deal summary. |
| T4 | #3 | Voice input (STT) | 3 days | Tap mic, speak, transcription becomes message. |
| Park | — | TTS responses, proactive agent, advanced analytics | — | Only after T0-T3 prove conversational value. |

Total: ~22 days T0-T4. Kill date: 2026-05-01.

Commissioning

| # | Feature group | Install | Test | Operational | Optimize |
| --- | --- | --- | --- | --- | --- |
| 1, 4 | Text chat + session memory | | | | |
| 5-7 | Modality router + normaliser | | | | |
| 2, 8 | File upload + streaming progress | | | | |
| 9-10 | CRM query + rich results | | | | |
| 3 | Voice input | | | | |
| 11-12 | Feature flag + conversation history | | | | |

Agent-Facing Spec

Commands:

pnpm nx serve stackmates        # Dev server
pnpm nx test stackmates         # Unit tests
pnpm nx e2e stackmates-e2e      # E2E tests

Boundaries: Always: UI layout, streaming strategy, model selection. Ask first: schema changes, new entity types, external API keys. Never: delete user data, send external communications, modify billing.

Players

Who creates harmony?

Demand-Side Jobs

Job 1: Get Answers Without Navigation

Situation: Sales rep needs pipeline status before a meeting in 10 minutes. Currently: open app, navigate to pipeline page, scan kanban, filter by date, mentally summarise.

| Element | Detail |
| --- | --- |
| Struggling moment | Navigating through 4 modules to assemble context that should be one question away |
| Current workaround | Open 3-4 tabs, manually cross-reference deals with contacts with ventures |
| What progress looks like | Ask "What needs attention today?" and get a prioritised action list in 3 seconds |
| Hidden objection | "I'll spend more time explaining to the AI than clicking through forms" |
| Switch trigger | Missed a follow-up because the deal was buried in a pipeline view nobody checks daily |

Features that serve this job: #1, #4, #9, #10

Job 2: Process Documents Without Copy-Paste

Situation: RFP PDF arrives. Currently: download, open in viewer, copy-paste questions into textarea one by one, wait for AI to generate answers, manually review.

| Element | Detail |
| --- | --- |
| Struggling moment | Copy-pasting from PDF to web form, losing formatting, missing questions buried in appendices |
| Current workaround | Manual extraction into spreadsheet, then paste into venture Q&A form |
| What progress looks like | Drop PDF in chat, agent extracts all questions, streams draft answers, asks for review |
| Hidden objection | "AI will miss context in the document and generate wrong answers I'll have to fix anyway" |
| Switch trigger | Lost a bid because a question in a 60-page appendix was missed during manual extraction |

Features that serve this job: #2, #5, #6, #7, #8

Job 3: Draft Outreach Without Context-Switching

Situation: Three prospects need personalised outreach. Currently: open Sales Dev, navigate to prospect, review profile, click generate, wait, review, repeat x3.

| Element | Detail |
| --- | --- |
| Struggling moment | Context-switching between prospect research and draft generation across modules |
| Current workaround | Write outreach in email client, occasionally reference CRM data in another tab |
| What progress looks like | "Draft outreach for these 3 prospects" generates personalised drafts in one conversation |
| Hidden objection | "AI outreach is generic. I'll have to rewrite it anyway." |
| Switch trigger | Competitor's AI assistant drafts outreach that references specific company news automatically |

Features that serve this job: #1, #4, #6, #9

ICP: SMB Sales Team Without IT

| Attribute | Specification |
| --- | --- |
| Role | Sales rep, BD manager, or sales director at SMB without dedicated IT resource |
| Context | 5-50 employees, using drmg-sales for CRM + RFP, comfortable with chat interfaces |
| Geography | New Zealand initially, English-speaking markets |
| Budget | Already paying for drmg-sales. AI features as bundled or $10-20/seat premium tier. |

Psycho-logic: "We don't need AI" means "we tried a chatbot that couldn't find our data." The stated objection is AI skepticism. The real objection is broken promises from previous AI tools. The unlock: the agent knows your pipeline because it shares the database. First interaction must demonstrate real CRM awareness, not generic responses.

Role Definitions

| Role | Access | Permissions |
| --- | --- | --- |
| Admin | Chat + all CRM data, org-wide | Configure feature flag, view conversation audit, manage agent |
| Sales Rep | Chat + own org CRM data | Converse, upload files, query CRM, trigger WorkCharts |
| Viewer | Chat + read-only CRM data | Ask questions, view results. No mutations via conversation. |

Relationship to Other PRDs

PRDRelationshipData Flow
Agent PlatformPlatform (depends)Identity, memory, comms, dispatch — infrastructure
Identity & AccessPlatform (depends)Auth, roles, permissions — required for any UI
Sales CRM & RFPPeer (layer above)CRM data model + form features. This = access method.
Sales Dev AgentPeer (consumer)SDR agent logic would use this conversation surface

Context

Questions

What happens when the agent knows your pipeline better than you do — and acts on it?

  • If S1 FORBIDDEN fires (hallucinated RFP questions), does the user lose trust in the agent permanently or just for that session?
  • Which Story Contract row will be hardest to make GREEN — and does that story block the others?
  • When the modality router misclassifies (S5), is the cost a wasted WorkChart run or a wrong proposal sent to a client?