Multimodal Agent Interface Spec

How do we make conversation the default and forms the escape hatch?

Intent Contract

The agent autonomy boundary. Not parsed by engineering — a governance instrument for humans and agents.

| Dimension | Statement |
| --- | --- |
| Objective | Conversational AI as primary drmg-sales interface — users talk to the agent (voice, text, document drop) instead of navigating forms. Every WorkChart accessible through conversation. |
| Outcomes | 1. Agent processes an RFP from PDF drop to proposal draft in <5 minutes. 2. Pipeline summary available via "What's my pipeline?" in <3 seconds. 3. 50%+ of completed tasks originate from conversation within 60 days. |
| Health Metrics | Existing form-based workflows remain functional. CRM data integrity (no orphaned records from conversational mutations). Page load <2s on all routes. |
| Constraints | Hard: Multi-tenant isolation (conversation context never leaks across orgs). Hard: Agent composes, human sends (no autonomous external communication). Steering: Text-first MVP, voice in T1. |
| Autonomy | Allowed: UI layout, streaming strategy, model selection per modality. Escalate: Schema changes, new entity types, external API integrations. Never: Delete user data, send external communications, modify billing. |
| Stop Rules | Complete when: all Build Contract rows at Live + 1 real RFP processed end-to-end via conversation. Halt when: <10% conversational task rate after 30 days, or agent hallucinates CRM data in 3+ incidents. |
| Counter-metrics | Form-based task completion rate must not drop below current baseline. CRM query accuracy must stay >95%. WorkChart execution time must not increase >20% vs direct invocation. |
| Blast Radius | All drmg-sales users. All WorkChart orchestrators. Agent registry. Session storage. Streaming infrastructure. |
| Rollback | Feature flag per org. Disable conversational interface, revert to form-only navigation. Session data retained but inaccessible until re-enabled. |

Story Contract

Stories are test contracts. Each row = 1+ test file. Tests must be RED before implementation starts. Tests going GREEN = value delivered.

| # | WHEN (Trigger + Precondition) | THEN (Exact Assertion) | ARTIFACT (Test File) | Test Type | FORBIDDEN (Must not happen) | OUTCOME (Value Proven) |
| --- | --- | --- | --- | --- | --- | --- |
| S1 | User drops PDF into chat AND venture exists in org with 0 RFP questions | rfp_questions table has >=5 rows WHERE venture_id = target AND source = 'pdf_extraction' within 60s | BLOCKER: apps/drmg-sales-e2e/src/e2e/chat-rfp.spec.ts | e2e | Agent fabricates questions not present in the PDF. rfp_questions.source = 'hallucinated' appears in DB. | RFP extraction in <60s vs 30min manual copy-paste. Zero missed appendix questions. |
| S1b | S1 complete AND answer library has >=3 approved answers | rfp_answers table has >=1 row WHERE source = 'library_match' AND confidence > 0.8 | BLOCKER: apps/drmg-sales-e2e/src/e2e/chat-rfp.spec.ts | e2e | Agent auto-fills with answers from a different org's library. org_id mismatch between answer and venture. | Auto-fill compounds — second RFP is faster than first. |
| S2 | User sends "What's my pipeline?" AND org has >=3 deals across 2+ stages | Response contains JSON with deals[] where each has name, value, stage AND response_time_ms < 3000 | BLOCKER: apps/drmg-sales-e2e/src/e2e/chat-crm-query.spec.ts | e2e | Response includes deals WHERE org_id != user's org. Deal values differ from agent_profile_deals.value source of truth. | Pipeline summary in <3s vs 4-module navigation (~5min). |
| S3 | User sends "Draft outreach for Acme Corp" AND contact exists with company_name = 'Acme Corp' | Agent streams draft with contact.name and company.industry present in output AND draft is NOT sent externally | BLOCKER: apps/drmg-sales-e2e/src/e2e/chat-outreach.spec.ts | e2e | Agent sends email/message to external recipient without human clicking "Send". Any outbound_messages row created without status = 'draft'. | Personalised outreach drafted in <30s vs 10min manual research + compose. |
| S4 | User sends follow-up "Make it more formal" AND session has >=2 prior messages including S3 output | Agent returns modified draft referencing prior context. session.messages.length >= 4. No re-invocation of outreach WorkChart. | BLOCKER: libs/agency/src/__tests__/session-memory.spec.ts | integration | Agent responds with "I don't have context" or re-asks for the company name. Session context array is empty on follow-up. | Multi-turn context preserved across 10+ messages. Zero repeated questions. |
| S5 | User uploads image (PNG/JPG) AND image contains ambiguous content (not clearly an RFP) | Agent responds with classification question before routing. modality_router.classification = 'ambiguous'. No WorkChart invoked. | BLOCKER: libs/agency/src/__tests__/modality-router.spec.ts | integration | Agent routes to sales-rfp-workflow and produces irrelevant proposal from a whiteboard photo. WorkChart invoked without user confirmation. | Ambiguous inputs classified in <2s. Zero misrouted WorkCharts on ambiguous input. |
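The S1 FORBIDDEN clause requires that every extracted question be traceable to the source document. A minimal sketch of one way a test could assert that, assuming hypothetical helper names (`findUngroundedQuestions`, `normalise`) rather than the real test utilities:

```typescript
// Hedged sketch: flag extracted RFP questions that cannot be found in the
// source document text (S1 FORBIDDEN: no fabricated questions).
// Whitespace/case normalisation keeps the substring check robust.

function normalise(s: string): string {
  return s.toLowerCase().replace(/\s+/g, " ").trim();
}

/** Returns the questions that CANNOT be located in the source text. */
function findUngroundedQuestions(questions: string[], sourceText: string): string[] {
  const haystack = normalise(sourceText);
  return questions.filter((q) => !haystack.includes(normalise(q)));
}

const pdfText =
  "Q1: What is your uptime SLA?\nQ2: Describe your data retention policy.";
const extracted = [
  "What is your uptime SLA?",
  "Describe your data retention policy.",
  "What is your pricing model?", // not in the PDF — should be flagged
];

console.log(findUngroundedQuestions(extracted, pdfText));
// → ["What is your pricing model?"]
```

A substring check is deliberately strict; a real implementation might allow fuzzy matching for OCR noise, but the failure mode it guards against is the same.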

Build Contract

The deliverable. Engineering builds from this. Commissioning reads this. Every row has an acceptance test. FAVV v2.1 format.

Job 1: Talk to the Agent

| # | FeatureID | Function | Artifact | Success Test | Safety Test | Regression Test | Value | State |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | AI-021 | Send text message, receive streaming response | Chat UI with useChat hook + streaming endpoint | Message sends, response streams token-by-token, complete response in <5s | Response never includes data from another org (S2 FORBIDDEN) | Existing API routes respond in <500ms | Converse with the agent like a colleague | Gap |
| 2 | AI-020 | Upload file (PDF, DOCX, image) via chat | File upload component + multimodal input handler | File uploads, progress shown, content extracted and summarised in <10s | File content never persisted in plain text outside org tenant (S1 FORBIDDEN) | File upload on venture pages still works | Drop a document and get answers | Gap |
| 3 | AI-020 | Voice input via browser microphone | STT integration (Whisper/Deepgram) + mic button | Tap mic, speak, transcription appears as message, agent responds | Audio never stored beyond transcription session | — (new capability) | Talk instead of type | Gap |
| 4 | AI-021 | Multi-turn session memory within conversation | Session context store (per-org, per-user) | Follow-up references prior turns correctly across 10+ messages (S4) | Session data never leaks between users or orgs (S4 FORBIDDEN) | — (new capability) | Agent remembers what you said five minutes ago | Gap |
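Build row #4 calls for a per-org, per-user session context store. A minimal in-memory sketch of the isolation contract, with illustrative names (`SessionStore`, `Turn`); the production store would live in the Agent Platform memory layer, backed by a database rather than a local Map:

```typescript
// Hedged sketch of the session memory contract: turns are keyed by
// (orgId, userId), so one tenant's context can never surface in another's
// conversation (S4 FORBIDDEN).

type Turn = { role: "user" | "agent"; content: string };

class SessionStore {
  private sessions = new Map<string, Turn[]>();

  // Key composition is the isolation mechanism: reads and writes under
  // different org/user pairs touch different entries by construction.
  private key(orgId: string, userId: string): string {
    return `${orgId}::${userId}`;
  }

  append(orgId: string, userId: string, turn: Turn): void {
    const k = this.key(orgId, userId);
    const turns = this.sessions.get(k) ?? [];
    turns.push(turn);
    this.sessions.set(k, turns);
  }

  history(orgId: string, userId: string): Turn[] {
    return this.sessions.get(this.key(orgId, userId)) ?? [];
  }
}

const store = new SessionStore();
store.append("org-a", "u1", { role: "user", content: "Draft outreach for Acme Corp" });
store.append("org-a", "u1", { role: "agent", content: "Here is a draft." });
store.append("org-a", "u1", { role: "user", content: "Make it more formal" });

console.log(store.history("org-a", "u1").length); // → 3
console.log(store.history("org-b", "u1").length); // → 0 (no cross-org leakage)
```

The S4 follow-up ("Make it more formal") works because the agent replays `history()` into the model context instead of re-invoking the WorkChart.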

Job 2: Agent Routes to the Right WorkChart

| # | FeatureID | Function | Artifact | Success Test | Safety Test | Regression Test | Value | State |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 5 | AI-022 | Detect input modality (text, file, voice) | Modality classifier module | Correctly classifies 95%+ of inputs across text, PDF, image, audio | Never routes ambiguous input without asking (S5 FORBIDDEN) | — (new capability) | Agent knows what you gave it | Gap |
| 6 | AI-022 | Route classified input to correct WorkChart | Skill router extension + WorkChart adapter | "Process this RFP" routes to sales-rfp-workflow. "Draft outreach" routes to outreach. | Never executes a WorkChart the user didn't intend (S5 FORBIDDEN) | Existing skill router matches still resolve correctly | Say what you need, agent picks the right tool | Gap |
| 7 | AI-020 | Normalise multimodal input to WorkChart shape | Input normaliser (PDF-to-text, image-to-text, STT) | PDF extracted to structured text matching WorkChart input schema | Normalisation never truncates content silently (S1 FORBIDDEN) | — (new capability) | Any input format works with any WorkChart | Gap |
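Build row #5's classifier can be sketched as a pure function over MIME type plus an image label. The names (`classifyModality`, `visionLabel`) are illustrative assumptions; the real module would call a vision model for images rather than reading a pre-computed label:

```typescript
// Hedged sketch of the modality classifier. The key behaviour is the S5
// guard: an image that is not clearly a document comes back 'ambiguous',
// and an ambiguous result must never trigger a WorkChart without asking.

type Modality = "text" | "pdf" | "image" | "audio" | "ambiguous";

interface ChatInput {
  mimeType?: string;    // absent for plain text messages
  visionLabel?: string; // hypothetical vision-model label for images
}

function classifyModality(input: ChatInput): Modality {
  if (!input.mimeType) return "text";
  if (input.mimeType === "application/pdf") return "pdf";
  if (input.mimeType.startsWith("audio/")) return "audio";
  if (input.mimeType.startsWith("image/")) {
    // Only route images whose content is clearly identifiable; otherwise
    // the agent asks a classification question first (S5).
    return input.visionLabel === "document_scan" ? "image" : "ambiguous";
  }
  return "ambiguous";
}

console.log(classifyModality({})); // → "text"
console.log(classifyModality({ mimeType: "application/pdf" })); // → "pdf"
console.log(classifyModality({ mimeType: "image/png", visionLabel: "whiteboard" })); // → "ambiguous"
```

Defaulting unknown inputs to `ambiguous` (rather than guessing) is what keeps the "zero misrouted WorkCharts" target reachable.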

Job 3: See Progress and Results

| # | FeatureID | Function | Artifact | Success Test | Safety Test | Regression Test | Value | State |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 8 | AI-008 | Stream WorkChart execution progress | Streaming UI for long-running orchestrations | User sees step-by-step progress during RFP processing (not just a spinner) | Progress updates never expose internal system state | — (new capability) | Know what the agent is doing right now | Gap |
| 9 | AI-021 | Query CRM data through conversation | Natural language to CRM query adapter | "Show deals closing this month" returns correct filtered deal list (S2) | Query never returns data outside user's org (S2 FORBIDDEN) | CRM list pages still load in <1s | Ask a question, get the answer | Gap |
| 10 | AI-021 | Display structured results inline in chat | Rich message components (tables, cards, charts) | Pipeline summary renders as table with amounts and stages, not plain text | Rendered data matches source of truth exactly (S2 FORBIDDEN) | — (new capability) | Results look like results, not walls of text | Gap |
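Build row #9's safety test hinges on where the org filter is applied. A sketch of the intended shape, with illustrative types and an in-memory deal list standing in for the database: the adapter composes org scoping on top of whatever predicate the natural-language step produced, so a model-generated query can never widen the result set (S2 FORBIDDEN):

```typescript
// Hedged sketch: org_id scoping is enforced by the adapter, never delegated
// to the model-generated filter.

interface Deal { orgId: string; name: string; value: number; stage: string }

const allDeals: Deal[] = [
  { orgId: "org-a", name: "Acme Corp", value: 50_000, stage: "proposal" },
  { orgId: "org-a", name: "Globex",    value: 20_000, stage: "discovery" },
  { orgId: "org-b", name: "Initech",   value: 90_000, stage: "closing" },
];

// `filter` is whatever predicate the NL-to-query step produced; the org
// scope is AND-ed in unconditionally, so it cannot be bypassed.
function queryDeals(orgId: string, filter: (d: Deal) => boolean): Deal[] {
  return allDeals.filter((d) => d.orgId === orgId && filter(d));
}

// "What's my pipeline?" for a user in org-a:
const pipeline = queryDeals("org-a", () => true);
console.log(pipeline.map((d) => d.name)); // → ["Acme Corp", "Globex"]
```

In production the same composition would happen at the SQL layer (a mandatory `WHERE org_id = $1` the adapter appends), but the invariant is identical.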

Cross-Cutting

| # | FeatureID | Function | Artifact | Success Test | Safety Test | Regression Test | Value | State |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 11 | WORK-001 | Feature flag per org for conversational mode | Feature flag in org settings + middleware | Toggle on: chat widget appears. Toggle off: forms-only experience. | Flag change never drops in-flight conversations | All form-based flows work identically when flag off | Ship incrementally, roll back safely | Gap |
| 12 | AI-021 | Conversation history and audit trail | Conversation log store (per-org, searchable) | User can review past conversations. Admin can audit agent actions. | Conversation logs never accessible cross-org (S2 FORBIDDEN) | — (new capability) | Know what the agent did and why | Gap |
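Build row #11's flag gate can be sketched in a few lines. Names (`orgFeatureFlags`, `isConversationalModeOn`) are illustrative; in the app this would read the org_feature_flags table from middleware rather than an in-memory map:

```typescript
// Hedged sketch of the per-org feature flag gate. Unknown orgs default to
// OFF, matching the incremental-rollout constraint: some orgs get
// conversation first, all orgs keep forms.

const orgFeatureFlags = new Map<string, Set<string>>([
  ["org-a", new Set(["conversational_mode"])],
  ["org-b", new Set<string>()],
]);

function isConversationalModeOn(orgId: string): boolean {
  return orgFeatureFlags.get(orgId)?.has("conversational_mode") ?? false;
}

// Toggle on: chat widget renders. Toggle off: forms-only experience.
console.log(isConversationalModeOn("org-a")); // → true
console.log(isConversationalModeOn("org-b")); // → false
```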

Screen Contracts

Screen: Chat Interface (/chat)

Serves: S1-S5, Build Contract #1-4, #8-10

Flow and States

| Dimension | Spec |
| --- | --- |
| Route | /chat |
| Entry from | Sidebar "Chat" link, or floating widget button on any page |
| Success | Message appears in thread, agent response streams below |
| Error | Error toast "Agent unavailable — try again" + message stays in input |
| Auth denied | Redirect to /sign-in |
| Loading | Skeleton message bubbles (3 lines) while conversation history loads |
| Empty | "Start a conversation. Drop a file, ask a question, or say what you need." + 3 suggested prompts |
| Disabled | Send button disabled during agent response streaming. Mic button disabled during transcription. |

Elements

| Element | Selector | States |
| --- | --- | --- |
| Page heading | role="heading", name="Chat" | — |
| Message input | label="Message" | empty, filled, disabled |
| Send button | role="button", name="Send" | enabled, disabled, loading |
| Mic button | role="button", name="Voice input" | idle, recording, transcribing |
| File upload | role="button", name="Upload file" | idle, uploading, processing |
| Message thread | testid="message-thread" | empty, loading, populated |
| Agent response | testid="agent-response-{id}" | streaming, complete, error |
| Progress display | role="status" | hidden, streaming-steps, complete |
| Feedback | role="status" | success toast, error toast |

How does the user move from pain to effortless performance?

The Screen Contract above defines WHAT /chat contains. This section defines WHERE it lives in the app, HOW users discover it, and WHY the information architecture changes.

Pain-to-Perform Journey

Every navigation decision maps to a stage in the user's journey from friction to flow. The UI team builds the navigation that carries the user through each stage — not just the destination screen.

| Stage | User State | Navigation Must Do | Current State | Target State |
| --- | --- | --- | --- | --- |
| Pain | "I navigate 4 modules to assemble one answer" | Show the cost of the current path | 6 sidebar sections, no unifying entry | Same — this is what the user sees before chat |
| Awareness | "There's a faster way" | Surface the conversational entry without disrupting existing flow | No chat entry point exists | Chat icon in sidebar + floating widget pulse on first login |
| First Value | "That was faster than clicking through forms" | Deliver one complete answer in <5 seconds from first message | N/A | Suggested prompts on empty state: "What's my pipeline?", "Draft outreach for...", "Upload an RFP" |
| Habit | "I go to chat first now" | Make chat the natural starting point | Dashboard shows 6 section cards | Chat becomes first sidebar item. Dashboard shows recent conversations above section cards |
| Mastery | "I do everything from chat" | Support power-user shortcuts without removing form access | Forms are the only path | Chat responses include deep-links to form views. Form pages include "Ask about this" chat triggers |
| Effortless | "The agent anticipates what I need" | Proactive suggestions based on context | N/A (parked) | Agent suggests next action on login. Context-aware prompts per section |
Navigation by Flag State

| State | Sidebar | Dashboard | Section Pages | Widget |
| --- | --- | --- | --- | --- |
| Flag OFF | Proposals, Pipeline, Insights, Plans, Agents, Settings | 6 section cards + 4 quick actions | No chat triggers | Hidden |
| Flag ON (T0) | Chat (first item), Proposals, Pipeline, Insights, Plans, Agents, Settings | Recent conversations card above section cards. "Ask anything" prompt | No cross-links yet | Floating chat button (bottom-right), collapsed by default |
| Flag ON (T2+) | Same as T0 | Same as T0 | Each section gets "Ask about this" contextual chat trigger | Same, opens with section-aware suggested prompts |

Cross-Cutting Navigation

Chat is not a module — it is a layer across all modules. The navigation must reflect this.

| From | To | Trigger | What Happens |
| --- | --- | --- | --- |
| Any page | Chat (full) | Click sidebar "Chat" or floating widget | Navigate to /chat. If widget was open with context, carry conversation forward |
| Any page | Chat (widget) | Click floating button | Widget expands in-place. User stays on current page. Agent has page context |
| Chat | Pipeline | Agent shows deal data, user clicks "View in Pipeline" | Navigate to /pipeline/deals/{id}. Chat widget stays accessible |
| Chat | Proposals | Agent shows RFP draft, user clicks "Open in Proposals" | Navigate to /proposals/ventures/{id}. Draft pre-populated |
| Pipeline | Chat | "Ask about this deal" button on deal card | Widget opens with deal context pre-loaded: "Tell me about deal.name" |
| Proposals | Chat | "Process with AI" button on RFP page | Widget opens with venture context: "I have questions about venture.name" |
[Chat]          ← NEW: first item, conversation icon
─────────────── ← visual separator
Proposals       ← existing
Pipeline        ← existing
Insights        ← existing
Plans           ← existing
Agents          ← existing
───────────────
Settings        ← existing, bottom-anchored

Chat goes first because the PRD's own principle states: "The agent IS the product. Forms are the escape hatch." If chat is buried below existing sections, the navigation contradicts the intent.

Wiring Coordinates

| Element | File | Action |
| --- | --- | --- |
| Sidebar nav items | BLOCKER: locate sidebar component in app/(app)/ layout | Add "Chat" as first NavItem with conversation icon |
| Floating widget | BLOCKER: libs/ui/src/components/chat/ChatWidget.tsx | New component. Renders on all (app) pages when feature flag ON |
| Dashboard card | BLOCKER: app/(app)/dashboard/page.tsx | Add "Recent Conversations" card above section grid |
| Section chat triggers | BLOCKER: per-section page files | Add "Ask about this" button to Pipeline deal cards and Proposals venture pages |
| Feature flag middleware | BLOCKER: depends on org_feature_flags table (PR #407 schema) | Conditionally render chat sidebar item + widget |

Principles

What truths constrain the design?

The Job

| Element | Detail |
| --- | --- |
| Situation | A sales rep's day: check pipeline, respond to RFPs, research prospects, update deals, draft outreach. Each task means navigating to a different module, filling forms. |
| Intention | The rep opens drmg-sales and talks to the agent. Every workflow accessible through conversation. The agent serves the user, not the database. |
| Obstacle | No conversational interface exists. WorkCharts accept string-only inputs. No multimodal input normalisation. No session memory. No streaming UI for WorkCharts. |
| Hardest Thing | Making the agent feel like a colleague who knows your pipeline, your style, your history — not a chatbot that asks you to rephrase. |

The hidden objection: "I'll spend more time explaining what I want to the AI than it would take to just click through the forms." The answer: context + initiative. The agent remembers your pipeline, suggests the next action, and only asks when genuinely ambiguous.

Why Now

  • 8 WorkCharts x 0 conversational entry = 0 agent value to users
  • 1x1 modality coverage (text-to-text) out of 49 possible (7x7)
  • Vercel AI SDK (@ai-sdk/react, useChat) in stack but unused in UI
  • GPT-4o voice, Claude voice, Gemini Live = multimodal input feasible now
  • HubSpot AI, Salesforce Einstein, Clay = competitors shipping conversational CRM
  • Without this, drmg-sales is a form-based SaaS in an agent-first world

Design Constraints

| Constraint | Rationale |
| --- | --- |
| Conversation-first, forms-second | The agent IS the product. Forms are the escape hatch for power users who want direct manipulation. |
| Text MVP, voice in T1 | Text is the simplest input to normalise and test. Voice adds STT complexity. Ship text, prove value, add voice. |
| Agent composes, human sends | No autonomous external communication. Agent drafts outreach, user reviews and sends. |
| Multi-tenant isolation | Conversation context, session memory, and CRM queries scoped to org. Never cross boundaries. |
| Feature-flagged per org | Incremental rollout. Some orgs get conversation first. All orgs keep forms. |

Performance

How do we know it's working?

Priority Score

PRIORITY = Pain x Demand x Edge x Trend x Conversion

| Dimension | Score (1-5) | Evidence |
| --- | --- | --- |
| Pain | 5 | Every workflow requires navigation + form-fill. 8 WorkCharts with no user-facing conversational entry. |
| Demand | 4 | HubSpot AI, Salesforce Einstein, Clay shipping conversational CRM. Not validated with our users. |
| Edge | 3 | Agency lib WorkCharts + modalities knowledge + skill routing. No proprietary data or network effect. |
| Trend | 5 | Omnimodal models make multimodal input table stakes. Every SaaS tool gets a conversational layer. |
| Conversion | 2 | AI feature pricing not validated. Bundled or premium tier? No pilot customer. |
| Composite | 600 | 5 x 4 x 3 x 5 x 2 |

North Star: Conversational task completion > 50% (tasks completed via conversation / total tasks).

Quality Targets

| Metric | Target | Now |
| --- | --- | --- |
| Conversational task rate | >50% | 0% |
| Agent response time (text) | <3s | N/A |
| Modality classification accuracy | >95% | N/A |
| Session context retention | 10+ turns | N/A |
| RFP processing time (PDF drop) | <5 minutes | N/A |

Failure Budget

| Failure Type | Budget | Response |
| --- | --- | --- |
| Cross-org data leak | 0% | Immediate halt — trust destroyed |
| Agent hallucination (CRM) | <2% | Flag, log, investigate — wrong data costs deals |
| Misrouted WorkChart | <5% | Improve classifier — user wastes time, not trust |
| Dropped session context | <5% | Fix session store — annoying but recoverable |

Kill signal: Chat widget ships but <10% of tasks go through conversation after 30 days. Users bypass agent and navigate directly to forms.

Platform

What do we control?

Current State

| Component | Built | Wired | Working | Notes |
| --- | --- | --- | --- | --- |
| WorkChart orchestrators (8) | Yes | Yes | Yes | No conversational entry point |
| Skill router | Yes | Yes | Yes | Text-only, no modality detection |
| Vercel AI SDK (useChat) | Yes | No | No | In package.json, unused in UI |
| Agent registry | Yes | Partial | Partial | Agent Platform L2 |
| Session/memory store | No | No | No | Agent Platform dependency |
| STT integration | No | No | No | Whisper/Deepgram not yet integrated |
| Multimodal input normaliser | No | No | No | PDF/image/voice-to-text pipeline missing |
| Streaming UI | No | No | No | No progress display for WorkChart execution |
| Chat UI component | No | No | No | No conversational interface exists |

Build Ratio

~40% composition (Vercel AI SDK, WorkCharts, skill router, agent registry), ~60% new code (chat UI, modality router, input normaliser, session memory, streaming progress).

Protocols

How do we coordinate?

Build Order

| Sprint | Features | What | Effort | Acceptance |
| --- | --- | --- | --- | --- |
| T0 | #1, #4, #11 | Text chat UI + session memory + feature flag | 5 days | User sends text, agent responds with streaming. Context across turns. |
| T1 | #5, #6, #7 | Modality router + WorkChart adapter + normaliser | 5 days | "Process this RFP" routes to sales-rfp-workflow correctly. |
| T2 | #2, #8, #10 | File upload + streaming progress + rich results | 5 days | PDF drop extracts content, streams progress, renders structured output. |
| T3 | #9, #12 | CRM query adapter + conversation history | 4 days | "What's my pipeline?" returns correct deal summary. |
| T4 | #3 | Voice input (STT) | 3 days | Tap mic, speak, transcription becomes message. |
| Park | — | TTS responses, proactive agent, advanced analytics | — | Only after T0-T3 prove conversational value. |

Total: ~22 days T0-T4. Kill date: 2026-05-01.

Commissioning

| # | Feature group | Install | Test | Operational | Optimize |
| --- | --- | --- | --- | --- | --- |
| 1, 4 | Text chat + session memory | | | | |
| 5-7 | Modality router + normaliser | | | | |
| 2, 8 | File upload + streaming progress | | | | |
| 9-10 | CRM query + rich results | | | | |
| 3 | Voice input | | | | |
| 11-12 | Feature flag + conversation history | | | | |

Agent-Facing Spec

Commands:

pnpm nx serve stackmates        # Dev server
pnpm nx test stackmates         # Unit tests
pnpm nx e2e stackmates-e2e      # E2E tests

Boundaries: Always: UI layout, streaming strategy, model selection. Ask first: schema changes, new entity types, external API keys. Never: delete user data, send external communications, modify billing.

Players

Who creates harmony?

Demand-Side Jobs

Job 1: Get Answers Without Navigation

Situation: Sales rep needs pipeline status before a meeting in 10 minutes. Currently: open app, navigate to pipeline page, scan kanban, filter by date, mentally summarise.

| Element | Detail |
| --- | --- |
| Struggling moment | Navigating through 4 modules to assemble context that should be one question away |
| Current workaround | Open 3-4 tabs, manually cross-reference deals with contacts with ventures |
| What progress looks like | Ask "What needs attention today?" and get a prioritised action list in 3 seconds |
| Hidden objection | "I'll spend more time explaining to the AI than clicking through forms" |
| Switch trigger | Missed a follow-up because the deal was buried in a pipeline view nobody checks daily |

Features that serve this job: #1, #4, #9, #10

Job 2: Process Documents Without Copy-Paste

Situation: RFP PDF arrives. Currently: download, open in viewer, copy-paste questions into textarea one by one, wait for AI to generate answers, manually review.

| Element | Detail |
| --- | --- |
| Struggling moment | Copy-pasting from PDF to web form, losing formatting, missing questions buried in appendices |
| Current workaround | Manual extraction into spreadsheet, then paste into venture Q&A form |
| What progress looks like | Drop PDF in chat, agent extracts all questions, streams draft answers, asks for review |
| Hidden objection | "AI will miss context in the document and generate wrong answers I'll have to fix anyway" |
| Switch trigger | Lost a bid because a question in a 60-page appendix was missed during manual extraction |

Features that serve this job: #2, #5, #6, #7, #8

Job 3: Draft Outreach Without Context-Switching

Situation: Three prospects need personalised outreach. Currently: open Sales Dev, navigate to prospect, review profile, click generate, wait, review, repeat x3.

| Element | Detail |
| --- | --- |
| Struggling moment | Context-switching between prospect research and draft generation across modules |
| Current workaround | Write outreach in email client, occasionally reference CRM data in another tab |
| What progress looks like | "Draft outreach for these 3 prospects" generates personalised drafts in one conversation |
| Hidden objection | "AI outreach is generic. I'll have to rewrite it anyway." |
| Switch trigger | Competitor's AI assistant drafts outreach that references specific company news automatically |

Features that serve this job: #1, #4, #6, #9

ICP: SMB Sales Team Without IT

| Attribute | Specification |
| --- | --- |
| Role | Sales rep, BD manager, or sales director at SMB without dedicated IT resource |
| Context | 5-50 employees, using drmg-sales for CRM + RFP, comfortable with chat interfaces |
| Geography | New Zealand initially, English-speaking markets |
| Budget | Already paying for drmg-sales. AI features as bundled or $10-20/seat premium tier. |

Psycho-logic: "We don't need AI" means "we tried a chatbot that couldn't find our data." The stated objection is AI skepticism. The real objection is broken promises from previous AI tools. The unlock: the agent knows your pipeline because it shares the database. First interaction must demonstrate real CRM awareness, not generic responses.

Role Definitions

| Role | Access | Permissions |
| --- | --- | --- |
| Admin | Chat + all CRM data, org-wide | Configure feature flag, view conversation audit, manage agent |
| Sales Rep | Chat + own org CRM data | Converse, upload files, query CRM, trigger WorkCharts |
| Viewer | Chat + read-only CRM data | Ask questions, view results. No mutations via conversation. |

Relationship to Other PRDs

PRDRelationshipData Flow
Agent PlatformPlatform (depends)Identity, memory, comms, dispatch — infrastructure
Identity & AccessPlatform (depends)Auth, roles, permissions — required for any UI
Sales CRM & RFPPeer (layer above)CRM data model + form features. This = access method.
Sales Dev AgentPeer (consumer)SDR agent logic would use this conversation surface

Context

Questions

What happens when the agent knows your pipeline better than you do — and acts on it?

  • If S1 FORBIDDEN fires (hallucinated RFP questions), does the user lose trust in the agent permanently or just for that session?
  • Which Story Contract row will be hardest to make GREEN — and does that story block the others?
  • When the modality router misclassifies (S5), is the cost a wasted WorkChart run or a wrong proposal sent to a client?