CLI Improvement Spec — Plan, Comms, ETL

Audience: Engineering (stackmates).
Goal: Improve how the Plan, Comms, and Agent ETL CLIs are written so they meet the Agent CLI standard and are safe and discoverable for both human and agent operators.

Standard: Agent CLI Tools — checklist, design review template, and 10-dimension scorecard. Production bar: 16/20, with no zeros on structured I/O, input hardening, safety rails, or contract stability.

Scope

| CLI | Entrypoint | Transport | Primary job |
| --- | --- | --- | --- |
| Plan | npx tsx tools/scripts/orch-meta/planning/plan-cli.ts | Postgres | Projects, tasks, health (doctor, active --team=X, status --plan-id=X) |
| Comms | npx tsx tools/scripts/comms/agent-comms.ts | Convex | Real-time chat (read --channel=…, post --channel=… --type=…) |
| ETL | npx tsx tools/scripts/etl/agent-etl-cli.ts | Postgres | Agent profiles, CRM seed, governance, AI capabilities (load --agent=X) |

Intent

  • Who we're serving: Primary operator is often an agent (orchestrators, session runners, dream-team workflows). Human use remains supported; agent use must not depend on scraping prose or guessing structure.
  • What must always be true: Deterministic I/O contract, bounded output, validated input, safe mutation path (dry-run or explicit confirmation for writes).
  • What we're not changing: Transport (Postgres/Convex), repo layout, or feature set. We are adding and hardening contracts, introspection, safety rails, and guidance so the CLIs are agent-grade.

Acceptance criteria (stories)

Stories are testable. Engineering may implement in any order that respects dependencies; completion is judged per CLI via the scorecard in the standard.

Contract and structured I/O

| ID | CLI | Intention | Trigger | Observable success | Failure = |
| --- | --- | --- | --- | --- | --- |
| C1 | All | Machine-readable output for every read/list command | Invoke with --json (or agreed flag) | stdout is valid JSON or NDJSON; exit 0 | No --json or output is prose-only |
| C2 | All | Stable, documented exit codes | Run success and failure paths | Exit codes documented in --help or schema; same code for same outcome | Undocumented or inconsistent codes |
| C3 | All | Human vs machine output isolated | Run with and without --json | Without --json: human-friendly; with --json: only machine payload | Mixed prose and JSON in same stream |
| C4 | Plan | Structured input for create/mutate | create / mutate with --params or stdin JSON | CLI accepts JSON params or stdin; rejects invalid shape with clear error | Only positional/loose flags |
| C5 | Comms | Structured read output | read --channel=… --json | Array of message objects with stable fields (e.g. id, channel, type, body, ts) | Unstable or undocumented shape |
| C6 | ETL | Structured load/output | load --agent=X --json | Output is JSON; list/describe commands support --json | No machine output option |
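
As a rough illustration of C1 to C3, the sketch below renders the same result either as a pure JSON payload or as human-friendly text, and attaches a named exit code. The names `ExitCode`, `Message`, and `render`, and the specific non-zero code values, are assumptions made for this example; none of them come from the existing CLIs.

```typescript
// Hypothetical sketch: ExitCode, Message, and render are illustrative names,
// and the non-zero code values are assumptions, not an agreed contract.

// Stable exit codes (C2): the same outcome always maps to the same code.
const ExitCode = {
  Ok: 0,
  Usage: 2, // bad flags or invalid input
  NotFound: 3,
  Internal: 70,
} as const;

// Message fields follow the example list in C5.
interface Message {
  id: string;
  channel: string;
  type: string;
  body: string;
  ts: string;
}

interface Rendered {
  stdout: string; // machine payload or human text, never both
  stderr: string; // diagnostics only, never payload
  exitCode: number;
}

// With json=true, stdout carries only the JSON payload (C3).
function render(messages: Message[], json: boolean): Rendered {
  if (json) {
    return { stdout: JSON.stringify(messages), stderr: "", exitCode: ExitCode.Ok };
  }
  const lines = messages.map((m) => `[${m.ts}] #${m.channel} (${m.type}) ${m.body}`);
  return { stdout: lines.join("\n"), stderr: "", exitCode: ExitCode.Ok };
}

const sample: Message[] = [
  { id: "m1", channel: "ops", type: "status", body: "deploy done", ts: "2024-01-01T00:00:00Z" },
];
const machine = render(sample, true);
const human = render(sample, false);
console.log(machine.stdout);
console.log(human.stdout);
```

Returning a `Rendered` value instead of printing directly keeps the function easy to test; a thin entrypoint would write `stdout` and `stderr` to the real streams and exit with `exitCode`.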

Runtime introspection

| ID | CLI | Intention | Trigger | Observable success | Failure = |
| --- | --- | --- | --- | --- | --- |
| I1 | All | Agent can discover commands and params at runtime | Run help --json or schema or describe | JSON listing commands, args, and (where applicable) request/response shape | No machine-readable help/schema |
| I2 | All | Required and optional fields discoverable | Introspection output | Required vs optional and types are explicit | Agent must read external docs to know shape |
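
One possible shape for the introspection output in I1 and I2 is a small command registry that is serialized on `help --json`. The `CommandSpec` and `ParamSpec` types, the `schemaVersion` field, and the `mutates` flag are hypothetical; the `read` and `post` parameters mirror the flags mentioned elsewhere in this spec.

```typescript
// Hypothetical schema shape for `help --json`; type and field names are assumptions.
type ParamType = "string" | "number" | "boolean";

interface ParamSpec {
  name: string;
  type: ParamType;
  required: boolean; // I2: required vs optional is explicit
  description: string;
}

interface CommandSpec {
  name: string;
  mutates: boolean; // lets an agent see which commands write
  params: ParamSpec[];
}

const commands: CommandSpec[] = [
  {
    name: "read",
    mutates: false,
    params: [
      { name: "channel", type: "string", required: true, description: "Channel to read from" },
      { name: "limit", type: "number", required: false, description: "Maximum messages to return" },
    ],
  },
  {
    name: "post",
    mutates: true,
    params: [
      { name: "channel", type: "string", required: true, description: "Target channel" },
      { name: "type", type: "string", required: true, description: "Message type" },
    ],
  },
];

// Serialize the registry so an agent can discover commands at runtime (I1).
function helpJson(specs: CommandSpec[]): string {
  return JSON.stringify({ schemaVersion: 1, commands: specs }, null, 2);
}

// Convenience lookup: names of the required params for one command.
function requiredParams(specs: CommandSpec[], command: string): string[] {
  const spec = specs.find((c) => c.name === command);
  if (!spec) throw new Error(`unknown command: ${command}`);
  return spec.params.filter((p) => p.required).map((p) => p.name);
}

console.log(helpJson(commands));
```

If the same registry also drives argument parsing, the help output cannot drift from actual behavior.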

Context discipline

| ID | CLI | Intention | Trigger | Observable success | Failure = |
| --- | --- | --- | --- | --- | --- |
| X1 | Plan | Large lists paginated or limited | list-plans / status with many rows | Pagination or --limit; default page size bounded (e.g. 20) | Unbounded dump |
| X2 | Comms | Read supports time window and limit | read --since=24h and limit | --since and --limit (or equivalent) reduce payload | Always returns full history |
| X3 | ETL | List/load support field selection or minimal default | List agents or load one | --fields or minimal default object so agent can request only needed fields | Always full object graph |
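
The three context-discipline controls can be sketched as small pure helpers: a bounded `--limit`, a parsed `--since` window, and field selection. The default of 20 follows the example in X1; the ceiling of 200, the accepted `--since` units, and all function names are assumptions for illustration.

```typescript
// Hypothetical helpers; names, MAX_LIMIT, and the accepted units are assumptions.
const DEFAULT_LIMIT = 20; // matches the example default in X1
const MAX_LIMIT = 200; // assumed ceiling so output stays bounded

// Resolve --limit: default when absent, reject junk, clamp to the ceiling.
function resolveLimit(raw: string | undefined): number {
  if (raw === undefined) return DEFAULT_LIMIT;
  const n = Number(raw);
  if (!Number.isInteger(n) || n < 1) throw new Error(`invalid --limit: ${raw}`);
  return Math.min(n, MAX_LIMIT);
}

// Parse a window such as "30m", "24h", or "7d" into milliseconds.
function parseSince(raw: string): number {
  const match = /^(\d+)([mhd])$/.exec(raw);
  if (!match) throw new Error(`invalid --since: ${raw}`);
  const unitMs: Record<string, number> = { m: 60_000, h: 3_600_000, d: 86_400_000 };
  return Number(match[1]) * unitMs[match[2]];
}

// Keep only the requested fields; unknown field names are ignored.
function pickFields(row: Record<string, unknown>, fields: string[]): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const f of fields) {
    if (f in row) out[f] = row[f];
  }
  return out;
}

console.log(resolveLimit(undefined), parseSince("24h"), pickFields({ id: "a", body: "x" }, ["id"]));
```

Whether unknown field names should be ignored or rejected is a design choice; rejecting them would surface typos earlier, at the cost of stricter callers.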

Input hardening

| ID | CLI | Intention | Trigger | Observable success | Failure = |
| --- | --- | --- | --- | --- | --- |
| H1 | All | Untrusted input validated | Pass malformed IDs, path traversal, control chars | Rejected with explicit error; no crash or unsafe side effect | Silent accept or crash |
| H2 | All | Channel/plan-id/agent-id validated | Invalid strings or injection payloads | Clear validation error; no SQL/NoSQL or command injection | Injection possible |
| H3 | Plan | Phase/task IDs and payloads validated | Invalid or missing required fields (e.g. phaseSlug) | Zod or equivalent; 4xx-style error message | NOT NULL or type errors only at DB |
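
A minimal sketch of the H1/H2 checks, written without a schema library so it stays self-contained (the spec allows Zod or an equivalent). The allowed ID format is an assumption; the real channel, plan-id, and agent-id formats may differ. Validation of this kind complements parameterized queries and does not replace them.

```typescript
// Hypothetical validator; the ID format below is an assumption, not the real one.
type Validation = { ok: true; value: string } | { ok: false; error: string };

// Assumed format: starts with a lowercase letter or digit, then [a-z0-9_-], 1 to 64 chars.
const ID_PATTERN = /^[a-z0-9][a-z0-9_-]{0,63}$/;

function validateId(kind: string, raw: string): Validation {
  // Reject control characters first so they never reach logs or queries.
  if (/[\u0000-\u001f\u007f]/.test(raw)) {
    return { ok: false, error: `${kind}: control characters are not allowed` };
  }
  // Reject path traversal and path separators explicitly for a clearer error.
  if (raw.includes("..") || raw.includes("/") || raw.includes("\\")) {
    return { ok: false, error: `${kind}: path separators and ".." are not allowed` };
  }
  // Everything else must match the allow-list pattern.
  if (!ID_PATTERN.test(raw)) {
    return { ok: false, error: `${kind}: expected 1-64 chars of [a-z0-9_-]` };
  }
  return { ok: true, value: raw };
}

console.log(validateId("plan-id", "abc-123"));
console.log(validateId("plan-id", "../etc/passwd"));
console.log(validateId("channel", "x'; DROP TABLE plans;--"));
```

Returning a result object instead of throwing lets the entrypoint map every validation failure to one documented exit code.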

Safety rails

| ID | CLI | Intention | Trigger | Observable success | Failure = |
| --- | --- | --- | --- | --- | --- |
| S1 | Plan | Every mutating command supports dry-run | create, update, or other write + --dry-run | No DB/Convex write; output describes what would be done | No dry-run option |
| S2 | Comms | Post is explicit; no accidental broadcast | post without required channel/type | Rejected or prompted | Post to wrong channel with default |
| S3 | ETL | Load/write operations support dry-run or safe preview | Mutating load (if any) or seed | --dry-run or preview output before apply | Destructive without rehearsal |
| S4 | All | Destructive actions require explicit override or confirmation | Any delete or overwrite | Requires flag (e.g. --confirm) or interactive confirm | Silent overwrite/delete |
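
The dry-run and confirmation rules (S1, S4) could be enforced by one gate that every mutating command passes through. This is a sketch under assumptions: `Mutation`, `Flags`, and `runMutation` are hypothetical names, and the in-memory `writes` array stands in for the real Postgres or Convex write.

```typescript
// Hypothetical mutation gate; all names are illustrative.
interface Mutation {
  action: "create" | "update" | "delete";
  target: string;
  payload: Record<string, unknown>;
}

interface Flags {
  dryRun: boolean; // --dry-run
  confirm: boolean; // --confirm
}

interface Outcome {
  applied: boolean;
  reason: string;
  plan: Mutation; // always echoed so the caller sees what would be done
}

function runMutation(m: Mutation, flags: Flags, write: (m: Mutation) => void): Outcome {
  // Dry-run wins over everything else: describe, never write.
  if (flags.dryRun) {
    return { applied: false, reason: "dry-run: no write performed", plan: m };
  }
  // Destructive actions need an explicit override.
  if (m.action === "delete" && !flags.confirm) {
    return { applied: false, reason: "delete requires --confirm", plan: m };
  }
  write(m);
  return { applied: true, reason: "applied", plan: m };
}

// Demo with an in-memory writer standing in for the database.
const writes: Mutation[] = [];
const record = (m: Mutation): void => {
  writes.push(m);
};
const del: Mutation = { action: "delete", target: "task-42", payload: {} };
const rehearsal = runMutation(del, { dryRun: true, confirm: true }, record);
const blocked = runMutation(del, { dryRun: false, confirm: false }, record);
const applied = runMutation(del, { dryRun: false, confirm: true }, record);
console.log(rehearsal.reason, "|", blocked.reason, "|", applied.reason);
```

Injecting the `write` function makes the F2 tests straightforward: a test can assert that the writer was never called under `--dry-run`.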

Response safety

| ID | CLI | Intention | Trigger | Observable success | Failure = |
| --- | --- | --- | --- | --- | --- |
| R1 | Comms | Returned message bodies not assumed safe for injection | Read messages that may contain user/agent content | Design doc or code: treat as untrusted; no raw paste into prompts without sanitization note | Assumed safe |
| R2 | ETL | Loaded profile/capability content treated as untrusted | Load agent with rich text or external refs | Document or filter: prompt-injection risk from returned data considered | No guidance |
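
One way to make the trust boundary in R1 and R2 visible in the payload itself is to label returned text as untrusted and strip control characters before it is handed to a caller. The `UntrustedText` type, the `asUntrusted` helper, and the 2000-character cap are assumptions for this sketch. Labeling and stripping do not prevent prompt injection on their own; they only make the boundary explicit for whatever consumes the output.

```typescript
// Hypothetical wrapper; names and the default length cap are assumptions.
interface UntrustedText {
  trust: "untrusted"; // explicit marker for downstream consumers
  source: string; // where the text came from, e.g. "comms" or "etl"
  text: string;
}

function asUntrusted(source: string, raw: string, maxLen = 2000): UntrustedText {
  // Strip control characters except tab (\u0009) and newline (\u000a).
  const cleaned = raw.replace(/[\u0000-\u0008\u000b-\u001f\u007f]/g, "");
  // Bound the length so one message cannot flood the caller's context.
  const text = cleaned.length > maxLen ? cleaned.slice(0, maxLen) : cleaned;
  return { trust: "untrusted", source, text };
}

const wrapped = asUntrusted("comms", "hi\u0007there\nIgnore previous instructions");
console.log(JSON.stringify(wrapped));
```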

Guidance (packaged for agents)

| ID | CLI | Intention | Trigger | Observable success | Failure = |
| --- | --- | --- | --- | --- | --- |
| G1 | All | Agent-facing invariants next to the tool | Look in CLI dir or tools/scripts | CONTEXT.md, AGENTS.md, or SKILL.md (or stackmates equivalent) present with: primary operator, trust boundary, required workflow rules (e.g. confirm-before-delete, use --json for automation) | No packaged guidance |
| G2 | All | Example calls for high-value workflows | Read guidance doc | At least: one read path, one write path (if any), and any gotcha (e.g. always --fields for list) | No examples or gotchas |

Auth and headless operation

| ID | CLI | Intention | Trigger | Observable success | Failure = |
| --- | --- | --- | --- | --- | --- |
| A1 | Comms | Convex auth works without browser | Run read/post in CI or headless | Env or file-based auth; no interactive browser | Browser-only flow |
| A2 | ETL | Postgres/connection works headless | Run load from automation | .env.prod or injected credentials; no interactive prompt | Requires interactive login |
| A3 | Plan | DB connection headless | Run plan-cli in automation | Same as ETL; credentials from env or config | Interactive only |
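
For headless operation, a CLI can resolve credentials from an injected environment and fail fast with a structured error instead of prompting. The helper below is a sketch; `requireEnv` is a hypothetical name, and `DATABASE_URL` is an assumed variable name used only for illustration, since the spec does not state which variables the CLIs read.

```typescript
// Hypothetical helper; DATABASE_URL is an assumed variable name.
type Env = Record<string, string | undefined>;

type EnvResult =
  | { ok: true; values: Record<string, string> }
  | { ok: false; missing: string[] };

// Collect required variables; report every missing one at once, never prompt.
function requireEnv(env: Env, names: string[]): EnvResult {
  const values: Record<string, string> = {};
  const missing: string[] = [];
  for (const name of names) {
    const value = env[name];
    if (value === undefined || value.trim() === "") {
      missing.push(name);
    } else {
      values[name] = value;
    }
  }
  return missing.length > 0 ? { ok: false, missing } : { ok: true, values };
}

const present = requireEnv({ DATABASE_URL: "postgres://example" }, ["DATABASE_URL"]);
const absent = requireEnv({}, ["DATABASE_URL"]);
console.log(present.ok, absent.ok);
```

Passing the environment in as an argument, instead of reading a global, keeps the check testable in CI without touching real credentials.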

Failure design and observability

| ID | CLI | Intention | Trigger | Observable success | Failure = |
| --- | --- | --- | --- | --- | --- |
| F1 | All | Common bad inputs covered by tests | Test suite | Tests for: malformed IDs, missing required flags, invalid JSON | No negative-path tests |
| F2 | All | Destructive path and dry-run tested | Test suite | Test that dry-run does not write; test that real write is gated | No safety-path tests |
| F3 | All | Output contract regression tested | Test suite | Schema or snapshot tests for JSON output so changes are detected | No contract tests |
| F4 | All | Observability for unattended runs | Run failing command | stderr or logs include request ID or enough context to debug | Silent failure or opaque errors |
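
A contract regression test (F3) does not need to compare values; comparing the shape of the JSON output is often enough to catch renamed or retyped fields. The sketch below reduces a value to its shape and checks it against a stored snapshot. `shapeOf`, `matchesContract`, and the snapshot string are hypothetical, and the field list follows the example in C5.

```typescript
// Hypothetical shape-snapshot check; names and the snapshot are illustrative.

// Reduce a JSON value to its shape: sorted keys mapped to type names.
// Arrays are represented by the shape of their first element, so an
// empty array has shape [] and will not match a non-empty snapshot.
function shapeOf(value: unknown): unknown {
  if (Array.isArray(value)) return value.length > 0 ? [shapeOf(value[0])] : [];
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const out: Record<string, unknown> = {};
    for (const key of Object.keys(obj).sort()) {
      out[key] = shapeOf(obj[key]);
    }
    return out;
  }
  return value === null ? "null" : typeof value;
}

// Stored snapshot for `read --json`, using the example fields from C5.
const EXPECTED_SHAPE =
  '[{"body":"string","channel":"string","id":"string","ts":"string","type":"string"}]';

function matchesContract(output: unknown): boolean {
  return JSON.stringify(shapeOf(output)) === EXPECTED_SHAPE;
}

const current = [{ id: "m1", channel: "ops", type: "status", body: "hi", ts: "2024-01-01T00:00:00Z" }];
const renamed = [{ id: "m1", channel: "ops", type: "status", body: "hi", timestamp: "2024-01-01T00:00:00Z" }];
console.log(matchesContract(current), matchesContract(renamed));
```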

Non-negotiables before production

Each CLI must reach at least 16/20 on the scorecard with no zero on:

  • Structured I/O
  • Input hardening
  • Safety rails
  • Contract stability (documented, stable output shape)

In practice, structured I/O, input validation, dry-run for writes (where applicable), and a stable machine contract are all required before a CLI is marked production-ready for agents.
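
The production gate can be expressed as a small check. This sketch assumes each of the ten dimensions is scored 0 to 2, which is what a 16/20 bar over ten dimensions implies; the dimension keys other than the four non-negotiables are guesses, and the authoritative list lives in the standard.

```typescript
// Hypothetical gate; dimension keys and the 0-2 scale are assumptions.
const NON_NEGOTIABLE = ["structuredIO", "inputHardening", "safetyRails", "contractStability"];
const PRODUCTION_BAR = 16; // out of 20

interface GateResult {
  ready: boolean;
  total: number;
  zeros: string[]; // non-negotiable dimensions scored zero (or missing)
}

function productionReady(scores: Record<string, number>): GateResult {
  const total = Object.values(scores).reduce((sum, s) => sum + s, 0);
  const zeros = NON_NEGOTIABLE.filter((d) => (scores[d] ?? 0) === 0);
  return { ready: total >= PRODUCTION_BAR && zeros.length === 0, total, zeros };
}

// A CLI that clears the total but has a zero on safety rails still fails.
const example = productionReady({
  structuredIO: 2,
  inputHardening: 2,
  safetyRails: 0,
  contractStability: 2,
  introspection: 2,
  contextDiscipline: 2,
  responseSafety: 2,
  guidance: 2,
  auth: 2,
  failureDesign: 1,
});
console.log(example);
```

In this example the total is 17, above the bar, yet the result is not ready because `safetyRails` is zero.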

Implementation notes

  • Shared patterns: Consider a small shared layer (e.g. --json, exit code constants, help --json) so all three CLIs behave consistently.
  • Backward compatibility: Default behavior (no --json) should remain human-friendly; additive flags and new output modes are preferred over breaking changes.
  • Doc location: Guidance can live in stackmates (e.g. tools/scripts/orch-meta/planning/README.md, tools/scripts/comms/README.md, tools/scripts/etl/CLAUDE.md) and be linked from the dream repo where relevant.
  • MCP/surfaces: If a CLI is later exposed as an MCP tool or another surface, derive that surface from the same capability model (one source of truth).

Handoff

  • Dream team: Owns this spec and acceptance criteria; updates when the standard or priorities change.
  • Engineering: Owns implementation, task breakdown, and test implementation; may propose spec changes via PR or comms.
  • Done when: Each of Plan, Comms, ETL passes the design review with a recorded score and no zeros on the four non-negotiables.

Questions

  • Which of the three CLIs would compound value fastest if improved first: Plan (workflow), Comms (signalling), or ETL (agent data)?
  • What shared npm package or CLI harness could enforce --json, exit codes, and help schema across all three without rewriting each from scratch?
  • If an agent is the primary operator of plan-cli, what one invariant must it never violate that you would put in AGENTS.md?
  • Where does Convex auth for Comms already support headless use, and where does it still assume a browser?