CLI Improvement Spec — Plan, Comms, ETL

Audience: Engineering (stackmates).
Goal: Bring the Plan, Comms, and Agent ETL CLIs up to the Agent CLI standard so they are safe and discoverable for both human and agent operators.

Standard: Agent CLI Tools — checklist, design review template, and 10-dimension scorecard. Production bar: 16/20, with no zeros on structured I/O, input hardening, safety rails, or contract stability.

Scope

CLI	Entrypoint	Transport	Primary job
Plan	`npx tsx tools/scripts/orch-meta/planning/plan-cli.ts`	Postgres	Projects, tasks, health (doctor, active --team=X, status --plan-id=X)
Comms	`npx tsx tools/scripts/comms/agent-comms.ts`	Convex	Real-time chat (read --channel=…, post --channel=… --type=…)
ETL	`npx tsx tools/scripts/etl/agent-etl-cli.ts`	Postgres	Agent profiles, CRM seed, governance, AI capabilities (load --agent=X)

Intent

Who we're serving: The primary operator is often an agent (orchestrators, session runners, dream-team workflows). Human use remains supported, but agent use must not depend on scraping prose or guessing structure.
What must always be true: Deterministic I/O contract, bounded output, validated input, safe mutation path (dry-run or explicit confirmation for writes).
What we're not changing: Transport (Postgres/Convex), repo layout, or feature set. We are adding and hardening contracts, introspection, safety rails, and guidance so the CLIs are agent-grade.

Acceptance criteria (stories)

Stories are testable. Engineering may implement in any order that respects dependencies; completion is judged per CLI via the scorecard in the standard.

Contract and structured I/O

ID	CLI	Intention	Trigger	Observable success	Failure =
C1	All	Machine-readable output for every read/list command	Invoke with `--json` (or agreed flag)	stdout is valid JSON or NDJSON; exit 0	No `--json` or output is prose-only
C2	All	Stable, documented exit codes	Run success and failure paths	Exit codes documented in `--help` or schema; same code for same outcome	Undocumented or inconsistent codes
C3	All	Human vs machine output isolated	Run with and without `--json`	Without `--json`: human-friendly; with `--json`: only machine payload	Mixed prose and JSON in same stream
C4	Plan	Structured input for create/mutate	`create` / mutate with `--params` or stdin JSON	CLI accepts JSON params or stdin; rejects invalid shape with clear error	Only positional/loose flags
C5	Comms	Structured read output	`read --channel=… --json`	Array of message objects with stable fields (e.g. id, channel, type, body, ts)	Unstable or undocumented shape
C6	ETL	Structured load/output	`load --agent=X --json`	Output is JSON; list/describe commands support `--json`	No machine output option
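A minimal sketch of C1 to C3 for a shared output module, in TypeScript since all three CLIs run under `tsx`. The names `ExitCode`, `renderOk`, and `renderError`, the numeric code values, and the `{ ok, data }` / `{ ok, error }` envelope are assumptions, not existing stackmates code. The point is that each outcome maps to exactly one documented code, and that in `--json` mode stdout carries exactly one JSON document while prose goes to stderr.

```typescript
// Hypothetical shared exit codes. The numeric values are an assumption, chosen
// only so that each outcome maps to exactly one documented code (C2).
const ExitCode = {
  Ok: 0,
  Internal: 1,
  Usage: 2, // bad flags or invalid input
  NotFound: 3,
  Conflict: 4,
} as const;
type ExitCode = (typeof ExitCode)[keyof typeof ExitCode];

interface Rendered {
  stdout: string; // machine payload in --json mode, human text otherwise
  stderr: string; // diagnostics for humans; never parsed by agents
  code: ExitCode;
}

// Render a successful result. In --json mode stdout holds exactly one JSON
// document and nothing else (C1, C3).
function renderOk<T>(data: T, json: boolean, human: (d: T) => string): Rendered {
  return json
    ? { stdout: JSON.stringify({ ok: true, data }) + "\n", stderr: "", code: ExitCode.Ok }
    : { stdout: human(data) + "\n", stderr: "", code: ExitCode.Ok };
}

// Render a failure. In --json mode a JSON error object still goes to stdout so
// a caller reading only stdout gets a parseable result; prose goes to stderr.
function renderError(code: ExitCode, message: string, json: boolean): Rendered {
  return json
    ? { stdout: JSON.stringify({ ok: false, error: { code, message } }) + "\n", stderr: message + "\n", code }
    : { stdout: "", stderr: `error: ${message}\n`, code };
}
```

The entrypoint would then write `stdout` and `stderr` to their streams and set `process.exitCode = result.code`. Keeping the JSON error on stdout is a judgment call: it lets an agent that reads only stdout still parse failures.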

Runtime introspection

ID	CLI	Intention	Trigger	Observable success	Failure =
I1	All	Agent can discover commands and params at runtime	Run `help --json` or `schema` or `describe`	JSON listing commands, args, and (where applicable) request/response shape	No machine-readable help/schema
I2	All	Required and optional fields discoverable	Introspection output	Required vs optional and types are explicit	Agent must read external docs to know shape
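One way to satisfy I1 and I2 is to declare each command once and derive `help --json` from that declaration, which also fits the "one source of truth" point under Implementation notes. This is a hypothetical sketch: `CommandSpec`, `helpJson`, the `mutates` field, and whether `--team` is required for `active` are assumptions; only the command names and flags come from the Scope table.

```typescript
// Hypothetical capability model: one declaration per command, from which
// `help --json` (I1) and argument validation can both be derived.
type ArgType = "string" | "number" | "boolean";

interface ArgSpec {
  name: string;
  type: ArgType;
  required: boolean;
  description: string;
}

interface CommandSpec {
  name: string;
  mutates: boolean; // true => the command must honour --dry-run (S1)
  args: ArgSpec[];
}

// Illustrative plan-cli entries; command names and flags are from the spec,
// the required/optional choices and descriptions are assumptions.
const planCommands: CommandSpec[] = [
  { name: "doctor", mutates: false, args: [] },
  {
    name: "active",
    mutates: false,
    args: [{ name: "team", type: "string", required: true, description: "Team slug" }],
  },
  {
    name: "status",
    mutates: false,
    args: [{ name: "plan-id", type: "string", required: true, description: "Plan identifier" }],
  },
];

// Serialise the registry for `help --json`, making required vs optional
// arguments and their types explicit (I2).
function helpJson(cli: string, commands: CommandSpec[]): string {
  const describe = (a: ArgSpec) => ({ name: a.name, type: a.type, description: a.description });
  return JSON.stringify({
    cli,
    commands: commands.map((c) => ({
      name: c.name,
      mutates: c.mutates,
      required: c.args.filter((a) => a.required).map(describe),
      optional: c.args.filter((a) => !a.required).map(describe),
    })),
  });
}
```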

Context discipline

ID	CLI	Intention	Trigger	Observable success	Failure =
X1	Plan	Large lists paginated or limited	`list-plans` / `status` with many rows	Pagination or `--limit`; default page size bounded (e.g. 20)	Unbounded dump
X2	Comms	Read supports time window and limit	`read --since=24h` and limit	`--since` and `--limit` (or equivalent) reduce payload	Always returns full history
X3	ETL	List/load support field selection or minimal default	List agents or load one	`--fields` or minimal default object so agent can request only needed fields	Always full object graph
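A sketch of the three context-discipline controls, assuming flags arrive as raw strings. The default page size of 20 follows X1's example; the maximum of 200, the `30m`/`24h`/`7d` duration grammar, and the helper names are assumptions.

```typescript
// Default page size from X1's example; the hard maximum is an assumption.
const DEFAULT_LIMIT = 20;
const MAX_LIMIT = 200;

// Clamp --limit so output stays bounded even when the flag is omitted or huge (X1).
function resolveLimit(raw: string | undefined): number {
  if (raw === undefined) return DEFAULT_LIMIT;
  const n = Number(raw);
  if (!Number.isInteger(n) || n < 1) throw new Error(`--limit must be a positive integer, got "${raw}"`);
  return Math.min(n, MAX_LIMIT);
}

// Parse --since values such as "30m", "24h" or "7d" into a cutoff timestamp (X2).
const UNIT_MS: Record<string, number> = { m: 60_000, h: 3_600_000, d: 86_400_000 };

function sinceCutoff(raw: string, now: number = Date.now()): number {
  const match = /^(\d+)([mhd])$/.exec(raw);
  if (!match) throw new Error(`--since must look like 30m, 24h or 7d, got "${raw}"`);
  return now - Number(match[1]) * UNIT_MS[match[2]];
}

// Project each record down to the fields requested with --fields (X3).
// Unknown field names are rejected rather than silently dropped.
function selectFields<T extends Record<string, unknown>>(
  rows: T[],
  fields: string[],
  allowed: readonly string[],
): Partial<T>[] {
  const unknown = fields.filter((f) => !allowed.includes(f));
  if (unknown.length > 0) throw new Error(`unknown field(s): ${unknown.join(", ")}`);
  return rows.map((row) => {
    const out: Partial<T> = {};
    for (const f of fields) if (f in row) out[f as keyof T] = row[f as keyof T];
    return out;
  });
}
```

Rejecting unknown `--fields` names, rather than ignoring them, keeps an agent from assuming it received a field that does not exist.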

Input hardening

ID	CLI	Intention	Trigger	Observable success	Failure =
H1	All	Untrusted input validated	Pass malformed IDs, path traversal, control chars	Rejected with explicit error; no crash or unsafe side effect	Silent accept or crash
H2	All	Channel/plan-id/agent-id validated	Invalid strings or strings carrying injection payloads	Clear validation error; no SQL/NoSQL or command injection	Injection possible
H3	Plan	Phase/task IDs and payloads validated	Invalid or missing required fields (e.g. phaseSlug)	Zod or equivalent schema rejects input before any DB call, with a 4xx-style error message	Errors surface only as DB NOT NULL or type errors
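A hand-rolled sketch of H1 to H3 plus C4's structured input, kept dependency-free here; in the real CLIs a Zod schema (as H3 suggests) would play the same role. The ID pattern, the `planId` and `title` fields, and the helper names are hypothetical; only `phaseSlug` comes from this spec. Validation narrows what reaches the database but does not replace parameterised queries.

```typescript
// Hypothetical allow-list for IDs: slug or UUID style, max 64 characters.
const ID_PATTERN = /^[a-z0-9][a-z0-9_-]{0,63}$/i;

function validateId(kind: string, raw: string): string {
  // Check control characters first so they are never echoed back in an error.
  if (/[\u0000-\u001f\u007f]/.test(raw)) throw new Error(`${kind}: control characters are not allowed`);
  if (raw.includes("..") || raw.includes("/") || raw.includes("\\")) {
    throw new Error(`${kind}: path separators and ".." are not allowed`);
  }
  if (!ID_PATTERN.test(raw)) throw new Error(`${kind}: "${raw}" is not a valid identifier`);
  return raw;
}

// Hypothetical shape for `create` input passed via --params or stdin (C4).
interface CreateTaskParams {
  planId: string;
  phaseSlug: string;
  title: string;
}

// Parse and check required fields before any DB call, so a missing phaseSlug
// fails here instead of as a NOT NULL error (H3).
function parseCreateTaskParams(json: string): CreateTaskParams {
  let value: unknown;
  try {
    value = JSON.parse(json);
  } catch {
    throw new Error("--params is not valid JSON");
  }
  if (typeof value !== "object" || value === null || Array.isArray(value)) {
    throw new Error("--params must be a JSON object");
  }
  const obj = value as Record<string, unknown>;
  const missing = ["planId", "phaseSlug", "title"].filter((k) => typeof obj[k] !== "string" || obj[k] === "");
  if (missing.length > 0) throw new Error(`missing or non-string field(s): ${missing.join(", ")}`);
  return {
    planId: validateId("planId", obj.planId as string),
    phaseSlug: validateId("phaseSlug", obj.phaseSlug as string),
    title: obj.title as string,
  };
}
```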

Safety rails

ID	CLI	Intention	Trigger	Observable success	Failure =
S1	Plan	Every mutating command supports dry-run	`create`, `update`, or other write + `--dry-run`	No DB/Convex write; output describes what would be done	No dry-run option
S2	Comms	Post is explicit; no accidental broadcast	`post` without required channel/type	Rejected or prompted	Posts to a default (possibly wrong) channel
S3	ETL	Load/write operations support dry-run or safe preview	Mutating load (if any) or seed	`--dry-run` or preview output before apply	Destructive without rehearsal
S4	All	Destructive actions require explicit override or confirmation	Any delete or overwrite	Requires flag (e.g. `--confirm`) or interactive confirm	Silent overwrite/delete
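A minimal sketch of S1, S2, and S4, assuming each mutating command first builds a list of planned operations and hands them to a single apply path. `PlannedOp`, `Store`, `runMutation`, and `requirePostTarget` are hypothetical names. `--dry-run` is checked before `--confirm`, so combining the two never writes.

```typescript
// A planned write, described before anything touches Postgres or Convex.
interface PlannedOp {
  action: "create" | "update" | "delete";
  target: string;
  destructive: boolean;
}

// The only code path that writes.
interface Store {
  apply(op: PlannedOp): void;
}

interface MutationFlags {
  dryRun: boolean;
  confirm: boolean;
}

type MutationResult =
  | { status: "dry-run"; wouldApply: PlannedOp[] }
  | { status: "applied"; applied: PlannedOp[] }
  | { status: "refused"; reason: string };

function runMutation(ops: PlannedOp[], flags: MutationFlags, store: Store): MutationResult {
  // S1: dry-run reports the plan and returns before any write.
  if (flags.dryRun) return { status: "dry-run", wouldApply: ops };
  // S4: destructive operations need an explicit --confirm.
  const destructive = ops.filter((op) => op.destructive);
  if (destructive.length > 0 && !flags.confirm) {
    return { status: "refused", reason: `${destructive.length} destructive op(s) need --confirm` };
  }
  for (const op of ops) store.apply(op);
  return { status: "applied", applied: ops };
}

// S2: `post` has no default channel or type; both must be given explicitly.
function requirePostTarget(flags: Record<string, string | undefined>): { channel: string; msgType: string } {
  const channel = flags.channel;
  const msgType = flags.type;
  if (!channel || !msgType) throw new Error("post requires both --channel and --type");
  return { channel, msgType };
}
```

In tests, a fake `Store` that records calls is enough to show that dry-run and unconfirmed deletes never reach `apply` (F2).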

Response safety

ID	CLI	Intention	Trigger	Observable success	Failure =
R1	Comms	Returned message bodies not assumed safe to inject into prompts	Read messages that may contain user/agent content	Design doc or code treats bodies as untrusted; no raw paste into prompts without a sanitization note	Assumed safe
R2	ETL	Loaded profile/capability content treated as untrusted	Load agent with rich text or external refs	Document or filter: prompt-injection risk from returned data considered	No guidance
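For R1 and R2, a sketch of returning message bodies and profile text as explicitly untrusted data. `asUntrusted`, `fenceForPrompt`, the 4000-character cap, and the `<<<UNTRUSTED ... UNTRUSTED>>>` delimiters are assumptions. This makes the trust boundary visible to the consuming agent; it does not make the content safe to follow, so the guidance docs still need to say so.

```typescript
// Untrusted text is returned as data, tagged with where it came from.
interface UntrustedText {
  trusted: false;
  source: string; // e.g. "comms:<channel>" or "etl:<agent-id>"
  text: string;
}

function asUntrusted(source: string, raw: string, maxLen = 4000): UntrustedText {
  // Keep \t and \n; drop other C0 controls (including \r and ESC) and DEL,
  // which can hide or rewrite content in terminals.
  const cleaned = raw.replace(/[\u0000-\u0008\u000b-\u001f\u007f]/g, "");
  const text = cleaned.length > maxLen ? cleaned.slice(0, maxLen) + " [truncated]" : cleaned;
  return { trusted: false, source, text };
}

// When a caller must place untrusted text into a prompt, fence it so the
// boundary is visible, and neutralise the delimiters inside the body.
function fenceForPrompt(u: UntrustedText): string {
  const body = u.text.replace(/<<</g, "< < <").replace(/>>>/g, "> > >");
  return `<<<UNTRUSTED source=${u.source}\n${body}\nUNTRUSTED>>>`;
}
```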

Guidance (packaged for agents)

ID	CLI	Intention	Trigger	Observable success	Failure =
G1	All	Agent-facing invariants next to the tool	Look in CLI dir or tools/scripts	`CONTEXT.md`, `AGENTS.md`, or `SKILL.md` (or stackmates equivalent) present with: primary operator, trust boundary, required workflow rules (e.g. confirm-before-delete, use --json for automation)	No packaged guidance
G2	All	Example calls for high-value workflows	Read guidance doc	At least one read path, one write path (if any), and known gotchas (e.g. always pass --fields for list)	No examples or gotchas

Auth and headless operation

ID	CLI	Intention	Trigger	Observable success	Failure =
A1	Comms	Convex auth works without browser	Run read/post in CI or headless	Env or file-based auth; no interactive browser	Browser-only flow
A2	ETL	Postgres/connection works headless	Run load from automation	`.env.prod` or injected credentials; no interactive prompt	Requires interactive login
A3	Plan	DB connection headless	Run plan-cli in automation	Same as ETL; credentials from env or config	Interactive only
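A sketch of headless credential resolution for A1 to A3: credentials come only from the environment (loaded from `.env.prod` or injected by CI), and a missing value fails fast instead of prompting. The environment variable names (`DATABASE_URL`, `CONVEX_URL`, `CONVEX_AUTH_TOKEN`) and the per-CLI requirements are assumptions; check them against how each CLI connects today.

```typescript
type Env = Record<string, string | undefined>;

interface Credentials {
  postgresUrl?: string;
  convexUrl?: string;
  convexToken?: string;
}

// Which credentials each CLI needs; Plan and ETL use Postgres, Comms uses Convex.
const REQUIRED_BY_CLI: Record<string, (keyof Credentials)[]> = {
  plan: ["postgresUrl"],
  etl: ["postgresUrl"],
  comms: ["convexUrl", "convexToken"],
};

// Hypothetical environment variable names.
const ENV_NAMES: Record<keyof Credentials, string> = {
  postgresUrl: "DATABASE_URL",
  convexUrl: "CONVEX_URL",
  convexToken: "CONVEX_AUTH_TOKEN",
};

function resolveCredentials(cli: string, env: Env): Credentials {
  const needed = REQUIRED_BY_CLI[cli];
  if (!needed) throw new Error(`unknown cli "${cli}"`);
  const creds: Credentials = {};
  const missing: string[] = [];
  for (const key of needed) {
    const value = env[ENV_NAMES[key]];
    if (value) creds[key] = value;
    else missing.push(ENV_NAMES[key]);
  }
  // Never fall back to an interactive prompt or browser login.
  if (missing.length > 0) throw new Error(`${cli}: missing ${missing.join(", ")}; set them in the environment`);
  return creds;
}
```

Passing `env` in, rather than reading `process.env` inside, keeps the function testable; the entrypoint would call `resolveCredentials("plan", process.env)` and map the error to the usage exit code.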

Failure design and observability

ID	CLI	Intention	Trigger	Observable success	Failure =
F1	All	Common bad inputs covered by tests	Test suite	Tests for: malformed IDs, missing required flags, invalid JSON	No negative-path tests
F2	All	Destructive path and dry-run tested	Test suite	Test that dry-run does not write; test that real write is gated	No safety-path tests
F3	All	Output contract regression tested	Test suite	Schema or snapshot tests for JSON output so changes are detected	No contract tests
F4	All	Observability for unattended runs	Run failing command	stderr or logs include request ID or enough context to debug	Silent failure or opaque errors
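A sketch of F3 and F4: a contract check that a test suite can run against `read --json` output, and a one-line JSON failure record carrying a request ID. The field names follow C5's example (`id`, `channel`, `type`, `body`, `ts`); the field types, `contractViolations`, and `failureLine` are assumptions.

```typescript
// Documented shape of one Comms message; types are assumed for illustration.
const MESSAGE_CONTRACT = { id: "string", channel: "string", type: "string", body: "string", ts: "number" } as const;

// Return every contract violation so a failing test shows the whole diff (F3).
function contractViolations(payload: unknown): string[] {
  if (!Array.isArray(payload)) return ["payload is not an array"];
  const problems: string[] = [];
  payload.forEach((msg, i) => {
    if (typeof msg !== "object" || msg === null) {
      problems.push(`[${i}] is not an object`);
      return;
    }
    for (const [field, expected] of Object.entries(MESSAGE_CONTRACT)) {
      const actual = typeof (msg as Record<string, unknown>)[field];
      if (actual !== expected) problems.push(`[${i}].${field}: expected ${expected}, got ${actual}`);
    }
  });
  return problems;
}

// F4: every failure is logged as one JSON line with a request ID and the
// command that failed, so unattended runs can be traced.
function failureLine(requestId: string, command: string, err: Error): string {
  return JSON.stringify({ level: "error", requestId, command, message: err.message });
}
```

A request ID can come from `crypto.randomUUID()`, which is available as a global in recent Node versions.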

Non-negotiables before production

Each CLI must reach at least 16/20 on the scorecard with no zero on:

Structured I/O
Input hardening
Safety rails
Contract stability (documented, stable output shape)

In short: structured I/O, input validation, dry-run for writes (where applicable), and a stable machine contract are required before a CLI is marked production-ready for agents.
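The production bar can be checked mechanically. This sketch assumes each of the 10 scorecard dimensions is scored 0, 1, or 2 (which is how a 16/20 bar would arise); the dimension keys and `productionReady` are hypothetical, and only the four non-negotiables come from this spec.

```typescript
// The four dimensions that may never score zero.
const NON_NEGOTIABLE = ["structured-io", "input-hardening", "safety-rails", "contract-stability"] as const;

type Scorecard = Record<string, 0 | 1 | 2>;

function productionReady(card: Scorecard): { ready: boolean; total: number; reasons: string[] } {
  const reasons: string[] = [];
  const dims = Object.keys(card).length;
  if (dims !== 10) reasons.push(`expected 10 dimensions, got ${dims}`);
  const total = Object.values(card).reduce<number>((sum, s) => sum + s, 0);
  if (total < 16) reasons.push(`total ${total}/20 is below 16`);
  for (const dim of NON_NEGOTIABLE) {
    if (!(dim in card)) reasons.push(`${dim} not scored`);
    else if (card[dim] === 0) reasons.push(`${dim} scored 0`);
  }
  return { ready: reasons.length === 0, total, reasons };
}
```

Recording the returned `reasons` alongside the design review gives the "recorded score" the Handoff section asks for.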

Implementation notes

Shared patterns: Consider a small shared layer (e.g. `--json` handling, exit code constants, `help --json`) so all three CLIs behave consistently.
Backward compatibility: Default behavior (no --json) should remain human-friendly; additive flags and new output modes are preferred over breaking changes.
Doc location: Guidance can live in stackmates (e.g. tools/scripts/orch-meta/planning/README.md, tools/scripts/comms/README.md, tools/scripts/etl/CLAUDE.md) and be linked from the dream repo where relevant.
MCP/surfaces: If a CLI is later exposed as an MCP tool or another surface, derive that surface from the same capability model (one source of truth).
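A sketch of the shared layer suggested above: one flag parser that every CLI uses, so `--json`, `--dry-run`, and `--confirm` mean the same thing everywhere. It handles only the `--key=value` and bare `--flag` forms used in this spec; `parseCommonArgs` and its return shape are hypothetical. Node's built-in `util.parseArgs` is a reasonable alternative if a standard parser is preferred.

```typescript
interface CommonFlags {
  json: boolean;
  dryRun: boolean;
  confirm: boolean;
}

interface ParsedArgs {
  command: string | undefined;
  common: CommonFlags;
  options: Record<string, string>; // command-specific --key=value pairs
}

// Flags that never take a value.
const BOOLEAN_FLAGS = new Set(["json", "dry-run", "confirm"]);

function parseCommonArgs(argv: string[]): ParsedArgs {
  const options: Record<string, string> = {};
  const positionals: string[] = [];
  const seen = new Set<string>();
  for (const arg of argv) {
    if (!arg.startsWith("--")) {
      positionals.push(arg);
      continue;
    }
    const eq = arg.indexOf("=");
    const key = eq === -1 ? arg.slice(2) : arg.slice(2, eq);
    if (BOOLEAN_FLAGS.has(key)) {
      if (eq !== -1) throw new Error(`--${key} does not take a value`);
      seen.add(key);
    } else if (eq === -1) {
      // Reject bare value flags instead of guessing a default (e.g. --channel).
      throw new Error(`--${key} requires a value (use --${key}=...)`);
    } else {
      options[key] = arg.slice(eq + 1);
    }
  }
  return {
    command: positionals[0],
    common: { json: seen.has("json"), dryRun: seen.has("dry-run"), confirm: seen.has("confirm") },
    options,
  };
}
```

Each entrypoint would call `parseCommonArgs(process.argv.slice(2))` and then reject any option its command does not declare.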

Handoff

Dream team: Owns this spec and acceptance criteria; updates when the standard or priorities change.
Engineering: Owns implementation, task breakdown, and test implementation; may propose spec changes via PR or comms.
Done when: Each of Plan, Comms, ETL passes the design review with a recorded score and no zeros on the four non-negotiables.

Questions

Which of the three CLIs would compound value fastest if improved first: Plan (workflow), Comms (signalling), or ETL (agent data)?
What shared npm package or CLI harness could enforce --json, exit codes, and help schema across all three without rewriting each from scratch?
If an agent is the primary operator of plan-cli, what one invariant must it never violate that you would put in AGENTS.md?
Where does Convex auth for Comms already support headless use, and where does it still assume a browser?
