CLI Improvement Spec — Plan, Comms, ETL
Audience: Engineering (stackmates).
Goal: Improve how the Plan, Comms, and Agent ETL CLIs are written so they meet the Agent CLI standard and are safe and discoverable for both human and agent operators.
Standard: Agent CLI Tools — checklist, design review template, and 10-dimension scorecard. Production bar: 16/20, with no zeros on structured I/O, input hardening, safety rails, or contract stability.
Scope
| CLI | Entrypoint | Transport | Primary job |
|---|---|---|---|
| Plan | npx tsx tools/scripts/orch-meta/planning/plan-cli.ts | Postgres | Projects, tasks, health (doctor, active --team=X, status --plan-id=X) |
| Comms | npx tsx tools/scripts/comms/agent-comms.ts | Convex | Real-time chat (read --channel=…, post --channel=… --type=…) |
| ETL | npx tsx tools/scripts/etl/agent-etl-cli.ts | Postgres | Agent profiles, CRM seed, governance, AI capabilities (load --agent=X) |
Intent
- Who we're serving: Primary operator is often an agent (orchestrators, session runners, dream-team workflows). Human use remains supported; agent use must not depend on scraping prose or guessing structure.
- What must always be true: Deterministic I/O contract, bounded output, validated input, safe mutation path (dry-run or explicit confirmation for writes).
- What we're not changing: Transport (Postgres/Convex), repo layout, or feature set. We are adding and hardening contracts, introspection, safety rails, and guidance so the CLIs are agent-grade.
Acceptance criteria (stories)
Stories are testable. Engineering may implement in any order that respects dependencies; completion is judged per CLI via the scorecard in the standard.
Contract and structured I/O
| ID | CLI | Intention | Trigger | Observable success | Failure = |
|---|---|---|---|---|---|
| C1 | All | Machine-readable output for every read/list command | Invoke with --json (or agreed flag) | stdout is valid JSON or NDJSON; exit 0 | No --json or output is prose-only |
| C2 | All | Stable, documented exit codes | Run success and failure paths | Exit codes documented in --help or schema; same code for same outcome | Undocumented or inconsistent codes |
| C3 | All | Human vs machine output isolated | Run with and without --json | Without --json: human-friendly; with --json: only machine payload | Mixed prose and JSON in same stream |
| C4 | Plan | Structured input for create/mutate | create / mutate with --params or stdin JSON | CLI accepts JSON params or stdin; rejects invalid shape with clear error | Only positional/loose flags |
| C5 | Comms | Structured read output | read --channel=… --json | Array of message objects with stable fields (e.g. id, channel, type, body, ts) | Unstable or undocumented shape |
| C6 | ETL | Structured load/output | load --agent=X --json | Output is JSON; list/describe commands support --json | No machine output option |
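
Criteria C1 to C3 can be sketched as one pure rendering function that decides what goes to stdout, what goes to stderr, and which exit code is returned. This is a minimal sketch, not the existing implementation: the `render` helper, the `EXIT` names and values, and the `{ ok, data }` envelope are all assumptions made for illustration.

```typescript
// Hypothetical exit codes; the real values are whatever the team documents (C2).
const EXIT = { OK: 0, USAGE: 2, NOT_FOUND: 3, INTERNAL: 70 } as const;

type Result<T> =
  | { ok: true; data: T }
  | { ok: false; code: keyof typeof EXIT; message: string };

interface Rendered {
  stdout: string; // machine payload (with --json) or human text
  stderr: string; // diagnostics only; never mixed into stdout
  exitCode: number;
}

// Pure function: it only builds strings, so it can be unit tested
// without spawning a process or touching a database.
function render<T>(
  result: Result<T>,
  json: boolean,
  human: (data: T) => string,
): Rendered {
  if (result.ok) {
    return {
      stdout: json
        ? JSON.stringify({ ok: true, data: result.data })
        : human(result.data),
      stderr: "",
      exitCode: EXIT.OK,
    };
  }
  const exitCode = EXIT[result.code];
  if (json) {
    // Assumed convention: in --json mode, errors are also a JSON object on stdout.
    const error = { code: result.code, message: result.message };
    return { stdout: JSON.stringify({ ok: false, error }), stderr: "", exitCode };
  }
  return { stdout: "", stderr: `error: ${result.message}`, exitCode };
}
```

A command handler would write `stdout` and `stderr` to the matching streams and exit with `exitCode`. Keeping the decision in one place is what makes "same code for same outcome" checkable in tests.
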
Runtime introspection
| ID | CLI | Intention | Trigger | Observable success | Failure = |
|---|---|---|---|---|---|
| I1 | All | Agent can discover commands and params at runtime | Run help --json or schema or describe | JSON listing commands, args, and (where applicable) request/response shape | No machine-readable help/schema |
| I2 | All | Required and optional fields discoverable | Introspection output | Required vs optional and types are explicit | Agent must read external docs to know shape |
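
One way to satisfy I1 and I2 is to describe commands as data and serialize that description for help --json. The sketch below is illustrative only: the command and flag names echo the Scope table, while the `CommandSpec` shape, the types, and which parameters are marked required are assumptions.

```typescript
type ParamType = "string" | "number" | "boolean";

interface ParamSpec {
  name: string;
  type: ParamType;
  required: boolean;
  description: string;
}

interface CommandSpec {
  name: string;
  mutates: boolean; // lets an agent tell reads from writes before calling
  params: ParamSpec[];
}

// Illustrative registry; required/optional markings are assumed, not confirmed.
const commands: CommandSpec[] = [
  {
    name: "status",
    mutates: false,
    params: [
      { name: "plan-id", type: "string", required: true, description: "Plan to report on" },
      { name: "json", type: "boolean", required: false, description: "Emit machine-readable output" },
    ],
  },
  {
    name: "active",
    mutates: false,
    params: [
      { name: "team", type: "string", required: true, description: "Team to filter by" },
    ],
  },
];

// What a hypothetical `help --json` could print.
function helpJson(cli: string, specs: CommandSpec[]): string {
  return JSON.stringify({ cli, commands: specs }, null, 2);
}

// The same registry can drive validation, so help and behavior cannot drift apart.
function missingRequired(spec: CommandSpec, args: Record<string, unknown>): string[] {
  return spec.params
    .filter((p) => p.required && args[p.name] === undefined)
    .map((p) => p.name);
}
```

Using one registry for both introspection and argument checking means the published schema is the schema actually enforced.
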
Context discipline
| ID | CLI | Intention | Trigger | Observable success | Failure = |
|---|---|---|---|---|---|
| X1 | Plan | Large lists paginated or limited | list-plans / status with many rows | Pagination or --limit; default page size bounded (e.g. 20) | Unbounded dump |
| X2 | Comms | Read supports time window and limit | read --since=24h and limit | --since and --limit (or equivalent) reduce payload | Always returns full history |
| X3 | ETL | List/load support field selection or minimal default | List agents or load one | --fields or minimal default object so agent can request only needed fields | Always full object graph |
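
The bounding rules in X1 to X3 might be implemented as three small helpers. This is a sketch under stated assumptions: the helper names, the hard ceiling of 200, and the accepted --since units (m, h, d) are hypothetical; only the default of 20 and the 24h example come from the table above.

```typescript
const DEFAULT_LIMIT = 20; // default page size from the X1 example
const MAX_LIMIT = 200;    // assumed hard ceiling, not from the spec

// Returns a bounded page size; rejects anything that is not a positive integer.
function clampLimit(raw: string | undefined): number {
  if (raw === undefined) return DEFAULT_LIMIT;
  if (!/^\d+$/.test(raw) || Number(raw) < 1) {
    throw new Error(`--limit must be a positive integer, got "${raw}"`);
  }
  return Math.min(Number(raw), MAX_LIMIT);
}

// Parses a window such as "30m", "24h", or "7d" into milliseconds.
function parseSince(raw: string): number {
  const match = /^(\d+)([mhd])$/.exec(raw);
  if (!match) {
    throw new Error(`--since must look like 30m, 24h, or 7d, got "${raw}"`);
  }
  const unitMs = { m: 60_000, h: 3_600_000, d: 86_400_000 }[match[2] as "m" | "h" | "d"];
  return Number(match[1]) * unitMs;
}

// Keeps only the requested fields; unknown field names are ignored.
function selectFields<T extends Record<string, unknown>>(
  row: T,
  fields: string[],
): Partial<T> {
  const out: Record<string, unknown> = {};
  for (const field of fields) {
    if (field in row) out[field] = row[field];
  }
  return out as Partial<T>;
}
```

Clamping instead of rejecting an oversized --limit is a design choice; rejecting with a usage error is equally defensible as long as the behavior is documented.
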
Input hardening
| ID | CLI | Intention | Trigger | Observable success | Failure = |
|---|---|---|---|---|---|
| H1 | All | Untrusted input validated | Pass malformed IDs, path traversal, control chars | Rejected with explicit error; no crash or unsafe side effect | Silent accept or crash |
| H2 | All | Channel/plan-id/agent-id validated | Invalid strings or injection payloads | Clear validation error; no SQL/NoSQL or command injection | Injection possible |
| H3 | Plan | Phase/task IDs and payloads validated | Invalid or missing required fields (e.g. phaseSlug) | Zod or equivalent; 4xx-style error message | NOT NULL or type errors only at DB |
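
The checks named in H1 and H2 can be illustrated with an allow-list validator that runs before any query is built. This is a minimal hand-rolled sketch; the ID grammar (letters, digits, dot, underscore, hyphen, at most 64 characters) and the `validateId` name are assumptions, and a schema library such as Zod could express the same rules.

```typescript
// Assumed ID grammar: starts with a letter or digit, then up to 63 more
// characters drawn from letters, digits, dot, underscore, and hyphen.
const ID_PATTERN = /^[A-Za-z0-9][A-Za-z0-9._-]{0,63}$/;

type Validation =
  | { ok: true; value: string }
  | { ok: false; error: string };

function validateId(label: string, raw: unknown): Validation {
  if (typeof raw !== "string" || raw.length === 0) {
    return { ok: false, error: `${label} is required` };
  }
  // Control characters (including NUL and DEL) are never valid in an ID.
  if (/[\u0000-\u001f\u007f]/.test(raw)) {
    return { ok: false, error: `${label} contains control characters` };
  }
  // Reject traversal sequences and path separators explicitly for a clearer error.
  if (raw.includes("..") || raw.includes("/") || raw.includes("\\")) {
    return { ok: false, error: `${label} must not contain path separators or ".."` };
  }
  if (!ID_PATTERN.test(raw)) {
    return { ok: false, error: `${label} has invalid characters or length` };
  }
  return { ok: true, value: raw };
}
```

Allow-list validation of this kind complements parameterized queries; it does not replace them.
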
Safety rails
| ID | CLI | Intention | Trigger | Observable success | Failure = |
|---|---|---|---|---|---|
| S1 | Plan | Every mutating command supports dry-run | create, update, or other write + --dry-run | No DB/Convex write; output describes what would be done | No dry-run option |
| S2 | Comms | Post is explicit; no accidental broadcast | post without required channel/type | Rejected or prompted | Post to wrong channel with default |
| S3 | ETL | Load/write operations support dry-run or safe preview | Mutating load (if any) or seed | --dry-run or preview output before apply | Destructive without rehearsal |
| S4 | All | Destructive actions require explicit override or confirmation | Any delete or overwrite | Requires flag (e.g. --confirm) or interactive confirm | Silent overwrite/delete |
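
The safety rails in S1, S3, and S4 reduce to one decision taken before any write. The sketch below assumes a hypothetical `gate` function and request shape; it only decides, and leaves the actual Postgres or Convex write to the caller.

```typescript
interface MutationRequest {
  command: string;      // for example "create" or "delete"
  destructive: boolean; // true for deletes and overwrites (S4)
  summary: string;      // human-readable description of the intended change
}

interface SafetyFlags {
  dryRun: boolean;  // --dry-run
  confirm: boolean; // --confirm
}

type Decision =
  | { kind: "preview"; wouldDo: string }
  | { kind: "refuse"; reason: string }
  | { kind: "apply" };

// Pure decision function: no I/O happens here, so the "dry-run never writes"
// property can be tested directly.
function gate(req: MutationRequest, flags: SafetyFlags): Decision {
  // Dry-run wins over everything, including --confirm.
  if (flags.dryRun) {
    return { kind: "preview", wouldDo: req.summary };
  }
  if (req.destructive && !flags.confirm) {
    return {
      kind: "refuse",
      reason: `${req.command} is destructive; re-run with --confirm or --dry-run`,
    };
  }
  return { kind: "apply" };
}
```

The caller performs the write only when the result is `{ kind: "apply" }`, which keeps the write path behind a single branch.
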
Response safety
| ID | CLI | Intention | Trigger | Observable success | Failure = |
|---|---|---|---|---|---|
| R1 | Comms | Returned message bodies not assumed safe for injection | Read messages that may contain user/agent content | Design doc or code: treat as untrusted; no raw paste into prompts without sanitization note | Assumed safe |
| R2 | ETL | Loaded profile/capability content treated as untrusted | Load agent with rich text or external refs | Document or filter: prompt-injection risk from returned data considered | No guidance |
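
For R1 and R2, one option is to make the trust boundary visible in the returned data itself. The sketch below is an assumption-heavy illustration: the `wrapUntrusted` helper, the envelope fields, and the 4000-character cap are hypothetical. It marks and bounds content; it does not by itself prevent prompt injection.

```typescript
interface UntrustedText {
  trust: "untrusted"; // explicit marker for downstream consumers
  source: string;     // where the text came from, e.g. "comms" or "etl"
  text: string;
  truncated: boolean;
}

const MAX_BODY_CHARS = 4000; // assumed cap, not from the spec

function wrapUntrusted(source: string, raw: string): UntrustedText {
  // Drop control characters except tab (U+0009) and newline (U+000A).
  const cleaned = raw.replace(/[\u0000-\u0008\u000b-\u001f\u007f]/g, "");
  const truncated = cleaned.length > MAX_BODY_CHARS;
  return {
    trust: "untrusted",
    source,
    text: truncated ? cleaned.slice(0, MAX_BODY_CHARS) : cleaned,
    truncated,
  };
}
```

An agent consuming this envelope still has to treat `text` as data rather than instructions; the marker only makes that rule harder to overlook.
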
Guidance (packaged for agents)
| ID | CLI | Intention | Trigger | Observable success | Failure = |
|---|---|---|---|---|---|
| G1 | All | Agent-facing invariants next to the tool | Look in CLI dir or tools/scripts | CONTEXT.md, AGENTS.md, or SKILL.md (or stackmates equivalent) present with: primary operator, trust boundary, required workflow rules (e.g. confirm-before-delete, use --json for automation) | No packaged guidance |
| G2 | All | Example calls for high-value workflows | Read guidance doc | At least: one read path, one write path (if any), and any gotcha (e.g. always --fields for list) | No examples or gotchas |
Auth and headless operation
| ID | CLI | Intention | Trigger | Observable success | Failure = |
|---|---|---|---|---|---|
| A1 | Comms | Convex auth works without browser | Run read/post in CI or headless | Env or file-based auth; no interactive browser | Browser-only flow |
| A2 | ETL | Postgres/connection works headless | Run load from automation | .env.prod or injected credentials; no interactive prompt | Requires interactive login |
| A3 | Plan | DB connection headless | Run plan-cli in automation | Same as ETL; credentials from env or config | Interactive only |
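
Headless operation (A1 to A3) implies failing fast with a clear message when credentials are absent, instead of falling back to a prompt. A minimal sketch, assuming a hypothetical `loadCredentials` helper; the environment variable names in the usage note are placeholders, not the names the CLIs actually read.

```typescript
type Env = Record<string, string | undefined>;

type CredentialCheck =
  | { ok: true; values: Record<string, string> }
  | { ok: false; missing: string[] };

// Takes the environment as an argument so tests can pass a fake one.
function loadCredentials(env: Env, required: string[]): CredentialCheck {
  const values: Record<string, string> = {};
  const missing: string[] = [];
  for (const name of required) {
    const value = env[name];
    if (value === undefined || value.trim() === "") {
      missing.push(name); // report every missing variable, not just the first
    } else {
      values[name] = value;
    }
  }
  return missing.length === 0 ? { ok: true, values } : { ok: false, missing };
}
```

In a CLI entrypoint this might be called as `loadCredentials(process.env, ["EXAMPLE_DB_URL"])`, exiting with a documented usage code and a list of missing names when the check fails.
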
Failure design and observability
| ID | CLI | Intention | Trigger | Observable success | Failure = |
|---|---|---|---|---|---|
| F1 | All | Common bad inputs covered by tests | Test suite | Tests for: malformed IDs, missing required flags, invalid JSON | No negative-path tests |
| F2 | All | Destructive path and dry-run tested | Test suite | Test that dry-run does not write; test that real write is gated | No safety-path tests |
| F3 | All | Output contract regression tested | Test suite | Schema or snapshot tests for JSON output so changes are detected | No contract tests |
| F4 | All | Observability for unattended runs | Run failing command | stderr or logs include request ID or enough context to debug | Silent failure or opaque errors |
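
The contract regression test in F3 could start as a plain shape check over --json output. The sketch below uses the example message fields from C5 (id, channel, type, body, ts); the field types, in particular `ts` as a number, and the `contractViolations` helper are assumptions.

```typescript
type FieldType = "string" | "number" | "boolean";

// Field names come from the C5 example; the types are assumed.
const MESSAGE_CONTRACT: Record<string, FieldType> = {
  id: "string",
  channel: "string",
  type: "string",
  body: "string",
  ts: "number",
};

// Returns a list of human-readable violations; an empty list means the
// payload matches the contract.
function contractViolations(
  payload: unknown,
  contract: Record<string, FieldType>,
): string[] {
  if (!Array.isArray(payload)) return ["payload is not an array"];
  const problems: string[] = [];
  payload.forEach((row, i) => {
    if (typeof row !== "object" || row === null) {
      problems.push(`[${i}] is not an object`);
      return;
    }
    const record = row as Record<string, unknown>;
    for (const [field, expected] of Object.entries(contract)) {
      const actual = typeof record[field];
      if (actual !== expected) {
        problems.push(`[${i}].${field} expected ${expected}, got ${actual}`);
      }
    }
  });
  return problems;
}
```

A test would run the CLI with --json, parse stdout, and assert that the violation list is empty. Extra fields pass this check, which matches the preference for additive changes.
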
Non-negotiables before production
Each CLI must reach at least 16/20 on the scorecard with no zero on:
- Structured I/O
- Input hardening
- Safety rails
- Contract stability (documented, stable output shape)
So: structured I/O, input validation, dry-run for writes (where applicable), and stable machine contract are required before marking a CLI production-ready for agents.
Implementation notes
- Shared patterns: Consider a small shared layer (e.g. --json, exit code constants, help --json) so all three CLIs behave consistently.
- Backward compatibility: Default behavior (no --json) should remain human-friendly; additive flags and new output modes are preferred over breaking changes.
- Doc location: Guidance can live in stackmates (e.g. tools/scripts/orch-meta/planning/README.md, tools/scripts/comms/README.md, tools/scripts/etl/CLAUDE.md) and be linked from the dream repo where relevant.
- MCP/surfaces: If a CLI is later exposed as an MCP tool or another surface, derive that surface from the same capability model (one source of truth).
Handoff
- Dream team: Owns this spec and acceptance criteria; updates when the standard or priorities change.
- Engineering: Owns implementation, task breakdown, and test implementation; may propose spec changes via PR or comms.
- Done when: Each of Plan, Comms, ETL passes the design review with a recorded score and no zeros on the four non-negotiables.
Questions
- Which of the three CLIs would compound value fastest if improved first: Plan (workflow), Comms (signalling), or ETL (agent data)?
- What shared npm package or CLI harness could enforce --json, exit codes, and help schema across all three without rewriting each from scratch?
- If an agent is the primary operator of plan-cli, what one invariant must it never violate that you would put in AGENTS.md?
- Where does Convex auth for Comms already support headless use, and where does it still assume a browser?