
Claude Code Tools

How does Claude Code connect judgment to computation?


Tool Use

Claude Code's power comes from tool calling — the agent reasons about what to do, then calls tools to execute.

Fact    | Detail
What    | Claude selects and calls tools (Read, Write, Edit, Bash, Grep, Glob, Task, WebFetch) to accomplish tasks.
Pattern | Reasoning → tool selection → execution → result → next reasoning step.
Docs    | Tool use
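One step of that loop can be sketched minimally. Everything here is a hypothetical illustration — `grep`, `TOOLS`, and `execute_tool_call` are stand-in names, not Claude Code's actual internals:

```python
# Hypothetical sketch of one execution step: the model has already reasoned
# and selected a tool; we run it and hand the result back for the next
# reasoning step. Names below are illustrative, not the real agent's API.

def grep(pattern: str, path: str) -> str:
    """Stand-in for the Grep tool: return lines of a file containing pattern."""
    with open(path) as f:
        return "\n".join(line.rstrip("\n") for line in f if pattern in line)

TOOLS = {"Grep": grep}  # the real agent also exposes Read, Write, Edit, Bash, ...

def execute_tool_call(name: str, args: dict) -> str:
    """Execution step: run the selected tool; the result feeds the next reasoning step."""
    return TOOLS[name](**args)
```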

input_examples

Concrete examples that anchor tool-calling consistency. Instead of describing what a tool does abstractly, you show the model real input/output pairs.

Fact    | Detail
What    | Concrete input/output examples attached to tool definitions. They make LLM tool calling consistent across sessions.
Pattern | "Is this more like Pain=5 ($12K/month lost) or Pain=1 (calendar-could-be-better)?"
Docs    | input_examples

Our use: The score-prds skill uses calibration examples as input_examples — concrete PRD anchors at each score level so different agents produce convergent scores.
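A hedged sketch of what that looks like on a tool definition. The `name`, schema, and example values are our own illustrative stand-ins for the score-prds calibration anchors, not copied from the docs:

```python
# Sketch: a tool definition carrying input_examples as calibration anchors.
# The tool name, schema fields, and example values are hypothetical —
# they illustrate the pattern, not the real score-prds skill's definitions.

score_prd_tool = {
    "name": "score_prd",
    "description": "Score a PRD's Pain dimension from 1-5.",
    "input_schema": {
        "type": "object",
        "properties": {
            "pain": {"type": "integer", "minimum": 1, "maximum": 5},
            "evidence": {"type": "string"},
        },
        "required": ["pain", "evidence"],
    },
    # Concrete anchors: the model sees what a 5 and a 1 actually look like,
    # so different agents and sessions converge on the same scale.
    "input_examples": [
        {"pain": 5, "evidence": "$12K/month lost to manual reconciliation"},
        {"pain": 1, "evidence": "calendar could be slightly better"},
    ],
}
```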


Programmatic Tool Calling

Fact    | Detail
What    | Claude writes code that calls tools in a sandboxed container. Reduces round-trips between reasoning and execution.
Pattern | Instead of Claude → tool → result → Claude → tool → result, it becomes Claude → writes code that calls N tools → single result.
Sandbox | code_execution_20260120 tool runs Python in an isolated container with $TOOL_CALL function.
Docs    | Programmatic tool calling
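The batched pattern, sketched with a local stand-in. `call_tool` here plays the role of the sandbox's tool-call bridge (the $TOOL_CALL function); the fake filesystem and return values are invented for illustration:

```python
# Sketch: one generated script calls N tools and returns a single aggregated
# result, instead of N reasoning round-trips. "call_tool" is a local stand-in
# for the sandbox's $TOOL_CALL bridge; the fake data below is illustrative.

def call_tool(name: str, **args):
    # Stand-in: pretend each Read returns the file's line count.
    fake_fs = {"a.py": 120, "b.py": 45, "c.py": 300}
    return fake_fs[args["path"]]

# Instead of three separate Claude → tool → result cycles, the model
# emits this code once and gets back a single result:
line_counts = {path: call_tool("Read", path=path) for path in ["a.py", "b.py", "c.py"]}
largest = max(line_counts, key=line_counts.get)
```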

How It Maps

Layer           | Manual (our system)                        | Programmatic
Who writes code | Human wrote prioritise-prds.mjs            | Claude writes Python at runtime
Who calls it    | Skill tells Claude to run node scripts/... | Claude's code calls tools via $TOOL_CALL
Execution       | Local Node process                         | Sandboxed container
Determinism     | Script does pure math, zero judgment       | Code does pure computation, zero judgment
The split       | LLM scores → script ranks                  | LLM reasons → code executes

The core principle is identical: separate judgment from computation.

Where Each Wins

Local scripts (prioritise-prds.mjs): Version-controlled, testable, reviewable by humans, runs without API calls. Better when the computation is stable and reusable.

Programmatic tool calling: Better when computation needs to happen inside a Claude API session — agent platforms where orchestrators compose tool calls dynamically. Relevant for the Agent Platform PRD and BOaaS dispatch.


Scoring Pipeline

The scoring pipeline is the proof-of-concept for both patterns:

JUDGMENT (skill)                      MATH (script)
─────────────────                     ─────────────────
1. Skill loads calibration            4. Script reads frontmatter
   examples (Pain=5 → $12K/mo)        5. Geometric mean, weighted sum
2. Agent reads PRD, scores 1-5        6. Gate checks (BLOCKED, NO PLAYERS)
3. Writes to frontmatter              7. Sort by final score + diff
Layer       | What                                   | Deterministic?
Calibration | input_examples pattern anchors scoring | No — but constrained by concrete examples
Scores      | 10 dimensions per PRD in frontmatter   | No — human/LLM judgment
Algorithm   | node scripts/prioritise-prds.mjs       | Yes — pure math
Enforcement | Gate checks, dependency bonuses        | Yes — binary pass/fail

The script is the commissioning principle applied to code: the team that scores (LLM) is never the team that computes the ranking (script).
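The deterministic half can be sketched like this. The dimension names, weights, and blend are made up for illustration (the real prioritise-prds.mjs scores 10 dimensions and defines its own formula); only the shape — geometric mean, weighted sum, binary gates — mirrors the pipeline above:

```python
# Sketch of the script's deterministic half. Dimensions, weights, and the
# blending formula are hypothetical stand-ins — the real prioritise-prds.mjs
# defines its own. Gates are binary: a failed gate zeroes the score outright.
import math

WEIGHTS = {"pain": 0.5, "reach": 0.3, "confidence": 0.2}  # invented weights

def final_score(scores: dict) -> float:
    """Geometric mean blended with a weighted sum; gate checks short-circuit."""
    if scores.get("status") == "BLOCKED" or scores.get("players", 1) == 0:
        return 0.0  # gate check: BLOCKED / NO PLAYERS fails regardless of scores
    geo = math.prod(scores[k] for k in WEIGHTS) ** (1 / len(WEIGHTS))
    weighted = sum(scores[k] * w for k, w in WEIGHTS.items())
    return round((geo + weighted) / 2, 2)
```

Note the commissioning split is preserved: nothing in this function exercises judgment — every input score was produced upstream by the LLM.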


Extended Thinking

Fact   | Detail
What   | Claude uses additional "thinking tokens" before responding — visible reasoning that improves complex tasks.
When   | Architecture decisions, multi-file refactors, ambiguous requirements.
Config | alwaysThinkingEnabled: true in settings.json
Docs   | Extended thinking

Our config: Always on. Cost justified by fewer rework cycles on complex tasks.
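As a settings.json fragment, that flag looks like this (minimal sketch; a real settings.json carries other keys alongside it):

```json
{
  "alwaysThinkingEnabled": true
}
```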


Token-Efficient Tools

Fact    | Detail
What    | Optimized tool-result formats that reduce token consumption without losing information.
Pattern | Tools return structured, compact results instead of verbose output.
Docs    | Token-efficient tools

Relevant when building API-level agent orchestration — every token saved compounds across tool calls in long-running agent sessions.
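The idea in miniature — not any real API, just the same search result rendered two ways. Both forms carry identical information; the compact one costs far fewer tokens, and that saving repeats on every tool call in a long session:

```python
# Illustration only: verbose prose vs a compact structured payload for the
# same hypothetical search result. Identical information, fewer tokens.
import json

verbose = (
    "The search found 2 matching files. The first match was found in "
    "src/app.py on line 14. The second match was found in src/cli.py on line 3."
)
compact = json.dumps({"matches": [["src/app.py", 14], ["src/cli.py", 3]]})
```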
