Practical asset

Agent-first tool scorecard

Use this scorecard to evaluate whether a tool is genuinely usable by autonomous AI agents, not just technically automatable. It turns the six AgentFirstTools criteria into a simple audit you can run against APIs, CLIs, SaaS products, internal platforms, and operational workflows.

10–20 minute audit · Six criteria · Designed for teams
How to score: give each criterion 0, 1, or 2 points. A tool scoring 9–12 is agent-ready for bounded workflows. 5–8 means useful but fragile. 0–4 means agents will need heavy human supervision or brittle workarounds.
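The scoring mechanics above can be sketched in a few lines. This is an illustrative helper, not part of any real tooling; the criterion names and bands come straight from the rubric.

```python
# Illustrative scorecard tally: six criteria, 0-2 points each, 12 max.
SCORES = {
    "inspectable": 2,
    "scriptable": 1,
    "bounded": 1,
    "verifiable": 2,
    "recoverable": 0,
    "composable": 1,
}

def readiness(scores: dict[str, int]) -> str:
    """Map a total score onto the rubric's three readiness bands."""
    total = sum(scores.values())
    if total >= 9:
        return f"{total}/12: agent-ready for bounded workflows"
    if total >= 5:
        return f"{total}/12: useful but fragile"
    return f"{total}/12: heavy supervision needed"

print(readiness(SCORES))  # 7/12: useful but fragile
```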

The scoring rubric

1. Inspectable

Can an agent discover capabilities, required inputs, current state, permissions, examples, and failure modes before acting?

0 · Hidden: Docs, state, and permissions are scattered or human-only.
1 · Partial: Some docs or endpoints exist, but the agent must infer important details.
2 · Clear: Machine-readable schemas, examples, state, and limits are easy to query.
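What "2 · Clear" looks like in practice: the agent can read a capability document and validate a call before attempting it. Everything below is a hypothetical sketch; the document shape, action names, and fields are invented for illustration.

```python
# Sketch of pre-flight inspection against a hypothetical machine-readable
# capability document (the kind a score-2 tool would publish).
import json

CAPABILITIES_DOC = json.loads("""
{
  "actions": {
    "create_ticket": {
      "required": ["title", "queue"],
      "rate_limit_per_min": 30,
      "failure_modes": ["queue_not_found", "rate_limited"]
    }
  }
}
""")

def can_attempt(action: str, inputs: dict) -> bool:
    """Check the capability doc before acting, instead of trial-and-error."""
    spec = CAPABILITIES_DOC["actions"].get(action)
    if spec is None:
        return False  # undocumented action: do not guess
    return all(field in inputs for field in spec["required"])

print(can_attempt("create_ticket", {"title": "Fix login", "queue": "ops"}))  # True
```

The point is that failure modes and required inputs are discoverable up front, so the agent never has to learn them by breaking things.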

2. Scriptable

Can every important workflow be called repeatably through a stable API, CLI, MCP server, webhook, file interface, or other non-brittle control surface?

0 · UI-only: Requires clicking through a web app for core work.
1 · Mixed: Basic operations are callable, but edge cases still require the UI.
2 · Complete: Core workflows are callable, documented, versioned, and automatable.
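A scriptable control surface means the whole workflow is one deterministic command. The sketch below assumes a hypothetical `ticketctl` CLI; only the pattern matters, not the tool.

```python
# Sketch of a scriptable workflow: one deterministic command, no UI clicks.
# "ticketctl" is a hypothetical CLI used to show the shape of the call.
import subprocess

def ticket_create_cmd(title: str, queue: str) -> list[str]:
    """Build the exact argv an agent would run."""
    return [
        "ticketctl", "ticket", "create",
        "--title", title,
        "--queue", queue,
        "--output", "json",  # machine-readable result, not a toast
    ]

def create_ticket(title: str, queue: str) -> str:
    # check=True turns a failed command into an explicit exception
    # instead of silently returning bad output.
    proc = subprocess.run(ticket_create_cmd(title, queue),
                          capture_output=True, text=True, check=True)
    return proc.stdout
```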

3. Bounded

Can actions be scoped by workspace, role, resource, budget, time, and approval level?

0 · All or nothing: Credentials grant broad power.
1 · Coarse: Some roles or limits exist, but not enough for safe delegation.
2 · Least privilege: Agents can receive narrow, revocable authority for the task.
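Least-privilege delegation can be pictured as a grant object that names exactly what the agent may do, where, for how much, and until when. This is an illustrative model, not a real API; every field name is invented.

```python
# Sketch of a narrow, revocable grant: workspace-, action-, budget-,
# and time-scoped authority. All names are illustrative.
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

@dataclass
class Grant:
    workspace: str
    actions: set[str]
    budget_usd: float
    expires: datetime
    revoked: bool = False

    def allows(self, workspace: str, action: str, cost_usd: float) -> bool:
        """Every dimension must pass; anything outside the grant is denied."""
        return (not self.revoked
                and workspace == self.workspace
                and action in self.actions
                and cost_usd <= self.budget_usd
                and datetime.now(timezone.utc) < self.expires)

grant = Grant(
    workspace="billing-sandbox",
    actions={"read_invoice", "draft_refund"},
    budget_usd=50.0,
    expires=datetime.now(timezone.utc) + timedelta(hours=2),
)
print(grant.allows("billing-sandbox", "draft_refund", 20.0))  # True
print(grant.allows("billing-prod", "draft_refund", 20.0))     # False
```

Revocation is just flipping one flag, which is what makes delegation safe to hand out in the first place.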

4. Verifiable

Does every meaningful action return durable evidence: IDs, URLs, status endpoints, logs, diffs, previews, audit events, or structured success and failure signals?

0 · Trust me: Only a spinner, toast, or vague success message.
1 · Some receipts: Receipts exist but are inconsistent or hard to query later.
2 · Receipts by default: Agents can cite and re-check proof of what happened.
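"Receipts by default" means every action returns durable, structured evidence rather than a transient success message. The response shape below is a hypothetical example; field names and values are invented.

```python
# Sketch of a receipt: structured, durable evidence an agent can cite
# and re-check later. Field names and values are illustrative.
receipt = {
    "action": "create_ticket",
    "status": "succeeded",
    "id": "TCK-1042",
    "url": "https://tracker.example.com/tickets/TCK-1042",
    "audit_event": "evt_8f2c",
    "completed_at": "2025-01-07T16:04:12Z",
}

def cite(r: dict) -> str:
    """Turn a receipt into a claim a human or another agent can verify."""
    return f"{r['action']} {r['status']}: {r['id']} ({r['url']})"

print(cite(receipt))
```

Contrast this with a score-0 tool, where the only evidence is a toast that has already disappeared by the time anyone asks what happened.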

5. Recoverable

Are failures explicit, retries safe, partial progress visible, and destructive operations reversible or at least clearly marked as irreversible?

0 · Fragile: Timeouts and partial failures leave unknown state.
1 · Manual recovery: Humans can recover, but agents lack safe retry/rollback paths.
2 · Designed recovery: Idempotency, status checks, rollbacks, and takeover points exist.

6. Composable

Can the tool participate in larger agent workflows across repos, terminals, browsers, docs, inboxes, schedulers, CI, deployments, and human handoff?

0 · Silo: No useful integration points.
1 · Integrates: Some integrations exist, but workflow state is hard to pass around.
2 · Workflow-native: Inputs, outputs, permissions, and status fit broader automation loops.
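A workflow-native tool emits structured state that the next step can consume, including an explicit human-handoff path when something fails. The pipeline below is a hypothetical sketch; step names, fields, and identifiers are all invented.

```python
# Sketch of composition: each step consumes and extends structured state,
# so CI results, PRs, and evidence flow through one loop. Illustrative only.
def run_tests(repo: str) -> dict:
    return {"repo": repo, "passed": True, "report_url": "ci/run/881"}

def open_pr(state: dict) -> dict:
    if not state["passed"]:
        # explicit takeover point instead of silently dropping the work
        return {**state, "pr": None, "handoff": "human: tests failing"}
    return {**state, "pr": "PR-17"}

def notify(state: dict) -> str:
    return f"{state['repo']}: opened {state['pr']} (evidence: {state['report_url']})"

state = run_tests("agent-tools")
state = open_pr(state)
print(notify(state))  # agent-tools: opened PR-17 (evidence: ci/run/881)
```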

Worksheet

Copy this into an issue, doc, or audit note for each tool you evaluate.

Criterion   | Score     | Evidence / next fix
Inspectable | 0 / 1 / 2 | What can the agent query before acting?
Scriptable  | 0 / 1 / 2 | Which core workflows are stable and callable?
Bounded     | 0 / 1 / 2 | How narrowly can credentials and approvals be scoped?
Verifiable  | 0 / 1 / 2 | What receipt proves the outcome?
Recoverable | 0 / 1 / 2 | What happens after timeout, failure, or partial completion?
Composable  | 0 / 1 / 2 | How does this plug into wider agent workflows?

What to do with the score

Next step: if this scorecard reveals repeated gaps, those gaps are the roadmap. The highest-value agent-first work is usually boring infrastructure: status endpoints, scoped tokens, dry-runs, receipts, and rollback paths.