Practical evaluation tools

Agent-first tool checklists

Use these checklists when choosing, building, or reviewing tools that AI agents will operate. They focus on the interface details that decide whether an agent can act safely: schemas, scopes, dry runs, receipts, status checks, logs, and recovery paths.

APIs

Agent-first API checklist

Evaluate REST, GraphQL, and HTTP APIs before agents depend on them. Covers auth scopes, idempotency, pagination, errors, webhooks, sandboxes, and audit trails.

Use the API checklist
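
To make the idempotency, scope, and receipt items concrete, here is a minimal sketch of the behavior an agent would hope to see from a write endpoint. Everything in it is hypothetical: `InMemoryApi`, `create_order`, the `orders:write` scope name, and the receipt fields are illustrative stand-ins, not any real product's API.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class InMemoryApi:
    """Hypothetical stand-in for a hosted API; not a real service."""

    receipts: dict = field(default_factory=dict)  # idempotency key -> receipt
    side_effects: int = 0  # counts writes that actually happened

    def create_order(self, idempotency_key: str, payload: dict, scopes: set) -> dict:
        # Scoped credentials: refuse the write unless the (assumed) scope is present,
        # and return an explicit, machine-readable error instead of a vague failure.
        if "orders:write" not in scopes:
            return {"ok": False, "error": {"code": "forbidden", "retryable": False}}

        # Idempotency: a repeated key returns the stored receipt and performs no new write.
        if idempotency_key in self.receipts:
            return self.receipts[idempotency_key]

        self.side_effects += 1
        receipt = {"ok": True, "receipt_id": str(uuid.uuid4()), "payload": payload}
        self.receipts[idempotency_key] = receipt
        return receipt
```

With this shape, an agent that retries after a timeout can resend the same key and compare receipts instead of guessing whether the first request landed.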

CLIs

Agent-first CLI checklist

Check whether a command-line tool is safe for automated agent runs. Covers non-interactive flags, structured output, exit codes, dry runs, config, and secret handling.

Use the CLI checklist
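
As a rough illustration of what the CLI items buy an agent, the sketch below wraps a command so that it runs non-interactively, relies on the exit code, parses JSON from stdout, and keeps only a bounded tail of stderr. The `run_tool` helper, the `MAX_LOG_CHARS` limit, and the assumption that the tool prints JSON on success are all illustrative choices; only standard-library calls are used.

```python
import json
import subprocess
import sys

MAX_LOG_CHARS = 2000  # assumed bound on how much stderr the agent keeps


def run_tool(argv, timeout=30.0):
    """Run a command without a TTY prompt and return a structured result."""
    result = {"exit_code": None, "data": None, "stderr": "", "error": None}
    try:
        proc = subprocess.run(
            argv,
            stdin=subprocess.DEVNULL,  # non-interactive: the tool cannot wait for input
            capture_output=True,
            text=True,
            timeout=timeout,
            check=False,  # inspect the exit code ourselves instead of raising
        )
    except subprocess.TimeoutExpired:
        result["error"] = "timeout"
        return result

    result["exit_code"] = proc.returncode
    result["stderr"] = proc.stderr[-MAX_LOG_CHARS:]  # bounded logs
    if proc.returncode != 0:
        result["error"] = "nonzero_exit"
        return result
    try:
        result["data"] = json.loads(proc.stdout)  # structured output, not scraped text
    except json.JSONDecodeError:
        result["error"] = "stdout_not_json"
    return result


# Demo command: a child Python process standing in for a real CLI's dry run.
DEMO_CMD = [sys.executable, "-c", "import json; print(json.dumps({'dry_run': True, 'changed': 0}))"]
```

Calling `run_tool(DEMO_CMD)` returns a dictionary the agent can branch on. A CLI that prompts for confirmation, mixes logs into stdout, or exits 0 on failure would break one of the three checks above.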

MCP servers

Agent-first MCP server checklist

Review MCP servers as production tool surfaces, not just integration wrappers. Covers discoverable tools, narrow permissions, idempotency, receipts, verification, and recovery.

Use the MCP checklist

Which checklist should you use first?

If agents call a hosted product directly, start with the API checklist. Most agent workflows depend on predictable request and response shapes, scoped credentials, explicit errors, and durable receipts.
If agents run commands in terminals, add the CLI checklist. A good CLI needs machine-readable output, stable exit codes, non-interactive operation, bounded logs, and clear rollback or cleanup steps.
If agents connect through MCP, use the MCP checklist as the operating-surface review. MCP improves connection and discovery, but the underlying tools still need narrow authority, verifiable outcomes, and recoverable failures.

Fast rule: a tool is not agent-first because it has an API, CLI, or MCP server. It is agent-first when an autonomous system can discover what it is allowed to do, take the smallest safe action, verify the outcome, and recover when something goes wrong.
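
The fast rule can be read as a four-step loop: discover, act small, verify, recover. The sketch below is one possible shape for that loop, written against a made-up `FakeTool`; its method names (`allowed_actions`, `dry_run`, `apply`, `verify`, `rollback`) are assumptions chosen for illustration and do not correspond to any specific API, CLI, or MCP server.

```python
from dataclasses import dataclass, field


@dataclass
class FakeTool:
    """Hypothetical tool surface used only to illustrate the loop."""

    state: dict = field(default_factory=dict)
    fail_verify: bool = False  # lets the demo simulate a failed verification

    def allowed_actions(self):
        return {"set_flag"}  # discovery: what the caller may do

    def dry_run(self, action, key, value):
        return {"action": action, "would_change": self.state.get(key) != value}

    def apply(self, action, key, value):
        previous = self.state.get(key)
        self.state[key] = value
        return {"receipt": f"{action}:{key}", "previous": previous}

    def verify(self, key, value):
        return (not self.fail_verify) and self.state.get(key) == value

    def rollback(self, key, previous):
        if previous is None:
            self.state.pop(key, None)
        else:
            self.state[key] = previous


def run_safely(tool, action, key, value):
    """Discover, take the smallest safe action, verify, and recover."""
    if action not in tool.allowed_actions():
        return "refused"  # the action is outside what the tool permits
    plan = tool.dry_run(action, key, value)
    if not plan["would_change"]:
        return "noop"  # nothing to do, so do nothing
    receipt = tool.apply(action, key, value)
    if tool.verify(key, value):
        return "verified"
    tool.rollback(key, receipt["previous"])  # recovery path uses the receipt
    return "rolled_back"
```

A tool that cannot answer one of these five calls in some form leaves the agent guessing at that step, which is the gap the checklists are meant to surface.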