Evidence-backed tool audit

Audit a tool before your AI agents depend on it.

Get a clear, evidence-backed view of whether a tool is suitable for the AI-agent workflow you have in mind.

The audit reviews the evidence available for the workflow you want to support: public docs, shared materials, sandbox access, logs, traces, schemas, and repo snippets. Sensitive actions, production access, and customer data stay out of scope unless explicitly approved.

Outcome 01

A readiness score

The audit scores the parts that matter for agent use: interface, permissions, action loops, observability, recovery, and handoff to a human.

Outcome 02

Evidence, not opinion

The report cites what was reviewed: docs, API responses, CLI behavior, auth flows, receipts, retries, failure cases, and any workflow traces available.

Outcome 03

A decision note

You get a short recommendation: use it, use it with constraints, wait for fixes, or choose a different tool. If the evidence is weak, the report says that plainly.

What the audit delivers

The audit reviews available evidence against a defined AI-agent workflow. It gives you a practical recommendation, not a security, legal, or compliance certification.

Delivery boundaries
  • Good fit: tool selection, provider comparison, API/CLI behavior, sample workflows, sandbox systems, public docs, private docs you are allowed to share, logs, schemas, and small repo excerpts.
  • Needs human approval: production access, customer data, billing changes, new credentials, destructive tests, legal/security claims, or direct changes in a client system.
  • Not offered: compliance certification, penetration testing, guaranteed business results, or unsupervised changes to sensitive infrastructure.
  • Deliverable: a written report with evidence, gaps, constraints, and a recommendation for whether and how agents should use the tool. See the sample report.

Use the intake worksheet to define the workflow and safe evidence before submitting an inquiry. Implementation help is separate and requires additional scoping.

What gets reviewed

The audit is designed for teams choosing developer tools, SaaS workflows, internal platforms, APIs, CLIs, MCP servers, automation surfaces, and operations-heavy products for agent workflows.

Interface

Can an agent discover capabilities, call them without browser fragility, understand errors, and verify outcomes?

Control

Are permissions, approvals, credentials, rate limits, audit logs, rollback, and irreversible actions designed for delegation?

Reliability

Do actions expose idempotency, status checks, receipts, retries, partial failure handling, and recovery paths?

Workflow fit

Does the tool fit the task your agent should run, and are the constraints clear enough to operate safely?

Example findings

These are the kinds of issues to catch before an agent depends on a tool. People often work around them; agents usually fail or take unsafe shortcuts.

No action receipt

An API returns {"ok":true} after starting a long-running job, but exposes no job ID, status endpoint, final artifact URL, or cancellation path.
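As a rough illustration of this finding, the check below flags a job-start response that carries no receipt. The field names (job_id, status_url, artifact_url, cancel_url) are hypothetical and chosen only for this sketch; a real API may name or structure them differently.

```python
from typing import Any

# Hypothetical field names for illustration; a real API may use other names.
RECEIPT_FIELDS = ("job_id", "status_url", "artifact_url", "cancel_url")


def missing_receipt_fields(response: dict[str, Any]) -> list[str]:
    """Return the receipt fields that are absent or empty in a job-start response."""
    return [name for name in RECEIPT_FIELDS if not response.get(name)]


# The response described in the finding: success is claimed, nothing can be verified.
bare = {"ok": True}

# A response an agent could follow up on (all values are made up).
with_receipt = {
    "ok": True,
    "job_id": "job_123",
    "status_url": "/jobs/job_123",
    "artifact_url": "/jobs/job_123/artifact",
    "cancel_url": "/jobs/job_123/cancel",
}

print(missing_receipt_fields(bare))
print(missing_receipt_fields(with_receipt))
```

The first call reports every receipt field as missing, which is the situation where an agent can neither confirm the outcome nor stop the job.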

Credential scope too broad

The only practical token can read and mutate an entire workspace, so teams cannot delegate narrow work without accepting unnecessary blast radius.
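One way to make the blast radius concrete is to compare what a token grants with what the delegated task needs. The sketch below assumes scopes can be listed as plain strings; the scope names and the triage task are invented for illustration and do not describe any particular tool.

```python
# Hypothetical scope names, for illustration only; real tools define their own.
WORKSPACE_ADMIN_TOKEN = {"issues:read", "issues:write", "members:write", "billing:write"}
TRIAGE_TASK_NEEDS = {"issues:read", "issues:write"}


def excess_scopes(granted: set[str], needed: set[str]) -> set[str]:
    """Scopes the token carries that the delegated task never uses."""
    return granted - needed


def missing_scopes(granted: set[str], needed: set[str]) -> set[str]:
    """Scopes the task needs that the token does not provide."""
    return needed - granted


blast_radius = excess_scopes(WORKSPACE_ADMIN_TOKEN, TRIAGE_TASK_NEEDS)
print(sorted(blast_radius))
```

A non-empty result means the agent holds permissions the workflow never uses. If the tool offers no narrower token, that gap would be recorded as a constraint in the report.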

Ambiguous recovery

Retries can duplicate side effects because the API lacks idempotency keys, duplicate detection, or a safe way to reconcile partial completion.
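For contrast, here is a minimal in-memory sketch of how an idempotency key makes a retry safe. TicketService and create_ticket are hypothetical stand-ins rather than any real API, and a production service would also need to persist keys and expire them.

```python
import uuid


class TicketService:
    """Toy in-memory stand-in for an API that honors idempotency keys."""

    def __init__(self) -> None:
        self._results: dict[str, dict] = {}
        self.side_effects = 0  # counts how many tickets were actually created

    def create_ticket(self, idempotency_key: str, title: str) -> dict:
        # A repeated key returns the stored result instead of acting again.
        if idempotency_key in self._results:
            return self._results[idempotency_key]
        self.side_effects += 1
        result = {"ticket_id": f"t_{self.side_effects}", "title": title}
        self._results[idempotency_key] = result
        return result


service = TicketService()
key = str(uuid.uuid4())  # the caller generates one key per intended action

first = service.create_ticket(key, "Rotate staging credentials")
retry = service.create_ticket(key, "Rotate staging credentials")  # e.g. after a timeout

print(first == retry, service.side_effects)
```

The retry returns the original result and no second ticket is created. Without the key, an agent that times out and retries has no safe way to tell whether the first call went through.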

What happens after you ask

The first step is a fit check, not a commitment. The aim is to decide whether there is enough safe evidence to produce a useful audit before money changes hands.

Audit path
  1. Inquiry review: we review the tool, agent workflow, decision deadline, and any safe evidence you can share.
  2. Scope note: if the audit is a fit, you get a short scope note listing the evidence to review, explicit exclusions, a timeline, and a fixed price, starting from £2,500.
  3. Evidence review: the audit uses public docs, sandbox access, traces, schemas, small repo excerpts, or private docs you are allowed to share.
  4. Decision report: you receive a written recommendation, scorecard, constraints, and practical next steps for agent use.

No production credentials, customer data, destructive tests, or billing changes are needed for the initial fit check.

Common questions

Short answers for teams deciding whether to ask about an audit.

Can this compare several tools?

Yes. The cleanest comparison is one defined agent workflow across two or three candidate tools, with the same scoring criteria and evidence limits.

Is this a security review?

No. The audit can flag delegation risks such as broad tokens, missing approvals, or weak logs, but it is not penetration testing, legal advice, or compliance certification.

What if the tool is not audit-ready?

The recommendation may be “wait for fixes” or “use only with constraints”. If there is not enough evidence for a useful review, the scope is narrowed or the work is declined before it becomes paid work.

Ask about a tool audit

Tell us which tool or workflow you are considering, what your AI agents need to do, and what decision the audit needs to support. If the work is a poor fit for this kind of evidence review, we will say so before it goes further. If useful, start with the intake worksheet so the scope is concrete and no sensitive data is shared.