Results summary
SerpAPI
100% Success@3 across 30 official-docs tasks. Success@1: 90%. Median latency: 2016 ms.
Brave
97% Success@3 across 30 official-docs tasks. Success@1: 83%. Median latency: 1074 ms.
Tavily
80% Success@3 across 30 official-docs tasks. Success@1: 47%. Median latency: 1623 ms.
| Provider | Success@1 | Success@3 | Success@10 | MRR | Median latency |
|---|---|---|---|---|---|
| SerpAPI | 90% | 100% | 100% | 0.933 | 2016 ms |
| Brave | 83% | 97% | 100% | 0.903 | 1074 ms |
| Tavily | 47% | 80% | 100% | 0.635 | 1623 ms |
Success@k means at least one expected official source appeared in the top k results. MRR is the mean reciprocal rank of the first expected source. Relevance labels were based on URL patterns and accepted official domains, then reviewed before publishing to catch official docs that had moved to a new domain.
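To make the scoring auditable, here is a minimal sketch of how Success@k and MRR can be computed, assuming each task yields an ordered list of result URLs and a predicate that flags an expected official source. The helper names are illustrative, not the exact harness code.

```python
# Minimal scoring sketch (illustrative names, not the production harness).
# `relevant(url)` should return True when a URL matches an expected official
# pattern or accepted official domain for the task.

def success_at_k(ranked_urls, relevant, k):
    """True if at least one expected official source appears in the top k."""
    return any(relevant(url) for url in ranked_urls[:k])

def reciprocal_rank(ranked_urls, relevant):
    """1/rank of the first expected official source, or 0.0 if none appears."""
    for rank, url in enumerate(ranked_urls, start=1):
        if relevant(url):
            return 1.0 / rank
    return 0.0

def mean_reciprocal_rank(tasks, relevant):
    """Average reciprocal rank over all tasks; each task is a ranked URL list."""
    return sum(reciprocal_rank(urls, relevant) for urls in tasks) / len(tasks)
```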
What this means for agent workflows
- If the agent needs the official source in the first few results, SerpAPI performed best in this run. It found an expected official source in the top 3 for all 30 tasks.
- If latency matters, Brave was materially faster in this sample. Its median response was about 1.1s, compared with about 2.0s for SerpAPI.
- Tavily may still fit answer-style research workflows, but this test was narrower. We measured retrieval of official documentation URLs, not generated answer quality.
- Do not generalize this to all search tasks. Current facts, exact error lookup, legal/compliance research, and source-diversity tasks need separate cohorts.
Method in brief
The task set contains 30 official-documentation queries across AI APIs, browser automation, infrastructure, data stores, and workflow tools. Each provider was called once per task on 12 May 2026. We saved response status, latency, result count, top result URLs, and rank observations.
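For context, a per-task record along the following lines would capture those fields; the schema is illustrative and not the exact structure of the published evidence tables.

```python
from dataclasses import dataclass, field

@dataclass
class TaskObservation:
    # Fields mirror what was saved per call; names are illustrative only.
    provider: str                   # e.g. "SerpAPI", "Brave", "Tavily"
    task_id: str                    # which of the 30 official-docs queries
    response_status: int            # HTTP status of the single provider call
    latency_ms: float               # wall-clock latency of that call
    result_count: int               # number of results returned
    top_urls: list[str] = field(default_factory=list)  # top result URLs, in rank order
    first_expected_rank: int | None = None              # rank of first expected source, if any
```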
The primary relevance signal was whether an expected official URL pattern or accepted official domain appeared in the top 10. This is intentionally simple and auditable. It does not judge snippet quality, generated answer quality, pricing, rate limits, or long-term stability.
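In code, that signal reduces to a small check like the sketch below; the patterns and domains shown are placeholders rather than the real per-task label set.

```python
import re
from urllib.parse import urlparse

# Placeholder labels for a single task; the benchmark defines these per query.
EXPECTED_URL_PATTERNS = [re.compile(r"^https://docs\.example\.com/api/")]
ACCEPTED_OFFICIAL_DOMAINS = {"docs.example.com", "example.dev"}

def is_expected_official(url: str) -> bool:
    """True if the URL matches an expected pattern or an accepted official domain."""
    if any(pattern.match(url) for pattern in EXPECTED_URL_PATTERNS):
        return True
    return urlparse(url).hostname in ACCEPTED_OFFICIAL_DOMAINS

def hit_in_top_10(ranked_urls: list[str]) -> bool:
    """Primary relevance signal: an expected official source in the top 10."""
    return any(is_expected_official(url) for url in ranked_urls[:10])
```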
Download the evidence tables
Limits and next steps
- This is a dated May 2026 cohort, not a timeless ranking.
- The benchmark uses one run per provider; repeated runs would be needed for stability claims.
- Provider categories differ: SerpAPI wraps Google results, Brave exposes Brave Search, and Tavily is more answer/research oriented.
- The next high-value cohort should test exact technical-error lookup or current pricing/version discovery, where agents often fail expensively.
Need this for your stack?
AgentFirstTools can inspect a tool shortlist or agent workflow and produce a narrow, evidence-backed audit before you depend on it in production.