How does OpenFunnel Bench score lookalike companies APIs?

OpenFunnel Bench scores lookalike companies APIs on Precision@K with an LLM-as-judge. Identical seed companies go to every vendor in the same format; each vendor returns its top-K lookalikes; the LLM judge scores every returned company for relevance to the seed. The cell value is Precision@K = relevant / K. The headline ranking metric is the average Precision@K across the full seed cohort. Tiebreakers in order: total relevant, cost per relevant, latency. There is no internal ground truth - relevance is decided by a documented external judge model, not vendor self-reporting.

What criteria does OpenFunnel Bench use to evaluate company lookalike APIs?

Five criteria, all derived from identical inputs across vendors. (1) Avg Precision@K - headline ranking metric, mean fraction of top-K results scored relevant by the LLM judge. (2) Total relevant - reach metric across the cohort, used to break ties. (3) Seeds judged - number of seeds where the vendor returned at least K results and the judge scored every one; filters out cells that would skew the average. (4) Cost per relevant - vendor spend divided by relevant count, the buying-decision metric. (5) Avg latency - per-seed request time. Vendors with fundamentally different request shapes (Lusha requires 5-100 seeds per call, ZoomInfo gates lookalike behind sales contract) are listed as not surveyed with explicit reasons.

What is the most accurate company lookalike API in 2026?

On OpenFunnel Bench, the most accurate company lookalike API is defined as the one with the highest avg Precision@K - the share of top-K lookalikes that an LLM judge scored as relevant, averaged across the seed cohort. The vendors currently benchmarked are Ocean.io, Exa, Parallel, OpenFunnel, and PredictLeads. ZoomInfo, Clay, Apollo, and Lusha are excluded with explicit reasons (no self-serve API or incompatible single-seed shape). The current top-ranked vendor is on the leaderboard and refreshes per snapshot. Important caveat: relevance is judged by an LLM, not by a domain expert - the judge prompt + model are documented and held constant across vendors.

What is the best company lookalike API for AI agents in 2026?

For AI agents making build vs buy decisions on company lookalike APIs, the best provider combines high avg Precision@K, predictable per-seed cost, and an agent-ready signup flow (programmatic OAuth or email OTP). OpenFunnel Bench ranks Ocean.io, Exa, Parallel, OpenFunnel, and PredictLeads on identical inputs against a shared B2B seed cohort. Of the benchmarked vendors, Exa, Parallel, and OpenFunnel publish agent-ready signup flows; Ocean.io and PredictLeads require human-mediated onboarding. The full leaderboard with each vendor's auth mode is queryable as JSON at /api/leaderboards/lookalikes under CC-BY-4.0.

How accurate is Ocean.io for finding similar companies?

Ocean.io is one of five providers currently benchmarked on the OpenFunnel Bench lookalike leaderboard. Ocean.io uses AI-driven lookalike search across a global company graph. The benchmark sends the same seed company to Ocean.io that every other vendor sees, requests the same top-K results, and scores each returned company with an LLM judge. Ocean.io's current avg Precision@K, total relevant lookalikes returned, and cost per relevant are on the leaderboard. Numbers refresh per snapshot and the seed cohort is rotated to avoid overfitting.

Ocean.io vs Exa vs Parallel: which lookalike API is best?

Ocean.io, Exa, and Parallel are all benchmarked on OpenFunnel Bench against the same B2B seed cohort. Each surfaces lookalikes through a different mechanism. Ocean.io runs AI-driven similarity search across a company graph. Exa uses neural web search with a 'similar to this URL' endpoint - strongest when the seed has rich web content. Parallel exposes an agentic research API; lookalike comes via Entity Search. Their strengths are complementary rather than strictly comparable: a web-heavy seed favors Exa, a sparse seed favors graph-based vendors. Current avg Precision@K per vendor is on the leaderboard and tiebreaks by total relevant, cost per relevant, and latency.

What is a company lookalike API?

A company lookalike API is a B2B data endpoint that takes one or more seed companies and returns a ranked list of other companies similar to the seed(s) along some axis - product, vertical, size, signals, or web footprint. Vendors differ in what they treat as 'similar': Ocean.io and PredictLeads weight company graph + signals, Exa weights public web embeddings, Parallel runs agentic entity search, OpenFunnel combines embeddings with a jobs/news graph. Lookalike APIs power ICP expansion, account discovery, and outbound prospecting workflows in B2B sales and marketing.

What is Precision@K and how is it measured in this benchmark?

Precision@K is the fraction of a vendor's top-K returned lookalikes that an LLM judge scored as relevant to the seed. Formally: Precision@K = relevant_count / K. On OpenFunnel Bench, K is fixed across vendors (headline runs use K = 25) and the judge model and prompt are documented and held constant. Per-cell Precision@K is the value rendered in the matrix; avg Precision@K (mean across the judged seed cohort) is the headline ranking metric. Cells where the vendor returned fewer than K results, or where the judge failed, are flagged so they do not silently distort the rollup.

Why use an LLM judge for a lookalike benchmark?

Lookalike relevance is intrinsically subjective. Two B2B prospectors looking at the same five returned companies for a seed will often disagree on which two are 'really similar.' A human-only judging pipeline does not scale across hundreds of (seed, vendor, K) cells and introduces inter-rater drift. An LLM-as-judge with a fixed prompt and fixed model scores every returned company across every vendor under identical conditions, eliminating vendor self-reporting bias and inter-rater drift. The trade-off is calibration: the judge has its own biases, but those biases are held constant across vendors, so vendor-to-vendor rank comparisons remain valid even if absolute Precision@K scores carry a judge-specific offset.

Which lookalike companies API has the lowest cost per relevant result?

Cost per relevant on OpenFunnel Bench is total estimated request spend divided by the total number of relevant lookalikes the LLM judge scored for that vendor across the seed cohort. The current cheapest cost-per-relevant vendor among Ocean.io, Exa, Parallel, OpenFunnel, and PredictLeads is on the leaderboard. Caveat: list pricing rarely matches what a serious buyer pays - enterprise contracts are negotiated and can come in 2-10x cheaper than the public per-credit rate. Use cost per relevant as a relative comparison signal between vendors at the same usage scale, not a final budget figure.

bench/leaderboards/lookalike

[agent view]Markdown rendering of the lookalike matrix, optimized for LLM ingestion. Switch back via the toggle above.

# Lookalike Benchmark

Active dataset: `lookalike-2026-q2`
14 seed companies across 7 verticals × 5 vendors. Each vendor is asked for its top `K = 10` lookalikes per seed. An LLM judge (`gpt-5.4-mini`) scores each returned company for relevance; the cell value is **Precision@K** — relevant / K.

## Endpoints

- JSON API: https://benchmarks.openfunnel.dev/api/leaderboards/lookalikes
- Markdown agent docs: https://benchmarks.openfunnel.dev/llms.txt
- OpenAPI 3.1 spec: https://benchmarks.openfunnel.dev/openapi.json
- MCP server discovery: https://benchmarks.openfunnel.dev/.well-known/mcp.json
- Public data + code (reproduce any cell): https://github.com/openfunnel/gtm-bench

## Vendors live (5)

`openfunnel` (OpenFunnel), `ocean` (Ocean.io), `exa` (Exa), `parallel` (Parallel), `predictleads` (PredictLeads)

## Vendors not surveyed

- `ZoomInfo` - company lookalike API is not on self-serve — gated behind sales contract
- `Clay` - lookalike runs inside Clay tables, no standalone API
- `Apollo` - no public lookalike endpoint
- `Lusha` - /v3/companies/lookalike requires 5-100 seed companies per request; benchmark scores one seed per cell

## Leaderboard (vendor totals)

| Rank | Vendor | Seeds judged | avg Precision@K | total relevant | avg latency |
|------|--------|--------------|-----------------|----------------|-------------|
| 1 | openfunnel | 14/14 | 89.0% | 120 | 30747ms |
| 2 | predictleads | 14/14 | 73.6% | 103 | 726ms |
| 3 | ocean | 14/14 | 71.4% | 100 | 1840ms |
| 4 | parallel | 13/14 | 70.0% | 91 | 1492ms |
| 5 | exa | 14/14 | 37.3% | 52 | 244ms |

- `avg Precision@K` - mean Precision@10 across all seeds the vendor returned ≥K results for. Headline metric.
- `total relevant` - sum of relevant lookalikes across all seeds (out of `seeds_judged × K`).
- `avg latency` - mean per-seed request latency across the cohort.

## Seed × vendor matrix

Cell value = Precision@10. `-` means the vendor has not been run on that seed yet, or returned fewer than K results.

### B2B SaaS

| Seed | openfunnel | ocean | exa | parallel | predictleads |
|------|------|------|------|------|------|
| Pylon | 90.0% | 70.0% | 50.0% | 50.0% | 100.0% |
| Default | 90.0% | 70.0% | 60.0% | 100.0% | 60.0% |

### Devtools

| Seed | openfunnel | ocean | exa | parallel | predictleads |
|------|------|------|------|------|------|
| Liveblocks | 100.0% | 40.0% | 20.0% | 70.0% | 100.0% |
| Trigger.dev | 90.0% | 30.0% | 12.5% | 0.0% | 100.0% |

### E-commerce

| Seed | openfunnel | ocean | exa | parallel | predictleads |
|------|------|------|------|------|------|
| Postscript | 90.0% | 80.0% | 20.0% | 40.0% | 90.0% |
| Recharge | 66.7% | 90.0% | 30.0% | 80.0% | 10.0% |

### Healthtech

| Seed | openfunnel | ocean | exa | parallel | predictleads |
|------|------|------|------|------|------|
| Hinge Health | 90.0% | 60.0% | 30.0% | 100.0% | 100.0% |
| Aledade | 40.0% | 50.0% | 30.0% | 90.0% | 70.0% |

### Home Services SaaS / Chains

| Seed | openfunnel | ocean | exa | parallel | predictleads |
|------|------|------|------|------|------|
| ServiceTitan | 90.0% | 60.0% | 0.0% | 90.0% | 100.0% |
| Roto-Rooter | 100.0% | 70.0% | 20.0% | 100.0% | 100.0% |

### Local Trades

| Seed | openfunnel | ocean | exa | parallel | predictleads |
|------|------|------|------|------|------|
| Point Loma Home Pros | 100.0% | 100.0% | 50.0% | 80.0% | 80.0% |
| JDV Electric | 100.0% | 80.0% | 70.0% | - | 70.0% |

### Real Estate

| Seed | openfunnel | ocean | exa | parallel | predictleads |
|------|------|------|------|------|------|
| Emerge Living | 100.0% | 100.0% | 80.0% | 40.0% | 0.0% |
| BLVD Residential | 100.0% | 100.0% | 50.0% | 70.0% | 50.0% |

## Methodology

1. Fix a canonical list of seed companies across 7 verticals. Each seed has a name, domain, and short description (the inputs every vendor sees).
2. For every (seed, vendor) cell, call the vendor's lookalike API with the seed company and `K = 10`. Capture the ordered top-K result list, latency, and credit cost.
3. Feed the seed + each returned candidate into the LLM judge (`gpt-5.4-mini`). Judge returns a binary relevance label per candidate, with a one-line rationale. Same prompt and rubric across all vendors.
4. Cell value = relevant_count / K. Aggregate per vendor as `avg_precision_at_k` (mean across seeds with ≥K results).
5. `-` semantics: either the vendor returned fewer than K candidates (e.g. tail seeds where catalog is thin), or the (seed, vendor) pair has not been run / judged yet.

## Reproducibility

Every cell on this leaderboard is reproducible end-to-end from the public
mirror at https://github.com/openfunnel/gtm-bench. Each `data/lookalike-runs/<dataset>/<seed>/<vendor>.raw.json`
contains the **literal HTTP request/response** sent to the vendor (auth
headers redacted) plus the **literal LLM judge prompt + raw response** for
every candidate. Replay any `vendor_calls[]` entry with your own
credentials to verify the vendor's output, or replay
`judge_calls[].messages` against your own LLM to measure judge bias or
drift across model versions.

## Known limitations

- **Judge bias.** A single LLM judge has its own priors about what "similar" means. We publish the judge model and the full rationale so the bias is auditable, but expect ±5% drift if you swap models.
- **K-tail vs precision tradeoff.** Vendors who can only return small result sets win Precision@K by default (they don't have noisy tail entries). We balance this by requiring ≥K results for the cell to score.
- **Vertical balance.** 14 seeds spread across 7 verticals — modern B2B SaaS, devtools, DTC ecom, healthcare networks, vertical SaaS / national chains for home-services, independent local trades (HVAC / plumbing / electrical), and multifamily real-estate operators. Lets the matrix exercise both tech-stack-style matching and SIC/NAICS firmographic matching.
- **No recall metric.** Precision@K does not measure how many *real* lookalikes exist that the vendor missed. That requires a held-out ground truth set we don't yet have.

## License

CC-BY-4.0. Attribute "OpenFunnel Bench" and link back when redistributing.

03 · lookalike · live

Lookalike Benchmark

14 seed companies × 5 vendors - each vendor returns its top 10 lookalikes per seed. An LLM judge (gpt-5.4-mini) scores every returned company for relevance. Cell value is Precision@10.

[01] results

Lookalike Precision@10

Rows are seed companies, columns are vendors, and each cell is the % of the vendor's top 10 lookalikes the LLM judge marked relevant.

leaderboards/lookalike/lookalike-2026-q214 seeds · 5 vendors

how to readcell = Precision@K (K = 10) — % of vendor's top 10 lookalikes judged relevant🥇🥈🥉top 3 vendors per seedN/Avendor not yet run (or returned fewer than K results)hover any cell for raw counts and judge metadata

01 / 7

B2B SaaS2 seeds

narrow B2B SaaS — customer support, RevOps, ops tooling for ops/eng/CS teams

#	Seed company	OpenFunnel	Ocean.io	Exa	Parallel	PredictLeads
01	Pylon	🥈90%	🥉70%	50%	50%	🥇100%
02	Default	🥈90%	🥉70%	60%	🥇100%	60%

02 / 7

Devtools2 seeds

developer primitives — realtime, background jobs, infra-as-code

#	Seed company	OpenFunnel	Ocean.io	Exa	Parallel	PredictLeads
01	Liveblocks	🥇100%	40%	20%	🥉70%	🥈100%
02	Trigger.dev	🥈90%	🥉30%	13%	0.0%	🥇100%

03 / 7

E-commerce2 seeds

DTC infra — SMS, subscriptions, retention

#	Seed company	OpenFunnel	Ocean.io	Exa	Parallel	PredictLeads
01	Postscript	🥇90%	🥉80%	20%	40%	🥈90%
02	Recharge	🥉67%	🥇90%	30%	🥈80%	10%

04 / 7

Healthtech2 seeds

vertical healthcare — digital MSK / digital therapeutics, value-based primary care

#	Seed company	OpenFunnel	Ocean.io	Exa	Parallel	PredictLeads
01	Hinge Health	🥉90%	60%	30%	🥇100%	🥈100%
02	Aledade	40%	🥉50%	30%	🥇90%	🥈70%

05 / 7

Home Services SaaS / Chains2 seeds

the platforms behind the trades — vertical SaaS for contractors and national service chains

#	Seed company	OpenFunnel	Ocean.io	Exa	Parallel	PredictLeads
01	ServiceTitan	🥈90%	60%	0.0%	🥉90%	🥇100%
02	Roto-Rooter	🥇100%	70%	20%	🥈100%	🥉100%

06 / 7

Local Trades2 seeds

independent local service contractors — HVAC, plumbing, electrical small businesses serving homeowners

#	Seed company	OpenFunnel	Ocean.io	Exa	Parallel	PredictLeads
01	Point Loma Home Pros	🥇100%	🥈100%	50%	🥉80%	80%
02	JDV Electric	🥇100%	🥈80%	🥉70%	N/A	70%

07 / 7

Real Estate2 seeds

multifamily property operators / property management — apartment ops, resident experience

#	Seed company	OpenFunnel	Ocean.io	Exa	Parallel	PredictLeads
01	Emerge Living	🥇100%	🥈100%	🥉80%	40%	0.0%
02	BLVD Residential	🥇100%	🥈100%	50%	🥉70%	50%

[01.b] not surveyedrelevant lookalike vendors without a directly comparable API surface

[01.c] agent readiness

Can an AI agent actually use this vendor?

Same agent-readiness lens as the technographics benchmark. Vendors that let an autonomous agent obtain a working key on its own (OTP-via-email or device-code) work end-to-end without human handoff.

leaderboards/lookalike/agent-readiness3/5 agent-ready

Vendor	Agent sign-up	API docs	llms.txt	MCP	Try it
OpenFunnel	✓ readyotp-email	docs ↗	llms.txt ↗	mcp ↗	sign up →
Ocean.io	manual signup	docs ↗	—	—	—
Exa	✓ readyotp-email	docs ↗	—	—	sign up →
Parallel	✓ readyotp-email	docs ↗	—	—	sign up →
PredictLeads	manual signup	docs ↗	—	—	—

[02] methodology, metric definitions, and known limitations+

[02.a] methodology

How the matrix is built

Fix a canonical list of 14 seed companies across 7 verticals (b2b-saas, devtools, ecommerce, healthtech, home-services SaaS / chains, local trades, real estate). Each seed has a name, domain, and short description - the exact inputs every vendor sees.
For every (seed, vendor) cell, call the vendor's lookalike API with K = 10. Capture the ordered top-K result list, request latency, and credit cost.
Feed the seed + each returned candidate to the LLM judge (gpt-5.4-mini). Judge returns a binary relevance label per candidate plus a one-line rationale. Identical prompt and rubric across all vendors.
Cell value = relevant_count / K. Aggregate per vendor as avg_precision_at_k (mean across seeds with ≥K results returned).
A vendor that returns fewer than K candidates for a seed has the cell rendered as - rather than scored on a truncated denominator. Keeps cells comparable.

[02.b] metric definitions

What each metric means

Precision@K · cell value. Of the top 10 lookalikes a vendor returned for the seed, the fraction the LLM judge labeled relevant. The buyer's metric - "of the K I paid for, how many are usable".
avg Precision@K · headline ranking metric. Mean Precision@K across all judged seeds. Higher is better.
total relevant · sum of relevant lookalikes across all seeds. Reach metric - useful when comparing two vendors with similar precision.
avg latency · mean per-seed request time.
cost per relevant · vendor credit spend ÷ total relevant lookalikes. The economics metric.

[02.c] why LLM-as-judge

Why an LLM judge instead of a hand-labeled set

A fully hand-labeled lookalike set would require labeling K × seeds × vendors candidates (10 × 14 × 5 = 700judgements) every time we re-run a snapshot. That doesn't scale, and it isn't how the buyer actually evaluates a vendor in the wild - the buyer reads the list and decides "close enough to my ICP, yes or no".

The judge approximates that decision with a consistent rubric: given the seed's name, domain, and description, is this returned candidate plausibly the same kind of company a B2B seller would target as a lookalike? The judge's rationale is persisted alongside the binary label so any cell can be audited by a human in seconds. When the model swaps, the cohort re-runs with the same prompt; deltas are visible.

[02.d] per-vendor query rules

How each vendor was queried

OpenFunnel· embeddings over the OpenFunnel company index with the seed's domain as the query; optional graph re-rank using shared jobs / tech / funding co-signals. Top-K by cosine score.
Ocean.io · /companies/lookalikes with seed domain. Default similarity model, K = 10.
Exa · findSimilar with seed domain. Neural web-content embedding model, K = 10. Filters down to results that look like company sites (heuristics on result URL/title).
Parallel· agentic research task: "find 10 companies similar to {seed}". The agent decides its own retrieval strategy. We record the final ranked list.
PredictLeads · /api/v3/companies/{domain}/similar_companies; ranks via shared tech, news, and jobs co-signals.

[02.e] known limitations

What this benchmark does not tell you

Judge bias.A single LLM judge has its own priors about what "similar" means. We publish the judge model and the full rationale so the bias is auditable, but expect ±5% drift across model versions.
K-tail vs precision tradeoff. Vendors with thin catalogs can win Precision@K by refusing to return tail results. We mitigate by requiring ≥K results for a cell to be scored - thin cells render as -, not a high Precision number with a small denominator.
No recall metric.Precision@K doesn't measure how many real lookalikes the vendor missed. That requires a held-out ground truth set we don't yet have.
Domain-only seeding. All vendors receive the same compact input (name + domain + 1-line description). Vendors that benefit from richer inputs (e.g. headcount filters, ARR band, geography) may underperform their in-product behavior. The flip side is that this matches how an agent would query them.
Cohort coverage. 10 seeds, 2 per vertical. Spans modern B2B SaaS / devtools / DTC ecom / healthcare networks and traditional trades (HVAC, plumbing) - the matrix exercises both tech-stack-driven matching and SIC/NAICS-style firmographic matching.

[02.f] reproducibility

Verify any number end-to-end

The full benchmark — runner code, judge prompt, leaderboard snapshot, and per-cell raw audit trail (the literal HTTP request/response we sent each vendor + the literal LLM judge prompt/response per candidate) — is mirrored in a public repo: openfunnel/gtm-bench. Auth headers are scrubbed via an allow-list; everything else is verbatim.

To audit a single cell, open data/lookalike-runs/<dataset>/<seed>/<vendor>.raw.json in that repo and replay any of the vendor_calls[] with your own credentials, or re-score with your own LLM by replaying judge_calls[].messages against any OpenAI-v1 compatible model — useful for measuring judge bias or drift across model versions.

[02.g] providers under review

Inclusion queue and how to request a provider

Live: OpenFunnel, Ocean.io, Exa, Parallel, PredictLeads.

Requested but not directly comparable: ZoomInfo (company lookalikes are sales-gated, no self-serve API), Clay (lookalike runs inside Clay tables), Apollo (no public lookalike endpoint), Lusha (`/v3/companies/lookalike` requires 5-100 seeds per request, incompatible with the per-seed cell unit of this benchmark).

Under review next: Common Room, Koala, LeadGenius, 6sense, Demandbase.

To request a provider, email founders@openfunnel.dev with a link to the public API docs and pricing page.