Why LLMs hallucinate brand-name verifications.
Three frontier models. Five verification surfaces. 4,503 rows. Mean hallucination rate 17.3%. Claude (opus-4-7) hallucinated 12.7% on trademark; GPT-4o-mini 28.0% on trademark and 34.8% on domain. The structural cause, the data, and the live-tool fix.
Headline result
Run date: 2026-05-18
Rows: 4,503
Models: claude-opus-4-7 (n=1500), gpt-4o-mini (n=1500), gemini-2.5-pro (n=1503)
| Axis            | Claude | GPT-4o-mini |
|-----------------|--------|-------------|
| Trademark       | 12.7%  | 28.0%       |
| Domain          | 12.2%  | 34.8%       |
| Handle          | 0.0%   | 15.7%       |
| Cultural        | 7.7%   | 6.3%        |
| Sound symbolism | 29.3%  | 26.6%       |
Mean hallucination rate (3 models × 5 surfaces): 17.3%
Public dataset: github.com/etymolt/llm-hallucination-benchmark (CC-BY-4.0).
Full per-model × per-axis table: /research/benchmark/results.
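As a quick sanity check on the headline figures, the sketch below re-adds the per-model row counts and averages each model's five axis rates. It assumes an unweighted mean across axes, which the page does not confirm. The published 17.3% also includes gemini-2.5-pro, whose per-axis rates are not in the headline table, so that figure cannot be reproduced from these numbers alone.

```python
# Per-axis hallucination rates (percent), copied from the headline table.
RATES = {
    "claude-opus-4-7": {
        "trademark": 12.7, "domain": 12.2, "handle": 0.0,
        "cultural": 7.7, "sound": 29.3,
    },
    "gpt-4o-mini": {
        "trademark": 28.0, "domain": 34.8, "handle": 15.7,
        "cultural": 6.3, "sound": 26.6,
    },
}

# Row counts per model, copied from the run metadata.
ROWS = {"claude-opus-4-7": 1500, "gpt-4o-mini": 1500, "gemini-2.5-pro": 1503}


def per_model_mean(rates):
    """Unweighted mean across axes (an assumption), rounded to one decimal."""
    return {
        model: round(sum(axes.values()) / len(axes), 1)
        for model, axes in rates.items()
    }


total_rows = sum(ROWS.values())
means = per_model_mean(RATES)

print(total_rows)  # 4503
print(means)       # {'claude-opus-4-7': 12.4, 'gpt-4o-mini': 22.3}
```

The row counts sum to the reported 4,503, and the two per-model means bracket the reported overall mean.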
§1. Why this happens — the structural cause.
Brand-name verification is a task where the model is asked to make assertions about external state — a USPTO registration, an RDAP record, an X handle, a Wiktionary entry — at a specific point in time. The model's training distribution is, by definition, a snapshot of public-web text from some months or years earlier. The model has read thousands of pages making confident statements of the form “X is available” or “X is taken,” and statistically it has learned the shape of such assertions.
What the model has not learned is whether any particular such assertion is true at inference time. Without a live tool call, the model is in the position of a smart undergraduate making confident-sounding guesses about a database it has never actually queried. The wrong-guess rate is non-trivial (17.3% mean across the three frontier models we tested), and the cost of a wrong guess falls on the founder, not the model.
The two structural levers that make this worse: (1) base rates of confident wrong assertions in the training distribution; (2) the absence of a registry tool by default. RLHF moves the first lever; tool-routing moves the second. Tool-routing is the higher-leverage fix.
§2. The fix — tool-routing the verification step.
The reliable fix is to remove the model's option to hallucinate by routing every brand-name verification step through an out-of-band tool call. Two protocols make this trivial in 2026:
- MCP (Model Context Protocol). Install `@etymolt/mcp-server` in any MCP-aware host (Claude Desktop, Cursor, Windsurf, etc.). The host now has a `verify_brand_name` tool. The model calls it; the response is registry-grounded.
- OpenAPI Action. For ChatGPT: Custom GPT → Configure → Actions → Import from URL → paste https://www.etymolt.com/openapi.json. The GPT now has a /v1/verify action. Same registry grounding.
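For the MCP route, a host usually needs a config entry telling it how to launch the server. The sketch below builds one in the `mcpServers` shape that Claude Desktop reads from `claude_desktop_config.json`. The server key `etymolt`, the `-y` flag, and the absence of any environment variables or API key are assumptions, not something this page documents. Other hosts may expect a different file or shape.

```python
import json


def mcp_host_config(server_name="etymolt", package="@etymolt/mcp-server"):
    """Build an MCP host config entry that launches the server via npx.

    server_name is an arbitrary label (hypothetical); package is the npm
    package named in the text. "-y" lets npx install without prompting.
    """
    return {
        "mcpServers": {
            server_name: {
                "command": "npx",
                "args": ["-y", package],
            }
        }
    }


config = mcp_host_config()
print(json.dumps(config, indent=2))
```

Merging the printed object into the host's existing config file, then restarting the host, is typically what makes the tool visible to the model.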
When the tool call is the only path to a definitive answer, the model defers to the tool. The hallucination rate on the tool-routed answer then reduces to the verification API's own error rate. Etymolt reports its accuracy as 54.7% on trademark and 79.8% on domain against independent ground truth from USPTO TSDR and RDAP; the registries themselves set the ceiling on that accuracy.
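The routing idea can be sketched as a guard in the application layer: a definitive "available" or "taken" string is only ever built from a tool result, and without a tool the answer is an explicit refusal. This is a minimal illustration, not Etymolt's implementation. `Verdict`, `answer_verification`, and `stub_tool` are hypothetical names, and the stub stands in for a live registry lookup.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Verdict:
    name: str
    surface: str      # e.g. "trademark", "domain", "handle"
    available: bool
    source: str       # which registry or prober produced the result


Tool = Callable[[str, str], Verdict]


def answer_verification(name: str, surface: str, tool: Optional[Tool]) -> str:
    """Return a definitive answer only when it comes from the tool."""
    if tool is None:
        # No live lookup attached: refuse rather than guess from priors.
        return f"Cannot verify {name!r} on {surface}: no live verification tool attached."
    verdict = tool(name, surface)
    status = "available" if verdict.available else "taken"
    return f"{verdict.name!r} is {status} on {verdict.surface} (source: {verdict.source})."


def stub_tool(name: str, surface: str) -> Verdict:
    """Hypothetical stand-in for a live registry lookup."""
    taken = {("acme", "domain")}
    return Verdict(name, surface, (name.lower(), surface) not in taken, "stub-registry")


print(answer_verification("acme", "domain", None))
print(answer_verification("acme", "domain", stub_tool))
```

The design choice is that the model's free text never reaches the user as a verdict; only the structured tool result does.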
§3. FAQ.
Q: Why do LLMs hallucinate brand-name claims?
A: Two structural reasons. First, LLM training data contains thousands of 'X is available' / 'X is trademarked' assertions that the model pattern-matches but cannot ground in a live registry. Second, without a live tool call, the agent answers from prior — often stale — web-scraped data instead of querying USPTO TSDR or Verisign RDAP at inference time. The fix is to give the LLM a live verification tool (Etymolt MCP server or OpenAPI Action). The fix changes the model's answer from pattern-completion to registry-lookup.
Q: Which LLM hallucinates the most on brand names?
A: Per the Etymolt 2026-05-18 benchmark (n=4,503), GPT-4o-mini had the highest confident-assertion hallucination rate across the three models tested: 28.0% on trademark, 34.8% on domain availability, 15.7% on social handles. Claude (opus-4-7) had the lowest trademark hallucination rate (12.7%) but the highest sound-symbolism hallucination rate (29.3%). Mean across 3 models × 5 surfaces: 17.3%.
Q: How was the LLM brand-name hallucination benchmark measured?
A: Three frontier models (claude-opus-4-7, gpt-4o-mini, gemini-2.5-pro) were prompted with a held-out set of brand-name candidates and asked to assess each candidate's availability across five surfaces (trademark, domain, handle, cultural, sound). 'Hallucinated' = the model confidently asserted a claim that was false against ground truth. Trademark ground truth = USPTO TSDR; domain = Verisign RDAP; handle = live prober; cultural and sound = Etymolt internal (internal consistency, not external validation). Hedges and unparseable responses were not counted as hallucinations. Total rows: 4,503. Public dataset: https://github.com/etymolt/llm-hallucination-benchmark, license CC-BY-4.0.
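The scoring rule described above can be sketched as follows. Only confident assertions that contradict ground truth count as hallucinations; hedges and unparseable responses do not. The label names and the choice of denominator (all rows, including hedged and unparseable ones) are assumptions, since the page does not state how the rate was normalized.

```python
from enum import Enum
from typing import Iterable, Optional


class Label(Enum):
    CORRECT = "correct"
    HALLUCINATED = "hallucinated"
    HEDGED = "hedged"
    UNPARSEABLE = "unparseable"


def score_row(claim: Optional[bool], hedged: bool, truth: bool) -> Label:
    """Classify one model response against ground truth.

    claim is the parsed assertion (True = "available"), or None if the
    response could not be parsed; hedged marks a non-committal answer.
    """
    if claim is None:
        return Label.UNPARSEABLE
    if hedged:
        return Label.HEDGED
    return Label.CORRECT if claim == truth else Label.HALLUCINATED


def hallucination_rate(labels: Iterable[Label]) -> float:
    """Percent of rows labeled HALLUCINATED (denominator: all rows, an assumption)."""
    labels = list(labels)
    if not labels:
        return 0.0
    return 100.0 * sum(label is Label.HALLUCINATED for label in labels) / len(labels)


rows = [
    (True, False, True),    # confident and right
    (True, False, False),   # confident and wrong: hallucination
    (False, True, True),    # hedged: not counted
    (None, False, True),    # unparseable: not counted
]
labels = [score_row(*row) for row in rows]
print(hallucination_rate(labels))  # 25.0
```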
Q: Can LLM hallucination be eliminated with prompting?
A: Partially. Adding 'verify with a live registry' to the prompt reduces but does not eliminate hallucination — the model can still claim to have verified when it has not. The reliable fix is to remove the model's option to hallucinate by routing the verification step through an out-of-band tool call (MCP server or OpenAPI Action). When the tool call is the only path to a definitive answer, the model defers to the tool. Tool-routing turns a 17.3% mean hallucination rate into a deterministic registry lookup.
Q: How do I prevent my AI assistant from hallucinating brand names?
A: Install the Etymolt MCP server (`npx @etymolt/mcp-server`) or the ChatGPT GPT Action (import https://www.etymolt.com/openapi.json). The assistant then has a live `verify_brand_name` tool that queries USPTO + RDAP + handle probers in real time and returns a signed verdict. The agent's answer is sourced from the tool's response, not the model's training distribution. First five verdicts free per IP, no signup.
Citation
Etymolt (2026). Hallucination Rates of Frontier LLMs on Brand-Name Verification Across Five Surfaces. n=4,503. Mumbai: Etymolt. https://www.etymolt.com/research/benchmark/results. License CC-BY-4.0.
Stop hallucinating. Tool-route the verification.
Install Etymolt MCP or import the OpenAPI Action. Five free verdicts per IP. Verdict is registry-grounded, Ed25519-signed, citation-grade.
We don't generate names. We validate them.
Etymolt is a clearance signal, not a legal opinion. Benchmark ground-truth note: trademark and domain accuracy figures are validated against external registries (USPTO TSDR + RDAP); handle, cultural, and sound figures describe internal consistency because no public single-shot dataset exists for those axes. Full terms: etymolt.com/terms.
Benchmark v1.0 · 2026-05-18 · CC BY 4.0