Brand Pronunciation Resilience — How to Measure If Your Name Survives Spoken Transmission
A name that needs spelling on every sales call is a name that loses every sales call. The acoustic axis is invisible to founders and to every other naming tool we know of. Etymolt is the only verifier that grounds this in a measurement — and the measurement matters because the cost of a low-resilience name compounds across every spoken touchpoint a brand has, forever.
This article walks through the methodology — TTS→Whisper round-trip across 12 accents, Character Error Rate per accent, population-weighted composite — and shows three real names at three score levels. We close with the DIY version, in case you want to run a meaningful resilience reading without our API.
Where pronunciation matters
Founders typically dismiss pronunciation as a soft concern. It is not soft. The specific touchpoints where a low-resilience name bleeds value:
- Sales calls. A prospect who can't say your brand name will avoid using it on the call. The mention rate drops; the recall drops; the close rate drops. The dollar cost compounds per closed-lost opportunity.
- Podcasts. A podcast host who mispronounces the name on episode one will mispronounce it forever. Every listener forms the wrong audio anchor. Search queries fragment between the correct and incorrect form.
- Voice search. “Hey Siri, open Falcata” — if Siri doesn't recognize the phoneme cluster, the user is dropped into a Bing search instead of your app. Voice-search volume is roughly 27% of mobile search in 2026.
- Word-of-mouth marketing. Customer A tells Customer B about your product. Customer B mishears the name. The conversion path is broken at the referral. You will never measure this loss, but it's real.
- Trade-show booths. A name that two people in your sales team pronounce differently telegraphs incoherence. Booth visitors don't know which form is correct and stop asking.
- Investor pitches. Reading a pitch deck aloud is a stress-test of the brand name. A name that the founder hesitates on, every time, lowers investor confidence.
- Employee referrals. Engineers refer friends by saying the company name out loud. A name that's hard to say is harder to recruit under.
The pattern is consistent: any spoken-channel touchpoint is a friction surface for a low-resilience name. The cost is invisible per-instance and material at scale.
The TTS→Whisper round-trip methodology
The measurement is conceptually simple: speak the name in many voices, transcribe it back, and count the errors. The implementation:
- Synthesize the name with ElevenLabs across 12 accents. The accents are calibrated to cover roughly 4.8 billion English speakers and second-language speakers worldwide. The 12: US General American, UK Received Pronunciation, UK Northern, Australian, Indian, Filipino, Nigerian, Brazilian Portuguese, Mexican Spanish, German, Mandarin-accented, and Hindi-accented.
- Transcribe the audio back with OpenAI Whisper. We use Whisper large-v3 for transcription, calibrated against the Common Voice corpus for per-accent baseline accuracy. The transcription is a deliberate stress-test — Whisper is a strong model, so any transcription failure is a meaningful signal about the name itself, not about transcription quality.
- Compute Character Error Rate (CER) per accent. CER is the Levenshtein edit distance between the source spelling and the transcription, normalized by source length. CER of 0.0 means perfect transcription; CER of 0.5 means the number of character edits needed equals half the source length. CER can exceed 1.0 when the transcription is much longer than the source.
- Composite into a 0–100 Pronunciation Resilience score. The composite is a population-weighted mean of (1 - CER) across the 12 accents, scaled to 0–100. The weights track the speaker population of each accent. The composite is the headline number we surface in the verdict.
The methodology is reproducible. Any team with TTS access (ElevenLabs, Cartesia, or the equivalent) and Whisper access (OpenAI API or the open-source weights) can run it. The DIY section below walks through the minimum implementation.
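The CER and composite steps can be sketched in a few lines of Python. This is a minimal illustration, not Etymolt's scoring code: the function names are hypothetical, the weights in the usage example are made up (the real population weights are not given in this article), and clamping (1 - CER) at zero is an assumption of the sketch.

```python
from typing import Dict


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a two-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete a character from a
                curr[j - 1] + 1,     # insert a character into a
                prev[j - 1] + cost,  # substitute, or keep a match
            ))
        prev = curr
    return prev[-1]


def cer(source: str, transcription: str) -> float:
    """Character Error Rate: edit distance normalized by source length."""
    if not source:
        raise ValueError("source must not be empty")
    return edit_distance(source, transcription) / len(source)


def composite(cer_by_accent: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted mean of (1 - CER), scaled to 0-100.

    Assumption: (1 - CER) is clamped at 0 so that a CER above 1.0
    cannot push an accent's contribution negative.
    """
    total_weight = sum(weights[accent] for accent in cer_by_accent)
    weighted = sum(
        weights[accent] * max(0.0, 1.0 - value)
        for accent, value in cer_by_accent.items()
    )
    return 100.0 * weighted / total_weight


# "falcata" heard as "falca": two deletions over seven characters.
print(round(cer("falcata", "falca"), 3))

# Hypothetical weights, for illustration only.
print(composite({"US": 0.0, "IN": 0.2}, {"US": 1.0, "IN": 3.0}))
```

Because the weights multiply each accent's (1 - CER), a failure in a heavily weighted accent moves the composite more than the same failure in a lightly weighted one.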
How to read the score
The composite maps to four bands. The cutoffs are calibrated against the naming-failure data we surface in The Five Ways a Brand Name Dies.
- 95–100 — Bulletproof. Sonorant onset, common phoneme set, two syllables or fewer. Survives every accent and most noisy environments. Examples: Tesla, Linear, Nike, Apple.
- 85–94 — Clear. Workable across most accents; minor clips in one or two. Examples: Stripe, Figma, Slack, Notion.
- 70–84 — Workable but flagged. Survives the dominant accents; fails in two or three. The flag is informational — the candidate is viable but will pay an acoustic tax in specific markets. Examples: Falcata (terminal cluster clips), Twilio (vowel cluster ambiguity).
- Under 70 — Dies in voice search. Fails in five or more accents. The candidate cannot anchor a voice-first brand. Examples: novel consonant clusters (“Mxylpx”-class made-up names), compound words with ambiguous segmentation.
The composite is informational, not deterministic. A score of 73 doesn't kill a candidate; it tells you the candidate will pay an acoustic tax. Some brands choose to pay it — Häagen-Dazs is an extreme example of a brand that deliberately picked a hard-to-say name and made the difficulty part of the brand. Most brands shouldn't copy that strategy. Häagen-Dazs is sui generis.
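For batch work, the four bands reduce to a small lookup. The sketch below uses the cutoffs listed above; the function name is hypothetical, and treating a fractional score such as 84.6 as belonging to the lower band is an assumption, since the bands are stated for whole numbers.

```python
def resilience_band(score: float) -> str:
    """Map a 0-100 composite score to the band names used in this guide."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score >= 95:
        return "Bulletproof"
    if score >= 85:
        return "Clear"
    if score >= 70:
        return "Workable but flagged"
    return "Dies in voice search"


# Scores taken from the worked examples in this guide.
for name, score in [("Linear", 99), ("Falcata", 84), ("Coldbrew", 75)]:
    print(f"{name}: {resilience_band(score)}")
```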
The 12 accents — what tends to fail in each
Each accent fails predictably on different phoneme clusters. The patterns are consistent enough that the scoring engine can attribute failures back to specific clusters in the source name.
- US General American — the calibration baseline. Fails on continental-European rhotics, palatalized consonants, ejectives.
- UK Received Pronunciation — fails on dropped /r/ (non-rhotic); accent neutralization of /ɑː/ vs. /æ/ creates ambiguity in low-vowel names.
- UK Northern — short /a/ where RP has long /ɑː/; flattens front-vowel distinctions.
- Australian — diphthong shifts (/aɪ/ → /ɔɪ/); terminal consonant cluster simplification.
- Indian — retroflex consonants; voicing distinctions on aspirated stops; segmentation ambiguity in compound words.
- Filipino — terminal consonant cluster reduction; /f/ vs. /p/ merger in some idiolects; vowel epenthesis.
- Nigerian — terminal-stop devoicing; tone-language transfer creates stress-pattern ambiguity.
- Brazilian Portuguese — nasalized vowels; palatalized /t/, /d/; terminal-/r/ deletion.
- Mexican Spanish — vowel neutralization on schwas; rolled /r/; terminal-cluster simplification.
- German — final-obstruent devoicing; long-vowel preservation; /v/ vs. /w/ merger.
- Mandarin-accented — tone overlay on stressed syllables; /l/ vs. /r/ confusion; terminal-stop deletion.
- Hindi-accented — retroflex consonants; aspirated stops; vowel epenthesis on terminal clusters.
Three worked examples
Linear — 99/100 (Bulletproof)
Two-syllable Latinate root, sonorant /l/ onset, common phoneme set. The CER per accent: US 0.00, UK 0.00, AU 0.00, IN 0.01, FIL 0.01, NIG 0.00, BRPT 0.02, MXES 0.00, DE 0.00, ZH 0.03, HI 0.01, UK-N 0.00. Population-weighted composite: 99%. The name survives every accent and most noisy environments. It is the benchmark we use for what a fully resilient name looks like.
Falcata — 84/100 (Workable but flagged)
Three-syllable Spanish/Latin root with terminal /-ta/ cluster. The cluster clips in noisy environments. CER per accent: US 0.04, UK 0.05, AU 0.08, IN 0.18, FIL 0.21, NIG 0.11, BRPT 0.06, MXES 0.03, DE 0.09, ZH 0.22, HI 0.19, UK-N 0.06. The Indian, Filipino, Mandarin-accented, and Hindi-accented round-trips fail the terminal /-ta/ roughly 20% of the time — “Falcata” transcribes as “falca” or “falcate” or “falcat.” The composite of 84% is workable but telegraphs an acoustic tax for any brand targeting South Asian or East Asian markets. The full case study is at /case-studies/falcata-due-diligence-71.
Coldbrew — 75/100 (Workable but flagged)
Two-syllable compound noun with ambiguous segmentation. The compound “Cold + Brew” splits into two tokens in roughly a quarter of the round-trips, particularly in Indian and Filipino accents. CER per accent: US 0.05, UK 0.07, AU 0.09, IN 0.31, FIL 0.34, NIG 0.18, BRPT 0.16, MXES 0.10, DE 0.12, ZH 0.28, HI 0.30, UK-N 0.08. The composite is 75%. The case study is at /case-studies/coldbrew-abandon-28 — Coldbrew fails on multiple axes, but the pronunciation failure on its own would justify a flag.
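To see how much the population weighting matters, the per-accent CERs quoted above can be averaged without weights and compared with the published composites. This is a quick arithmetic check, not a reconstruction of the real weighting, which this article does not specify. The dictionary and function names are illustrative.

```python
# Per-accent CERs as listed in the three worked examples, in the order:
# US, UK, AU, IN, FIL, NIG, BRPT, MXES, DE, ZH, HI, UK-N.
PUBLISHED_CER = {
    "Linear":   [0.00, 0.00, 0.00, 0.01, 0.01, 0.00, 0.02, 0.00, 0.00, 0.03, 0.01, 0.00],
    "Falcata":  [0.04, 0.05, 0.08, 0.18, 0.21, 0.11, 0.06, 0.03, 0.09, 0.22, 0.19, 0.06],
    "Coldbrew": [0.05, 0.07, 0.09, 0.31, 0.34, 0.18, 0.16, 0.10, 0.12, 0.28, 0.30, 0.08],
}

# Population-weighted composites as published in the worked examples.
PUBLISHED_COMPOSITE = {"Linear": 99, "Falcata": 84, "Coldbrew": 75}


def unweighted_score(cers: list) -> float:
    """Plain (equal-weight) mean of (1 - CER), scaled to 0-100."""
    return 100.0 * (1.0 - sum(cers) / len(cers))


for name, cers in PUBLISHED_CER.items():
    print(f"{name}: unweighted {unweighted_score(cers):.1f}, "
          f"published {PUBLISHED_COMPOSITE[name]}")
```

The equal-weight figures come out at roughly 99.3, 89.0, and 82.7. For Falcata and Coldbrew that is several points above the published composite, which is what you would expect if the population weights lean toward the accents with the highest CERs.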
The acoustic axis is the only one founders consistently miss. Trademark, domain, and handle are obvious failure modes. Cultural fit sits in the “I should check that” mental category. Pronunciation lives in nobody's mental category, which is why we see it kill 7% of names and why the founders who lose to it are the most surprised.
The DIY version
You can run a meaningful resilience check without our API. The minimum implementation:
- Pick a TTS service. ElevenLabs has the broadest accent library and is the cleanest API for this; Cartesia, OpenAI's TTS, and Google Cloud TTS all work. Synthesize the name in at least four voices: US General American, UK RP, an Indian accent, and a Mandarin accent. These four cover roughly 75% of the worldwide English-speaking signal.
- Save the audio. WAV or MP3 is fine. Sample rate of 16kHz or higher.
- Run Whisper on each clip. The OpenAI API call is client.audio.transcriptions.create(model="whisper-1", file=audio_file). The open-source weights work too; the Python package openai-whisper runs locally, ideally on a GPU.
- Compute the edit distance. Levenshtein.distance(source, transcription), from the third-party Levenshtein package, returns the character edit distance. Whisper typically returns capitalized, punctuated text, so lowercase both strings and strip punctuation first. Divide the distance by source length to get CER.
- Average and read. Take the mean CER across the four accents; (1 - mean CER) × 100 gives you a rough composite. Anything under 0.10 mean CER (composite 90+) is resilient. Anything over 0.20 (composite under 80) is flagged.
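The transcription and scoring steps above might look like the following in Python. Treat it as a sketch under stated assumptions. The helper names (transcribe_clip, normalize, diy_composite) are made up for illustration. The normalization step is an assumption added because Whisper output usually carries capitalization and punctuation. The TTS step is left out because it depends on which service you pick. The OpenAI client and the Levenshtein package are third-party installs, imported lazily so the scoring logic can be tried without them.

```python
from pathlib import Path
from typing import Callable, Dict, Optional


def transcribe_clip(path: Path) -> str:
    """Send one saved audio clip to the hosted Whisper endpoint."""
    from openai import OpenAI  # third-party: pip install openai

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    with path.open("rb") as audio_file:
        result = client.audio.transcriptions.create(
            model="whisper-1", file=audio_file
        )
    return result.text


def normalize(text: str) -> str:
    """Lowercase and drop everything except letters and digits.

    Assumption of this sketch: case, spaces, and punctuation should not
    count as transcription errors.
    """
    return "".join(ch for ch in text.lower() if ch.isalnum())


def diy_composite(
    source: str,
    transcripts: Dict[str, str],
    distance: Optional[Callable[[str, str], int]] = None,
) -> float:
    """Return (1 - mean CER) * 100 across the supplied transcripts."""
    if distance is None:
        from Levenshtein import distance  # third-party: pip install Levenshtein
    src = normalize(source)
    if not src or not transcripts:
        raise ValueError("need a non-empty source and at least one transcript")
    cers = [distance(src, normalize(t)) / len(src) for t in transcripts.values()]
    mean_cer = sum(cers) / len(cers)
    return (1.0 - mean_cer) * 100.0
```

With clips saved as, say, us.wav and in.wav (hypothetical filenames), you would build the transcripts dictionary by calling transcribe_clip on each path and then pass it to diy_composite together with the source spelling. Note that normalize removes spaces, so a compound that Whisper splits into two words ("Cold Brew") is not penalized for the split; drop that behavior if segmentation is what you want to measure.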
The four-accent DIY check takes about twenty minutes to set up and three minutes per candidate. The Etymolt API runs the 12-accent version in roughly six seconds and returns the per-accent breakdown. If you're testing more than a handful of candidates, the API path is faster; if you're testing one, the DIY path is perfectly viable.
The API path
Etymolt's POST /v1/verify endpoint runs the 12-accent pronunciation pipeline as one of five axes. The pronunciation block returns a composite score, per-accent CER, and the specific phoneme clusters that drove the failures. The full method is documented at /voice; the methodology page at /methodology covers the calibration data and the recalibration cadence.
Clearance signal, not legal advice. The pronunciation axis returns evidence, not a verdict on its own. The verdict is composited across all five axes; pronunciation is one input.
Related reading
- Sound Symbolism for Brand Naming — the distinct acoustic axis: how the name feels, not how it survives transmission.
- The Five Ways a Brand Name Dies — where pronunciation sits relative to trademark, domain, handle, and culture.
- Methodology — the corpus, the calibration weights, the recalibration cadence.
Take the next step
Hear your name. Across 12 accents.
Etymolt synthesizes your candidate in 12 accents, transcribes it back, and scores the survivability in under six seconds. Five free clearances per IP.