Ground-truth corpus · v3
naam-grounded · 24,408 rows
The v3 corpus draws on live USPTO, TTAB, and UKIPO registrations; RDAP records for the apex domain of every candidate; 14-platform handle probes; the naam cultural-screen corpus (7 languages, 20 markets); and the Ćwiek 2022 + bouba/kiki phonetic-perception dataset. Each row carries its provenance, the date observed, and its assertion class. The corpus version ships with every verdict in the measurement_methodology field, per R12 D-R12-19.
CC-BY-4.0 future release · Q3 2026 publication target
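
As a rough illustration of the row shape and version stamping described above, here is a minimal Python sketch. Only the measurement_methodology field name and the "v3" label come from the text. The class name, the row field names, the corpus_version key, the dictionary layout of the verdict, and all sample values are assumptions made for illustration, not the actual schema.

```python
from dataclasses import dataclass
from datetime import date

# "v3" is taken from the card title; the exact version string format is assumed.
CORPUS_VERSION = "v3"


@dataclass(frozen=True)
class CorpusRow:
    # Hypothetical field names: the text only says each row carries its
    # provenance, the date observed, and an assertion class.
    provenance: str
    observed_on: date
    assertion_class: str


def stamp_verdict(verdict: dict, corpus_version: str = CORPUS_VERSION) -> dict:
    """Return a copy of `verdict` with the corpus version recorded under
    measurement_methodology (assumed here to be a dictionary)."""
    stamped = dict(verdict)
    methodology = dict(stamped.get("measurement_methodology", {}))
    methodology["corpus_version"] = corpus_version  # hypothetical key name
    stamped["measurement_methodology"] = methodology
    return stamped


# Placeholder values, not real corpus data.
row = CorpusRow(
    provenance="USPTO",
    observed_on=date(2026, 1, 15),
    assertion_class="registration",
)
verdict = stamp_verdict({"candidate": "example"})
```

Returning a copy rather than mutating the input keeps the original verdict untouched, which makes it easier to audit what was stamped and when.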