=== Language Detection Comparison Report ===

Test sentences: 1,446,486 total  (264,096 short <= 50 chars, 1,182,390 full)
Shared languages: 123  (1,146,734 sentences)

Model heap (approx):
  Bigram:  ~2.0 MB
  OpenNLP: ~79.1 MB

Strict accuracy (exact language match):
Detector               All       Short        Full      Time(ms)      Sent/sec
----------------------------------------------------------------------------------
Bigram             94.81%     92.96%     95.22%       2,932 ms     493,344
OpenNLP            71.10%     64.22%     72.63%      11,180 ms     129,382
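
The Sent/sec column appears to be the sentence count divided by the elapsed time in seconds. A minimal Python sketch, assuming that definition (the function name `sentences_per_second` is illustrative and not part of the benchmark), reproduces the two figures above:

```python
def sentences_per_second(sentences: int, elapsed_ms: int) -> int:
    """Throughput as sentences per elapsed second, rounded to an integer."""
    return round(sentences * 1000 / elapsed_ms)

TOTAL_SENTENCES = 1_446_486  # from the "Test sentences" line of the report

bigram_rate = sentences_per_second(TOTAL_SENTENCES, 2_932)
opennlp_rate = sentences_per_second(TOTAL_SENTENCES, 11_180)

print(bigram_rate, opennlp_rate)
print(f"speedup: {bigram_rate / opennlp_rate:.1f}x")
```

The same formula applied to the 1,146,734 shared-language sentences should give the rates in the shared-language table.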

Group accuracy (confusable languages counted as correct):
  Groups: {nob/nno/nor/dan}, {hrv/srp/bos/hbs}, {msa/zlm/zsm/ind}, {pes/prs/fas},
          {zho/cmn/wuu/yue}, {aze/azj}, {ekk/est}, {lvs/lav}, {plt/mlg}, {khk/mon},
          {ydd/yid}, {sme/smi}, {sqi/als}, {tat/bak}, {ita/vec}, {spa/arg/ast}
Detector               All       Short        Full
--------------------------------------------------
Bigram             97.15%     95.25%     97.57%
OpenNLP            76.05%     70.26%     77.35%
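
Group accuracy, as described above, counts a prediction as correct when it falls in the same confusable group as the true label. The following Python sketch illustrates that scoring rule under this reading; the helper names and the sample pairs are hypothetical, and only three of the report's groups are included for brevity:

```python
# A subset of the confusable groups listed in the report.
GROUPS = [
    {"nob", "nno", "nor", "dan"},
    {"hrv", "srp", "bos", "hbs"},
    {"ydd", "yid"},
]

# Map each grouped language code to its group.
LANG_TO_GROUP = {lang: frozenset(group) for group in GROUPS for lang in group}

def group_of(lang):
    """Return the confusable group of a code; ungrouped codes form a singleton."""
    return LANG_TO_GROUP.get(lang, frozenset({lang}))

def strict_correct(truth, predicted):
    return truth == predicted

def group_correct(truth, predicted):
    return predicted in group_of(truth)

def accuracy(pairs, is_correct):
    pairs = list(pairs)
    return sum(is_correct(t, p) for t, p in pairs) / len(pairs)

# Hypothetical (truth, prediction) pairs, not taken from the benchmark data.
sample = [("ydd", "yid"), ("ydd", "ydd"), ("bos", "hrv"), ("dan", "swe")]
print(accuracy(sample, strict_correct))  # 0.25
print(accuracy(sample, group_correct))   # 0.75
```

This is why a language such as ydd can score low on strict accuracy yet near 100% on group accuracy: most of its misses land on the other member of its group.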

Shared-language accuracy (123 languages both models support):
Detector               All       Short        Full      Time(ms)      Sent/sec
----------------------------------------------------------------------------------
Bigram             96.00%     94.20%     96.38%       2,249 ms     509,886
OpenNLP            89.68%     86.23%     90.39%       8,508 ms     134,783

Bigram timing breakdown (all sentences, CPU-time summed across threads):
  Preprocess (NFC/URL/truncate):  6,187 ms (19%)
  Feature extraction (bigrams):   8,188 ms (25%)
  Model prediction (softmax):    18,536 ms (56%)
  CPU total:                     32,911 ms
  Wall-clock total:               2,932 ms
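
The percentages above are each stage's share of summed CPU time, and the ratio of CPU total to wall-clock total gives a rough measure of how much work ran in parallel. A small Python sketch of that arithmetic follows; the term "effective parallelism" is an interpretation added here, and the report does not state the actual thread count:

```python
# Per-stage CPU time in milliseconds, copied from the breakdown above.
stage_ms = {
    "preprocess": 6_187,
    "feature_extraction": 8_188,
    "model_prediction": 18_536,
}
wall_clock_ms = 2_932

cpu_total_ms = sum(stage_ms.values())

# Share of total CPU time per stage, rounded to whole percent.
shares = {name: round(100 * ms / cpu_total_ms) for name, ms in stage_ms.items()}

# CPU time divided by wall-clock time: average number of busy threads.
effective_parallelism = cpu_total_ms / wall_clock_ms

print(cpu_total_ms)
print(shares)
print(f"{effective_parallelism:.1f}")
```

On these figures the ratio comes out a little above 11, which would be consistent with roughly a dozen worker threads, though that is an inference rather than something the report records.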

Per-language accuracy (all sentences):
Language  Bigram ok/total  Bi-Grp% Bi-Strict%  OpenNLP ok/total  ON-Grp% ON-Strict%
----------------------------------------------------------------------------------
afr           9806/9988          98.2%  9725/9988          97.4%
amh           5740/5753          99.8%  5685/5753          98.8%
ara           9873/10000          98.7%  9957/10000          99.6%
arz           8238/8508          96.8%     0/8508           0.0%
asm           7629/7759          98.3%  7660/7759          98.7%
aze           19874/19984   99.4%   99.4%  19643/19984   98.3%   98.3% *
bak           5215/9626   98.9%   54.2%  3226/9626   97.8%   33.5% *
bam           1078/1088          99.1%     0/1088           0.0%
ban           2948/4310          68.4%  2534/4310          58.8%
bar           7934/8112          97.8%     0/8112           0.0%
bcl           1750/1852          94.5%     0/1852           0.0%
bel           9962/9975          99.9%  9903/9975          99.3%
ben           9941/10001          99.4%  9916/10001          99.2%
bjn           1585/2387          66.4%     0/2387           0.0%
bos           4571/9950   97.3%   45.9%  5016/9950   95.0%   50.4% *
bpy           7131/7315          97.5%     0/7315           0.0%
bre           8440/8525          99.0%  8132/8525          95.4%
bua           1417/1465          96.7%     0/1465           0.0%
bul           9895/10001          98.9%  9185/10001          91.8%
cat           9898/10001          99.0%  9503/10001          95.0%
ceb           9961/9996          99.6%  9799/9996          98.0%
ces           9870/10001          98.7%  9537/10001          95.4%
che           8338/8378          99.5%  8102/8378          96.7%
chv           7352/7443          98.8%     0/7443           0.0%
ckb           7900/8462          93.4%   105/8462           1.2%
cos           1828/1968          92.9%     0/1968           0.0%
csb           1404/1462          96.0%     0/1462           0.0%
cym           9058/9116          99.4%  8813/9116          96.7%
dan           9530/10000   99.5%   95.3%  9326/10000   98.5%   93.3% *
deu           9792/10000          97.9%  9539/10000          95.4%
diq           5529/5604          98.7%     0/5604           0.0%
div           9914/9917         100.0%  9911/9917          99.9%
dsb            930/1000          93.0%     0/1000           0.0%
ell           9998/10000         100.0%  9995/10000         100.0%
eml           3387/3473          97.5%     0/3473           0.0%
eng           9840/10001          98.4%  9294/10001          92.9%
epo           9895/9966          99.3%  9687/9966          97.2%
est           19550/19660   99.4%   99.4%  12751/19660   96.5%   64.9% *
eus           9747/9838          99.1%  9480/9838          96.4%
ewe           1013/1018          99.5%     0/1018           0.0%
ext           1469/1561          94.1%     0/1561           0.0%
fao           9762/9815          99.5%  9537/9815          97.2%
fas           19297/20001   98.9%   96.5%  7027/20001   98.8%   35.1% *
fin           9967/10001          99.7%  9893/10001          98.9%
fra           9840/9998          98.4%  8963/9998          89.6%
frr           1600/1680          95.2%     0/1680           0.0%
fry           9145/9633          94.9%  8772/9633          91.1%
gle           9508/9529          99.8%  9319/9529          97.8%
glg           9312/9521          97.8%  9077/9521          95.3%
glv           2033/2057          98.8%     0/2057           0.0%
gom           5884/5926          99.3%  5495/5926          92.7%
grn           2315/2399          96.5%     0/2399           0.0%
gsw           8674/9170          94.6%  7083/9170          77.2%
guj           9660/9666          99.9%  9648/9666          99.8%
hat           6392/6447          99.1%  6218/6447          96.4%
hau           7416/7463          99.4%  7357/7463          98.6%
hbs           3176/9998   98.1%   31.8%     0/9998   94.9%    0.0% *
heb           9986/10001          99.9%  9960/10001          99.6%
hif           2135/2199          97.1%     0/2199           0.0%
hin           9694/10001          96.9%  9884/10001          98.8%
hrv           7384/10001   98.7%   73.8%  8131/10001   97.2%   81.3% *
hsb           4304/4355          98.8%     0/4355           0.0%
hun           9979/10000          99.8%  9921/10000          99.2%
hye           9955/9960          99.9%  9948/9960          99.9%
ibo           1621/1653          98.1%  1521/1653          92.0%
ido           7599/7772          97.8%     0/7772           0.0%
ile            902/1000          90.2%     0/1000           0.0%
ilo           2815/2843          99.0%     0/2843           0.0%
ina           4369/4540          96.2%     0/4540           0.0%
ind           5416/9997   85.9%   54.2%  2670/9997   64.6%   26.7% *
isl           9927/10001          99.3%  9780/10001          97.8%
ita           9841/10001   98.4%   98.4%  9489/10001   94.9%   94.9% *
jav           7962/9095          87.5%  6639/9095          73.0%
jpn           9924/10000          99.2%  9801/10000          98.0%
kal           6440/6482          99.4%     0/6482           0.0%
kan           9524/9528         100.0%  9519/9528          99.9%
kat           9981/9998          99.8%  9995/9998         100.0%
kaz           9879/9902          99.8%  9827/9902          99.2%
khk            820/1960   99.7%   41.8%     0/1960   98.8%    0.0% *
kin           4265/4486          95.1%  4440/4486          99.0%
kir           9843/9887          99.6%  9608/9887          97.2%
koi           1013/1148          88.2%     0/1148           0.0%
kom           1713/1850          92.6%     0/1850           0.0%
kor           9994/10000          99.9%  9989/10000          99.9%
krc           1114/1142          97.5%     0/1142           0.0%
ksh           2136/2245          95.1%     0/2245           0.0%
kur           5107/6671          76.6%   232/6671           3.5%
lad            944/1000          94.4%     0/1000           0.0%
lao           1103/1110          99.4%  1110/1110         100.0%
lat           8677/8891          97.6%  8215/8891          92.4%
lav           19237/19282   99.8%   99.8%  10386/19282   98.7%   53.9% *
lim           8226/9169          89.7%  5099/9169          55.6%
lit           9961/10000          99.6%  9795/10000          98.0%
lmo           6139/6401          95.9%     0/6401           0.0%
ltz           9824/9937          98.9%  9171/9937          92.3%
lug           9355/9378          99.8%  9116/9378          97.2%
lup           1215/1226          99.1%     0/1226           0.0%
lus           8516/8575          99.3%     0/8575           0.0%
mai            906/1000          90.6%     0/1000           0.0%
mal           9645/9648         100.0%  9602/9648          99.5%
mar           9704/10000          97.0%  9741/10000          97.4%
mhr           3838/3880          98.9%  3611/3880          93.1%
min           8372/8567          97.7%  8213/8567          95.9%
mkd           9880/9978          99.0%  9878/9978          99.0%
mlg           16806/16843   99.8%   99.8%  16721/16843   99.3%   99.3% *
mlt           9205/9237          99.7%  9022/9237          97.7%
mon           8688/8945   99.6%   97.1%  8845/8945   98.9%   98.9% *
mri           8426/8473          99.4%  8372/8473          98.8%
mrj           1750/1790          97.8%     0/1790           0.0%
msa           18169/19615   97.4%   92.6%  16290/19615   87.0%   83.0% *
mwl           6019/6125          98.3%     0/6125           0.0%
myv            973/1000          97.3%     0/1000           0.0%
mzn           5264/5565          94.6%     0/5565           0.0%
nan           9135/9156          99.8%  8904/9156          97.2%
nap            943/1000          94.3%     0/1000           0.0%
nav            999/1000          99.9%     0/1000           0.0%
ndo           1345/1350          99.6%     0/1350           0.0%
nds           8373/8756          95.6%  7829/8756          89.4%
nep           9692/10000          96.9%  9682/10000          96.8%
new           7092/7369          96.2%  6928/7369          94.0%
nld           9504/9998          95.1%  8778/9998          87.8%
nno           9334/9984   99.3%   93.5%  8634/9984   98.3%   86.5% *
nob           19270/19929   99.3%   96.7%  17874/19929   96.8%   89.7% *
nso           3422/3506          97.6%  3321/3506          94.7%
ori           7963/7993          99.6%  7939/7993          99.3%
orm           1135/1157          98.1%  1117/1157          96.5%
oss           2587/2599          99.5%     0/2599           0.0%
pam           2006/2083          96.3%     0/2083           0.0%
pan           9439/9442         100.0%  9433/9442          99.9%
pap           8504/8590          99.0%     0/8590           0.0%
pfl           1758/1885          93.3%     0/1885           0.0%
pms           5706/5815          98.1%     0/5815           0.0%
pnb           9053/9336          97.0%  7256/9336          77.7%
pol           9967/9999          99.7%  9803/9999          98.0%
por           9840/10000          98.4%  9467/10000          94.7%
prs           4880/7150   80.5%   68.3%     0/7150   30.4%    0.0% *
pus           6321/9166          69.0%  8024/9166          87.5%
que           2720/2849          95.5%     0/2849           0.0%
roh           8985/9094          98.8%  8697/9094          95.6%
ron           9914/10000          99.1%  9818/10000          98.2%
rue           1522/2000          76.1%     0/2000           0.0%
run           1620/1737          93.3%     0/1737           0.0%
rus           9897/10000          99.0%  8718/10000          87.2%
sah           7482/7574          98.8%     0/7574           0.0%
san           8253/8453          97.6%  6475/8453          76.6%
scn           4610/4748          97.1%     0/4748           0.0%
sco           7276/7761          93.8%     0/7761           0.0%
sgs           2294/2346          97.8%     0/2346           0.0%
sin           9046/9049         100.0%  9031/9049          99.8%
slk           9875/10001          98.7%  9546/10001          95.5%
slv           9870/10001          98.7%  9793/10001          97.9%
sme           2817/2940   98.8%   95.8%     0/2940    0.0%    0.0% *
smi            226/1000   99.1%   22.6%     0/1000    0.0%    0.0% *
sna           8760/8782          99.7%     0/8782           0.0%
snd           8177/8226          99.4%  8124/8226          98.8%
som           8646/8686          99.5%  8603/8686          99.0%
sot           2277/2319          98.2%     0/2319           0.0%
spa           9848/10000   98.5%   98.5%  9887/10000   98.9%   98.9% *
sqi           9980/10000   99.8%   99.8%  9928/10000   99.3%   99.3% *
srd           2082/2182          95.4%  1568/2182          71.9%
srp           9382/10000   98.6%   93.8%  9597/10000   96.0%   96.0% *
ssw           1008/1035          97.4%   829/1035          80.1%
sun           2937/9269          31.7%     0/9269           0.0%
swe           9879/10001          98.8%  9296/10001          93.0%
swh           2958/3000          98.6%     0/3000           0.0%
szl           3974/4037          98.4%     0/4037           0.0%
tam           9942/9948          99.9%  9929/9948          99.8%
tat           8547/9954   99.7%   85.9%  9801/9954   99.2%   98.5% *
tel           9878/9880         100.0%  9868/9880          99.9%
tgk           9942/9973          99.7%  9650/9973          96.8%
tgl           9619/9690          99.3%  9243/9690          95.4%
tha           9664/9691          99.7%  9521/9691          98.2%
tsn           4137/4254          97.2%  3870/4254          91.0%
tso           3312/3332          99.4%     0/3332           0.0%
tuk           8594/8609          99.8%  8551/8609          99.3%
tur           9919/10000          99.2%  9741/10000          97.4%
tyv           1673/1704          98.2%     0/1704           0.0%
udm            948/1000          94.8%     0/1000           0.0%
uig           6250/6253         100.0%  6231/6253          99.6%
ukr           9918/10001          99.2%  9721/10001          97.2%
urd           9917/9998          99.2%  9621/9998          96.2%
uzb           7499/9641          77.8%  7754/9641          80.4%
uzn           9497/9980          95.2%     0/9980           0.0%
ven           1782/1796          99.2%     0/1796           0.0%
vie           9990/10000          99.9%  9961/10000          99.6%
vls           4745/4992          95.1%     0/4992           0.0%
vol           8372/8393          99.7%  8372/8393          99.7%
vro           1510/1601          94.3%     0/1601           0.0%
war           9398/9490          99.0%  9335/9490          98.4%
wln           5564/5645          98.6%     0/5645           0.0%
wuu           6738/7978   98.7%   84.5%     0/7978   47.7%    0.0% *
xho           6932/8185          84.7%  5919/8185          72.3%
xmf           5293/5337          99.2%     0/5337           0.0%
ydd            523/2183  100.0%   24.0%     0/2183  100.0%    0.0% *
yid           5658/6399   99.4%   88.4%  6359/6399   99.4%   99.4% *
yor           2287/2321          98.5%  1930/2321          83.2%
zea           1406/1793          78.4%     0/1793           0.0%
zho           19016/19969   99.1%   95.2%     0/19969   22.8%    0.0% *
zul           8630/9022          95.7%  8756/9022          97.1%

* = member of a confusable group; Grp% = group accuracy (blank for languages in no group); counts are strict-correct / total sentences
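
For further analysis, the per-language rows can be read back into structured records. The Python sketch below assumes the layout inferred from the table (rows ending in `*` carry a group percentage before the strict percentage for each detector, other rows carry only the strict percentage); `parse_row` is a hypothetical helper, not part of the benchmark tooling:

```python
def parse_row(line):
    """Parse one per-language row into a dict (layout inferred from the table)."""
    tokens = line.split()
    grouped = tokens[-1] == "*"
    if grouped:
        tokens = tokens[:-1]
    lang, rest = tokens[0], tokens[1:]
    # Grouped rows: count, group%, strict% per detector; others: count, strict%.
    width = 3 if grouped else 2
    if len(rest) != 2 * width:
        raise ValueError(f"unexpected row layout: {line!r}")
    result = {"lang": lang, "grouped": grouped}
    for name, fields in (("bigram", rest[:width]), ("opennlp", rest[width:])):
        correct, total = (int(x) for x in fields[0].split("/"))
        result[name] = {
            "correct": correct,
            "total": total,
            "strict_pct": float(fields[-1].rstrip("%")),
            "group_pct": float(fields[1].rstrip("%")) if grouped else None,
        }
    return result

grouped_row = parse_row(
    "bak           5215/9626   98.9%   54.2%  3226/9626   97.8%   33.5% *"
)
plain_row = parse_row(
    "afr           9806/9988          98.2%  9725/9988          97.4%"
)
print(grouped_row["bigram"])
print(plain_row["opennlp"])
```

Splitting on whitespace rather than fixed column offsets is deliberate, since long counts push the columns out of alignment in some rows.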
