GOLEM keyboard project

Hungarian letter frequency

Whether you want to decrypt an encrypted Hungarian text or optimize your custom logical keyboard layout, you'll need data on contemporary letter frequency. Here is a table of character frequencies based on 12.5 million characters from 5000 articles published by Hungary's most visited economic news portal during the summer of 2018.

#   Char         %  Recurrence  Occurrence
1   SPACE  12.841%           8     1605594
2   E       8.312%          12     1039346
3   A       7.512%          13      939339
4   T       6.401%          16      800371
5   L       5.202%          19      650425
6   S       4.890%          20      611457
7   N       4.511%          22      564011
8   K       4.139%          24      517544
9   I       3.724%          27      465704
10  Z       3.660%          27      457587
11  R       3.616%          28      452168
12  O       3.220%          31      402625
13  á       2.933%          34      366708
14  é       2.838%          35      354878
15  G       2.544%          39      318148
16  M       2.524%          40      315563
17  B       1.845%          54      230707
18  D       1.680%          60      210121
19  Y       1.616%          62      202061
20  V       1.416%          71      177002
21  ,       1.198%          83      149835
22  P       1.096%          91      137045
23  H       1.093%          92      136628
24  J       0.921%         109      115145
25  ö       0.872%         115      109062
26  U       0.853%         117      106685
27  F       0.851%         117      106461
28  ó       0.798%         125       99757
29  ő       0.770%         130       96222
30  ENTER   0.750%         133       93783
31  .       0.697%         143       87138
32  C       0.644%         155       80530
33  í       0.495%         202       61922
34  ü       0.476%         210       59544
35  0       0.463%         216       57903
36  1       0.346%         289       43281
37  -       0.320%         313       39969
38  2       0.293%         342       36599
39  ú       0.223%         449       27856
40  8       0.152%         658       19017
41  5       0.142%         703       17782
42  3       0.136%         734       17033
43  4       0.112%         896       13960
44  ű       0.101%         987       12665
45  :       0.097%       1,027       12177
46  7       0.092%       1,091       11458
47  6       0.084%       1,192       10494
48  9       0.082%       1,218       10264
49  X       0.080%       1,257        9951
50  )       0.054%       1,862        6714
51  (       0.053%       1,873        6676
52  W       0.050%       2,016        6203
53  %       0.044%       2,257        5539
54  "       0.043%       2,308        5418
55  ?       0.021%       4,666        2680
56  &       0.018%       5,508        2270
57  /       0.011%       8,726        1433
58  ;       0.011%       8,963        1395
59  Q       0.010%       9,869        1267
60  !       0.009%      11,482        1089
61  '       0.005%      18,442         678
62  +       0.002%      43,266         289
63  #       0.002%      48,278         259
64  |       0.002%      53,896         232
65  @       0.001%     173,665          72
66  *       0.000%     694,660          18
67  ×       0.000%     781,493          16
68  _       0.000%     833,592          15
69  ^       0.000%     961,837          13
70  ~       0.000%   1,136,717          11
71  °       0.000%   1,136,717          11
72  §       0.000%   1,136,717          11
73  =       0.000%   1,389,320           9
74  $       0.000%   1,562,985           8
75  [       0.000%   2,083,980           6
76  ]       0.000%   2,083,980           6
77  š       0.000%   2,500,776           5
78  ä       0.000%   3,125,971           4
79  ë       0.000%   6,251,941           2
80  ç       0.000%   6,251,941           2
81  >       0.000%   6,251,941           2
82  ´       0.000%  12,503,882           1
83  ł       0.000%  12,503,882           1
84  ô       0.000%  12,503,882           1
85  č       0.000%  12,503,882           1

I regularly see letter frequencies calculated from books, especially old novels. I don't think this makes much sense unless you write in the style of that specific author. The statistics above are based on contemporary texts. See this blog entry for details on the data collection method and the processing steps.

While these numbers closely reflect the characters an economic journalist types, other kinds of text will yield slightly different figures. To get the most appropriate data for designing a logical keyboard layout, you'll have to compile similar statistics from your own typing history (emails, tweets, essays, publications, personal diary, etc.).
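
As a starting point for compiling such statistics from your own texts, here is a minimal Python sketch. It is not the script used to produce the table above. The function name char_frequencies, the LABELS mapping and the sample string are made up for illustration, and folding upper and lower case together is an assumption. Recurrence is taken here to mean "one in every N characters" (total count divided by occurrence), which agrees with the figures in the table.

```python
from collections import Counter

# Display names for keys that are invisible when printed (illustrative choice).
LABELS = {" ": "SPACE", "\n": "ENTER"}


def char_frequencies(text):
    """Return (rank, char, share_percent, recurrence, count) rows, most frequent first."""
    # Assumption: upper and lower case letters are counted together.
    counts = Counter(text.lower())
    total = sum(counts.values())
    rows = []
    for rank, (char, count) in enumerate(counts.most_common(), start=1):
        share = 100.0 * count / total      # percentage of all characters
        recurrence = round(total / count)  # "one in every N characters"
        rows.append((rank, LABELS.get(char, char), share, recurrence, count))
    return rows


if __name__ == "__main__":
    # Hypothetical sample; replace it with your own collected texts.
    sample = "Az alma piros.\nA körte zöld."
    for rank, char, share, recurrence, count in char_frequencies(sample):
        print(f"{rank:<2}  {char:<5}  {share:6.3f}%  {recurrence:>10,}  {count:>10}")
```

To run it on real data, read your collected files into one string (for example with open(path, encoding="utf-8").read()) and pass that string to char_frequencies. The larger and more representative the sample, the more stable the lower rows of the table become.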