Michael Everson surmised:
> At 14:06 -0800 2000-02-28, Chris Pratley wrote:
> >Does anyone have a list of combinations of character + combining
> >diacritic(s) that actually occur in use in the world's writing
> >systems?
> >I'm curious as to which are the most common, which are never found, etc.
>
> I suspect you'll find that acutes and circumflexes top the list. Diaereses
> and tildes next. "Never founds" never occur.... :-) There's always somebody
> who'll use something....
>
I took the data file that John Cowan has posted at:
http://www.ccil.org/~cowan/elsie/elsie.html
and did the parsing and counting. The raw figures are posted below.
These are the combined totals from the MUMS Books database and the
JACKPHY database, which together contain 12,421,528 instances of
characters with diacritics out of a total of 1,492,948,727 Latin
characters (about 0.83%).
As I noted before, take this with a grain of salt. This is merely a
corpus count, and the frequencies depend entirely on what is included
in that corpus.
--Ken
3320174 : 0304 macron
2761619 : 0301 acute (NOTE: 38 tokens are probably for double acute)
1529686 : 0308 diaeresis
1033167 : 0306 breve
893998 : 0323 dot below
691920 : FE20 ligature left half
691829 : FE21 ligature right half (NOTE: some miscoding in data implied)
252196 : 0300 grave
229524 : 030C caron
224048 : 0303 tilde
184104 : 0307 dot above
161285 : 0327 cedilla
140016 : 0302 circumflex
79076 : 0326 comma below
77278 : 0331 macron below
53663 : 030A ring above
39912 : 0328 ogonek
32388 : 031C left half ring below (NOTE: probably mostly intended for ogonek)
31537 : 030B double acute
22960 : 0325 ring below
9067 : 0324 diaeresis below
7087 : 0309 hook above
5949 : 0310 candrabindu
4430 : 0315 comma above right
1220 : 0333 double low line
696 : 0313 comma above
172 : 032E breve below
142 : FE22 double tilde left half
85 : FE23 double tilde right half
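For anyone wanting to repeat this kind of tally on their own text, here is a minimal Python sketch. It is not the script used for the figures above, and it assumes ordinary Unicode text rather than the format of the posted data file, which is not described here. The function name `tally_diacritics` is made up for illustration. It decomposes the text with NFD, then counts each combining mark and the base character it follows.

```python
import unicodedata
from collections import Counter

def tally_diacritics(text):
    """Return (mark_counts, base_counts) for combining marks in text.

    Hypothetical helper: not the script used for the figures in this post.
    """
    marks = Counter()
    bases = Counter()
    base = None  # most recent non-combining character seen
    # NFD splits precomposed letters into base + combining marks.
    for ch in unicodedata.normalize("NFD", text):
        # A nonzero canonical combining class marks a combining diacritic.
        if unicodedata.combining(ch):
            marks[ch] += 1
            if base is not None:
                # Assumption: a base is counted once per attached mark.
                bases[base] += 1
        else:
            base = ch
    return marks, bases

marks, bases = tally_diacritics("Škoda, café, ō, ü")
for ch, n in marks.most_common():
    print(f"{n} : {ord(ch):04X} {unicodedata.name(ch)}")
```

Note that this sketch counts a base character once for every mark attached to it, so a letter carrying two diacritics shows up twice in the base tally. Whether the figures in this post were counted that way is not stated, so treat that as an assumption.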
And in case anyone is interested in *what* the diacritics get applied to,
here are the raw figures for the frequency of base characters:
2594522 : 0061 a
2391847 : 006F o
1792835 : 0069 i
1594686 : 0065 e
1486937 : 0075 u
407268 : 0073 s
331776 : 0074 t
294089 : 006E n
254005 : 0063 c
201077 : 0068 h
90764 : 006B k
82214 : 0053 S
74224 : 0049 I
70229 : 0041 A
64821 : 0045 E
62795 : 007A z
62553 : 0072 r
58625 : 004F O
56210 : 0055 U
55316 : 006D m
52729 : 0048 H
48828 : 0076 v
47061 : 0054 T
43585 : 0064 d
41736 : 0079 y
36920 : 006C l
24252 : 0043 C
20431 : 004B K
17107 : 0067 g
11100 : 01B0 u-hook
9727 : 00E6 ae
8127 : 005A Z
7767 : 0056 V
7218 : 0044 D
4621 : 0153 oe
2995 : 01A1 o-hook
2121 : 0052 R
1897 : 0131 dotless-i
1735 : 004E N
1676 : 0047 G
545 : 004C L
528 : 0077 w
497 : 0046 F
363 : 0062 b
252 : 0070 p
199 : 006A j
165 : 0066 f
153 : 0071 q
95 : 004A J
93 : 0042 B
66 : 0059 Y
45 : 004D M
31 : 0078 x
22 : 0050 P
21 : 01AF U-hook
17 : 00C6 AE
14 : 0051 Q
11 : 01A0 O-hook
8 : 0057 W
4 : 0058 X
3 : 0152 OE
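The lists above use a simple `count : codepoint name` layout, so turning them into relative shares takes only a few lines of Python. What follows is a sketch only: `parse_tally` is a hypothetical helper, and the sample input is just the top three diacritic rows copied from this post. The percentages it prints are relative to the rows supplied, not to the corpus totals.

```python
def parse_tally(lines):
    """Parse 'count : XXXX name' lines into (count, codepoint, name) tuples.

    Hypothetical helper for the list layout used in this post.
    """
    rows = []
    for line in lines:
        count, _, rest = line.partition(":")
        code, _, name = rest.strip().partition(" ")
        # Drop any trailing "(NOTE: ...)" remark from the name.
        name = name.split("(NOTE")[0].strip()
        rows.append((int(count), int(code, 16), name))
    return rows

sample = [
    "3320174 : 0304 macron",
    "2761619 : 0301 acute (NOTE: 38 tokens are probably for double acute)",
    "1529686 : 0308 diaeresis",
]
rows = parse_tally(sample)
total = sum(count for count, _, _ in rows)
for count, code, name in rows:
    print(f"U+{code:04X} {name}: {100 * count / total:.1f}% of the sampled rows")
```

Feeding in the full lists instead of the three sample rows gives shares across all diacritics or all base characters.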