Re: lists of actual character/diacritic combinations

From: Kenneth Whistler (kenw@sybase.com)
Date: Tue Feb 29 2000 - 15:03:33 EST


John Cowan answered:

>
> Joan Aliprand wrote:
>
> > Chris Pratley asked:
> > >Does anyone have a list of combinations of character + combining
> > >diacritic(s) that actually occur in use in the world's writing
> > >systems? I'm curious as to which are the most common, which are
> > >never found, etc.
> >
> > For Latin script, take a look at ANSI/NISO Z39.47-1993, Extended
> > Latin Alphabet Coded Character Set for Bibliographic Use (ANSEL),
> > available from the National Information Standards Organization
> > (NISO), www.niso.org.
>
> http://www.ccil.org/~cowan/elsie/elsie.html contains just such a list,
> founded on LoC data supplied by James Agenbroad.
>

Keep in mind that the elsie list is a *corpus* count, and as such,
reflects the content biases of that corpus. Your mileage may differ,
depending on the orthographic tradition(s) and language area of your
data.
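If you want a comparable count for your own data, here is a minimal Python sketch. It is not how the elsie list or the LoC data was produced, and the function name count_combinations is hypothetical. It decomposes the text with NFD, so that a precomposed letter such as U+00E9 is counted the same as e + U+0301. It then tallies each base character together with the run of combining marks (general category M*) that follows it.

```python
import unicodedata
from collections import Counter


def _is_mark(ch: str) -> bool:
    # Combining marks have general category Mn, Mc or Me.
    return unicodedata.category(ch).startswith("M")


def count_combinations(text: str) -> Counter:
    """Tally (base character, combining-mark sequence) pairs in text.

    Hypothetical helper for illustration. NFD decomposes precomposed
    letters and puts stacked marks in canonical order, so equivalent
    spellings are counted together.
    """
    counts: Counter = Counter()
    base = None
    marks: list[str] = []

    def flush() -> None:
        # Record the previous base only if it actually carried marks.
        if base is not None and marks:
            counts[(base, "".join(marks))] += 1

    for ch in unicodedata.normalize("NFD", text):
        if _is_mark(ch):
            if base is not None:
                marks.append(ch)
            # A mark with no preceding base is ignored in this sketch.
            continue
        flush()
        base, marks = ch, []
    flush()
    return counts


sample = "caf\u00e9 cafe\u0301 k\u0313a q\u0315"
for (b, m), n in count_combinations(sample).most_common():
    print(b, " ".join(f"U+{ord(c):04X}" for c in m), n)
```

To restrict the tally to Latin letters, you could keep only pairs where unicodedata.name(base, "").startswith("LATIN"). U+0313 and U+0315 have no canonical equivalence, so normalization keeps them apart and they are counted separately. As noted above, the ranking you get reflects only the text you feed in.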

For example, U+0313 COMBINING COMMA ABOVE (or U+0315 COMBINING COMMA ABOVE RIGHT)
is relatively infrequent in the corpus, but in many North American Indian
language orthographies it may be the *most* common diacritic occurring
on Latin letters.

--Ken



This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:59 EDT