From: Christopher Fynn (cfynn@gmx.net)
Date: Sun Nov 13 2005 - 22:52:17 CST
Mark Davis wrote:
> Logically speaking, the set of characters used by a language is quite
> fuzzy, so there isn't really a black and white answer (see also
> http://www.unicode.org/draft/reports/tr36/tr36.html#Language_Based_Security).
>
>
> What we ended up doing in CLDR was having a core set of characters for a
> language (the 'exemplarCharacters'), plus an additional set of
> characters that would be seen in customary usage. For example, for
> English we have [a-z] in the main set, and [á à ă â å ä ā æ ç é è ĕ ê ë
> ē í ì ĭ î ï ī ñ ó ò ŏ ô ö ø ō œ ß ú ù ŭ û ü ū ÿ] in the auxiliary set.
> (http://unicode.org/cldr/data/common/main/en.xml)
Mark
Should the "exemplar characters" for a language include all the
base+combining character *combinations* frequent in that language,
or all the base characters and all the combining characters listed
separately?
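
To make the two options concrete, here is a rough Python sketch. It is
only an illustration, not how CLDR builds its data: the function name
exemplar_options and the sample words are made up, and it assumes that
decomposing to NFD and treating every character in a Mark category
(Mn, Mc, Me) as "combining" is a fair approximation.

```python
import unicodedata

def exemplar_options(words):
    """Return (combinations, separate) candidate sets for a word list.

    combinations: each base + combining-mark sequence kept as one unit
    separate:     base characters and combining marks listed individually
    Hypothetical helper for illustration only.
    """
    combinations = set()
    separate = set()
    for word in words:
        cluster = ""
        # NFD splits precomposed letters into base + combining marks.
        for ch in unicodedata.normalize("NFD", word.lower()):
            separate.add(ch)
            if cluster and unicodedata.category(ch).startswith("M"):
                cluster += ch              # mark attaches to the current base
            else:
                if cluster:
                    combinations.add(cluster)
                cluster = ch               # start a new base character
        if cluster:
            combinations.add(cluster)
    return combinations, separate

# Sample input chosen for illustration only.
combos, parts = exemplar_options(["résumé", "naïve"])
```

With the first option "e" + U+0301 is listed as a unit; with the second,
"e" and U+0301 appear as separate entries and nothing records which
combinations actually occur.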
- Chris
> For the language in question, the latter is derived from dictionaries
> and style guidelines for major publications in the language. We don't
> have this in place for all languages yet, but will be expanding coverage
> in the CLDR 1.4 release, so feedback is welcome.
>
> Mark
>