From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Mon Nov 14 2005 - 13:09:26 CST
From: "Mark Davis" <mark.davis@icu-project.org>
> We use NFC for the exemplar character set. Any significant character
> sequence can be included as well. For example, one can have [a-h {ch}
> i-z], which indicates that {ch} is treated as a unit.
What about apostrophes? They are present in some digraphs and trigraphs for
some languages.
For example, Breton has {ch} and {c'h} but no isolated {c}. One problem is:
how can we represent the various ways to encode the apostrophe? {c'h} is
frequent for Breton (using the ASCII single quote, U+0027), but the correct
code should be {c’h} (using the upper-comma apostrophe, U+2019).
How can we say that these two should be treated equivalently? Maybe like
this: {c['’]h} ?
Although the correct form is the one with the apostrophe, the ASCII quote is
MUCH more frequent (this is also true for French and English; however, there
the apostrophe plays another role and does not create unbreakable
digraphs/trigraphs as in Breton, where the three-character sequence is a
SINGLE letter...)
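
To make the equivalence question concrete, here is a minimal Python sketch of
one possible approach: fold the apostrophe variants to a single code point,
then split text into units with the longest match first. The names
APOSTROPHES, fold_apostrophes and units are made up for this illustration,
and the set of apostrophe-like characters is an assumption, not something
taken from the exemplar set syntax.

```python
import re

# Apostrophe-like characters seen in practice (illustrative, not exhaustive):
# U+0027 ASCII quote, U+2019 right single quote, U+02BC modifier apostrophe.
APOSTROPHES = "'\u2019\u02bc"

# Hypothetical Breton units, longest first so that {c'h} wins over {ch}.
UNIT_RE = re.compile(rf"c[{APOSTROPHES}]h|ch|.", re.IGNORECASE | re.DOTALL)


def fold_apostrophes(text):
    """Map every apostrophe variant to U+2019."""
    return re.sub(rf"[{APOSTROPHES}]", "\u2019", text)


def units(text):
    """Split text into letters, keeping {c'h} and {ch} as single units."""
    return UNIT_RE.findall(fold_apostrophes(text))
```

With this sketch, units("c'hoar") and units("c\u2019hoar") give the same
list, whose first element is the three-character unit.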
There are similar examples in other languages, like {’n} or {'n}, for which
there also exists a combined character in Unicode (U+0149 ŉ)...
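
As a rough illustration of why normalization alone does not settle this: the
combined character U+0149 has only a compatibility decomposition, to U+02BC
followed by n, so NFC leaves it alone and NFKC still does not match the
sequences typed with U+0027 or U+2019. The Python sketch below uses the
standard unicodedata module; fold_n_apostrophe is a hypothetical helper, and
its blanket replacement is deliberately naive.

```python
import unicodedata

N_APOSTROPHE = "\u0149"  # LATIN SMALL LETTER N PRECEDED BY APOSTROPHE

# Compatibility decomposition: U+02BC MODIFIER LETTER APOSTROPHE + "n".
decomposed = unicodedata.normalize("NFKD", N_APOSTROPHE)


def fold_n_apostrophe(text):
    """Bring {'n}, {’n} and U+0149 to the same sequence U+02BC + n.

    Naive sketch: it rewrites every quote-plus-n pair, including ones that
    are not the intended digraph, and NFKC also folds unrelated characters.
    """
    text = unicodedata.normalize("NFKC", text)
    for apostrophe in ("'", "\u2019"):
        text = text.replace(apostrophe + "n", "\u02bcn")
    return text
```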
Also in Greek, {’Ε} is interpreted as capital Epsilon with tonos, which can
also be found encoded as a spacing tonos character (U+0384) before the
capital letter, {΄Ε}, or often as an apostrophe or single quote followed by
Epsilon...
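
The Greek case could be sketched the same way in Python: rewrite a spacing
tonos or apostrophe that precedes a capital vowel as the vowel followed by
the combining acute U+0301, and let NFC compose the result. TONOS_LIKE and
compose_tonos are hypothetical names, and the sketch assumes that every such
mark before a capital vowel is a tonos, which is not true of an opening
quotation mark.

```python
import re
import unicodedata

# Marks found in place of a tonos (illustrative set):
# U+0384 GREEK TONOS, U+0027 ASCII quote, U+2019 right single quote.
TONOS_LIKE = "\u0384'\u2019"

# Capital Greek vowels: Alpha, Epsilon, Eta, Iota, Omicron, Upsilon, Omega.
CAPITAL_VOWELS = "\u0391\u0395\u0397\u0399\u039f\u03a5\u03a9"

PATTERN = re.compile(rf"[{TONOS_LIKE}]([{CAPITAL_VOWELS}])")


def compose_tonos(text):
    """Turn mark + capital vowel into the precomposed letter with tonos."""
    # Move the accent after the vowel as U+0301, then compose with NFC.
    return unicodedata.normalize("NFC", PATTERN.sub("\\1\u0301", text))
```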
Philippe.