MC> Consonants [j] and [w] have the special status of "semivowels" in
MC> Romance languages, which means that they often behave as vowels
MC> do, including in the rules for elision.
One has to differentiate between phonemes and graphemes. Unicode, of
course, operates on the grapheme level, and thus from the code point
alone you simply can't be certain what a "y" actually stands for
(vowel or semivowel).
MC> But, of course, I am aware that there are edge cases that will not
MC> be captured in the general case. I have named one of these edge
MC> cases (the Breton trigraph "c'h"), but it's not difficult to come
MC> up with more -- e.g., when the apostrophe is used as a diacritic
MC> applied to consonants (such as the Wade-Giles romanization of
MC> Chinese "K'ang-hsi").
Just to give another example: Uzbek in Latin script uses "o'" and "g'"
as letters distinct from "o" and "g", as in the language name
"O'zbek", where "o'" stands for the sound written in Cyrillic script
with U+040E (SHORT U) and "g'" corresponds to U+0493 (GHE WITH STROKE).
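To make the problem concrete, here is a rough sketch in Python. It
assumes a deliberately naive elision rule (break after an apostrophe
that stands between a letter and a vowel); that rule, the vowel set and
the name split_elision are made up for illustration and are not the
rule proposed in this thread.

```python
# Hypothetical vowel set, for illustration only.
VOWELS = "aeiouAEIOU"

def split_elision(word):
    """Split after every apostrophe that sits between a letter and a vowel.

    This is an assumed, naive rule; it knows nothing about the language.
    """
    parts, start = [], 0
    for i in range(1, len(word) - 1):
        if word[i] == "'" and word[i - 1].isalpha() and word[i + 1] in VOWELS:
            parts.append(word[start:i + 1])
            start = i + 1
    parts.append(word[start:])
    return parts

print(split_elision("l'amico"))    # ["l'", 'amico']    (Italian elision, as intended)
print(split_elision("O'zbek"))     # ["O'zbek"]         (untouched: "z" is not a vowel)
print(split_elision("o'g'il"))     # ["o'g'", 'il']     (Uzbek letter "g'" treated as elision)
print(split_elision("K'ang-hsi"))  # ["K'", 'ang-hsi']  (Wade-Giles aspiration mark split off)
```

So "O'zbek" itself survives only by accident; an Uzbek word such as
"o'g'il", where "g'" is followed by a vowel, is split in the wrong
place.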
MC> BTW, notice that I didn't include precomposed accented letters
MC> because I understand UTR#29 works on NFD normalized text.
Does working on NFD text mean that the letters in U+0080..00FF, i.e.
the former Latin-1 upper block, are covered as well? That would be of
interest to us Germans :-)
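For what it's worth, a quick check with Python's standard unicodedata
module shows what NFD does to a few characters from that block. This is
only a sketch; the helper name nfd_codepoints is made up here.

```python
import unicodedata

def nfd_codepoints(s):
    """Return the code points of s after canonical decomposition (NFD)."""
    return ["U+%04X" % ord(c) for c in unicodedata.normalize("NFD", s)]

print(nfd_codepoints("\u00e4"))  # ['U+0061', 'U+0308']  a-umlaut becomes a + COMBINING DIAERESIS
print(nfd_codepoints("\u00dc"))  # ['U+0055', 'U+0308']  U-umlaut becomes U + COMBINING DIAERESIS
print(nfd_codepoints("\u00df"))  # ['U+00DF']            sharp s has no canonical decomposition
```

If the rules really see NFD text, the umlauts arrive as a base vowel
plus a combining mark, so a class listing only the base letters would
still match their first code point.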
MC> However, "ItalianFrenchVowel" doesn't include Esperanto, Occitan
MC> and many Italian and French dialects.
"RomanceVowel"? (Not a lot better.)
Philipp
This archive was generated by hypermail 2.1.2 : Wed Aug 14 2002 - 20:23:18 EDT