From: Richard Wordingham (richard.wordingham@ntlworld.com)
Date: Thu Mar 22 2007 - 20:26:21 CST
Subject: Re: Comment on PRI 98: IVD Adobe-Japan1 (pt.2)
Eric Muller wrote on Wednesday, March 21, 2007 3:17 PM:
> The case of the pronunciation variants is a bit more delicate. With
> today's understanding of what character encoding is about, I think it's
> fair to say that accommodating pronunciation variants in plain text is a
> non-goal, and in fact a misguided effort, in any character standard. Can
> you imagine having two coded characters for each ideograph used in Japan,
> one for On reading and one for Kun reading?
But don't we already have something like that for Welsh and Slovak? The
lower case Welsh letter 'ng', which represents a velar nasal, is encoded as
<U+006E LATIN SMALL LETTER N, U+0067 LATIN SMALL LETTER G> (e.g. Angharad),
while the 'coincidental' occurrence of a nasal and a voiced velar stop
should be encoded as <U+006E, U+034F COMBINING GRAPHEME JOINER, U+0067>
(e.g. Bangor and Llangollen) if you want it to collate properly without
dictionary look-ups. (Without CGJ, 'Llangollen' would collate before
'Llanberis', as 'ng' comes between 'g' and 'h' in the Welsh alphabet and
therefore sorts before 'n'.) I believe that the
distinction between <U+17D2 KHMER SIGN COENG, U+178A KHMER LETTER DA> and
<U+17D2 KHMER SIGN COENG, U+178F KHMER LETTER TA> is likewise phonetic
(rather than etymological), but I can no longer find the definition of the
difference between these two graphically identical sequences. The crucial
point in at least the Welsh and Slovak cases is that the difference affects
collation order.
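
The Welsh case can be sketched in a few lines of Python. This is only a toy
illustration, not the UCA or any real collation library: the names
WELSH_ALPHABET, RANK and welsh_key are made up for the example, accented
vowels and all levels beyond the primary one are ignored, and CGJ is
modelled simply as "prevents a digraph from forming, then carries no weight".

```python
CGJ = "\u034f"  # U+034F COMBINING GRAPHEME JOINER

# Welsh alphabetical order; digraphs count as single letters.
WELSH_ALPHABET = [
    "a", "b", "c", "ch", "d", "dd", "e", "f", "ff", "g", "ng", "h", "i",
    "j", "l", "ll", "m", "n", "o", "p", "ph", "r", "rh", "s", "t", "th",
    "u", "w", "y",
]
RANK = {letter: i for i, letter in enumerate(WELSH_ALPHABET)}


def welsh_key(word):
    """Primary-level sort key (hypothetical helper, heavily simplified)."""
    text = word.lower()
    key = []
    i = 0
    while i < len(text):
        if text[i] == CGJ:
            # CGJ has no weight of its own; its only effect is that the
            # letters on either side were not adjacent when tested below.
            i += 1
            continue
        pair = text[i:i + 2]
        if len(pair) == 2 and pair in RANK:
            # Two adjacent letters forming a Welsh digraph: one letter.
            key.append(RANK[pair])
            i += 2
        elif text[i] in RANK:
            key.append(RANK[text[i]])
            i += 1
        else:
            # Anything outside the toy alphabet sorts after it.
            key.append(len(RANK) + ord(text[i]))
            i += 1
    return key


plain = "Llangollen"          # n + g read as the digraph 'ng'
marked = "Llan" + CGJ + "gollen"  # n, CGJ, g: two separate letters

print(welsh_key(plain) < welsh_key("Llanberis"))   # digraph reading
print(welsh_key(marked) > welsh_key("Llanberis"))  # CGJ reading
```

With the plain spelling the third letter is 'ng', which ranks before 'n',
so the word lands ahead of 'Llanberis'; with CGJ inserted the comparison is
decided at 'g' versus 'b' instead.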
While on this subject, is there a recommended way of distinguishing in the
encoding the Khmer letter ba pronounced /b/ and the Khmer letter ba
pronounced /p/ (as in many Indic loans) when they precede vowels? In Khmer
the latter sorts equal to <U+1794 KHMER LETTER BA, U+17C9 KHMER SIGN
MUUSIKATOAN> at the primary level. There has been a discussion on Khmer
collation, but I couldn't find a resolution of this issue.
Richard.