>>Are there separate CH, Ch and ch keys on Slovak keyboards?
>Of course not. Keyboards were designed in America. Besides,
keyboards are glyph-oriented, not character-oriented. I am not
aware of any operating system that can display two glyphs for a
single character (not yet, anyway). Are we here to accept the
status quo, or to internationalize computing?
>>Many languages, including English, make use of digraphs and
trigraphs to represent sounds which are represented in other
orthographies by single characters.
>Oh, yeah, it's all about English. The rest of us are idiots.
English does not consider digraphs separate characters, and
English is right. The rest of us should just assimilate.
Resistance is futile.
>Well, fine. Then let's declare Unicode the English way of
transcribing languages, and not call it an international
standard of character encoding.
Hey, Adam, you're not giving the rest of us much credit for
being concerned about I18N and the needs of non-English users.
We really are concerned. It's just that what you're asking for
*won't make any difference* to Slovak users other than in their
(your) perception.
If you don't like English examples to prove a point, use
Spanish. "Ch" is considered a separate character in Spanish,
and Spanish users can do *all* they want using the presently
available encoding.
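To make that concrete, here is a minimal Python sketch of sorting with "ch" kept as the plain sequence c + h in the text and treated as one unit only by the sorting code. The alphabet table and the name collation_key are made up for illustration, and the table is deliberately simplified (no letters with diacritics), so this is not a real Slovak or Spanish collation.

```python
# Simplified, hypothetical alphabet: plain Latin letters with the digraph
# "ch" ranked as one unit right after "h". Real Slovak collation also
# ranks letters with diacritics; those are left out of this sketch.
ALPHABET = list("abcdefgh") + ["ch"] + list("ijklmnopqrstuvwxyz")
RANK = {unit: i for i, unit in enumerate(ALPHABET)}


def collation_key(word):
    """Split a word into collation units, treating "ch" as one unit."""
    text = word.lower()
    key = []
    i = 0
    while i < len(text):
        if text.startswith("ch", i):
            unit, step = "ch", 2
        else:
            unit, step = text[i], 1
        # Units outside the sketch alphabet sort after everything else.
        key.append(RANK.get(unit, len(ALPHABET)))
        i += step
    return key


words = ["chata", "cena", "hora", "ihla", "dom"]
print(sorted(words, key=collation_key))
# ['cena', 'dom', 'hora', 'chata', 'ihla']
```

The stored text never contains a special CH code; only the comparison knows that c followed by h counts as one letter, which is why "chata" lands after "hora" rather than next to "cena".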
By the way, keyboards are *not* glyph-oriented. Ask any speaker
of Chinese.
>>In some non-Slavic language adaptations of the Cyrillic
script, up to four letters may be combined to represent a
single sound, and these 'quadragraphs' are often listed as
single letters of the alphabet and have specific sorting and
hyphenation rules. Are you suggesting that each of these
sequences _needs_ to be encoded as a precomposed character?
>I am not talking about transliteration. I am talking about
native use. If some language natively considers a quadragraph a
character in its own right, then yes, we need to encode it. Or
we need to stop referring to Unicode as CHARACTER ENCODING.
Either solution is acceptable.
Nobody's talking about transliteration here. In Lanna script, I
know of a sequence of 5 symbols (discontiguous, by the way)
that make up a single entity. When we get to discussing
proposals for Lanna, I will *not* be recommending that this be
encoded as a separate entity because it simply isn't necessary,
no matter how native users perceive it.
>>>The fact that it can be constructed from two glyphs, C and
H, is irrelevant, many other characters can be so constructed
(e.g. N with caron can be constructed from an N and a caron, yet
it is a separate character).
>>There are plenty of people on this list who would argue that
it should not be.
>But the fact is, it is. And as long as Unicode is to be
thought of as character encoding, it should be.
Wrong definition of character. (See Socratic dialogue.)
>>What have you actually gained?
>Consistency. There is a DZ, for example.
Sorry, but consistency simply is not an acceptable justification
in a standard that has been forced to make compromises for
legacy standards while still wanting to maintain some ideals
wherever possible.
>>Remember that Unicode is a standard for encoding _plain
text_.
>No, it is a standard for encoding _characters_. It states so
quite explicitly.
Again, you're working with the wrong definition of character.
>Yes, it is possible to encode the CH as the C followed by the
H, and the N caron by the N followed by some connection code
followed by a caron. And it is perfectly possible for software
to handle it. But that would not be CHARACTER encoding. Unicode
clearly states its goal to be the encoding of characters of all
languages, existing and defunct. CH is a character in
Slovak.
Yes, it is character encoding, just not the definition of
character you're assuming.
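For anyone who wants to see how the existing encoding treats the cases mentioned in this thread, here is a small Python sketch using the standard unicodedata module. It only inspects normalization forms; the variable names are mine. Note that n with caron decomposes to n plus a combining caron directly, with no connection code in between.

```python
import unicodedata

n_caron = "\u0148"  # LATIN SMALL LETTER N WITH CARON

# Canonical decomposition: base letter followed by COMBINING CARON.
decomposed = unicodedata.normalize("NFD", n_caron)
print([f"U+{ord(c):04X}" for c in decomposed])  # ['U+006E', 'U+030C']

# Canonical composition gives back the single precomposed code point.
print(unicodedata.normalize("NFC", "n\u030c") == n_caron)  # True

# The dz character (U+01F3) has only a compatibility decomposition,
# so NFKD turns it into the two plain letters d and z.
dz = "\u01f3"
print(unicodedata.normalize("NFKD", dz))  # dz

# There is no precomposed "ch", so composition leaves c + h as it is.
print(len(unicodedata.normalize("NFC", "ch")))  # 2
```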
Peter
This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:54 EDT