Latin digraph characters (was: Re: Klingon silliness)

From: DougEwell2@cs.com
Date: Tue Feb 27 2001 - 11:53:58 EST


In a message dated 2001-02-27 04:17:48 Pacific Standard Time,
adam@whizkidtech.net writes:

> No character set standard was ever designed by Slovaks. However, Slovak
> linguists have always treated "ch" as a separate character. As they
> do "dz" and "dz" with caron, but those are encoded in Unicode.

Adam mentions the Latin digraphs encoded for DZ at U+01F1/2/3 and for DZ with
caron at U+01C4/5/6. These characters, along with LJ at U+01C7/8/9 and NJ at
U+01CA/B/C, were ostensibly added so that Cyrillic (Serbian) text could be
converted 1-to-1 to the Latin (Croatian) script and back. (DZ and DZ with
caron are also used in Slovak, as Adam points out.)
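
To make the 1-to-1 point concrete, here is a minimal Python sketch. It is
only an illustration: the table and the function names are hypothetical, and
the table covers just the three Serbian letters that pair with the LJ, NJ,
and DZ-with-caron digraphs, not a whole alphabet.

```python
# Illustrative, partial table: Serbian Cyrillic letters paired with the
# single-code-point Latin digraphs. Names here are hypothetical, not an API.
CYR_TO_DIGRAPH = {
    "\u0409": "\u01C7",  # CYRILLIC CAPITAL LJE  -> LATIN CAPITAL LETTER LJ
    "\u0459": "\u01C9",  # CYRILLIC SMALL LJE    -> LATIN SMALL LETTER LJ
    "\u040A": "\u01CA",  # CYRILLIC CAPITAL NJE  -> LATIN CAPITAL LETTER NJ
    "\u045A": "\u01CC",  # CYRILLIC SMALL NJE    -> LATIN SMALL LETTER NJ
    "\u040F": "\u01C4",  # CYRILLIC CAPITAL DZHE -> LATIN CAPITAL LETTER DZ WITH CARON
    "\u045F": "\u01C6",  # CYRILLIC SMALL DZHE   -> LATIN SMALL LETTER DZ WITH CARON
}
DIGRAPH_TO_CYR = {latin: cyr for cyr, latin in CYR_TO_DIGRAPH.items()}


def to_latin_digraphs(text):
    """Replace each mapped Cyrillic letter with its one-character digraph."""
    return "".join(CYR_TO_DIGRAPH.get(ch, ch) for ch in text)


def to_cyrillic(text):
    """Inverse of to_latin_digraphs for the letters in the table."""
    return "".join(DIGRAPH_TO_CYR.get(ch, ch) for ch in text)


sample = "\u045A\u0459\u045F"          # small nje, lje, dzhe
latin = to_latin_digraphs(sample)
print(len(sample) == len(latin))       # character count is preserved
print(to_cyrillic(latin) == sample)    # the round trip is lossless
```

With ordinary two-letter spellings such as "nj", the reverse direction has to
decide whether n followed by j is one letter or two, which is the ambiguity
the digraph characters avoid.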

This has always puzzled me, because Cyrillic includes lots of other
characters that transliterate to two or more Latin letters. CH, SH, SHCH,
and ZH leap to mind; there may be more. What was the thought process behind
providing these compatibility characters only for the Serbo-Croatian
additions to Cyrillic, but not for the other Cyrillic characters?

Of course, I am not suggesting that any more such characters be added. The
existing compatibility characters require three code points each (uppercase,
titlecase, and lowercase), and I was under the impression that they were
deprecated, though I could find no mention of that in TUS 3.0 (The Unicode
Standard, Version 3.0).
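
For anyone who wants to inspect the three case forms, the following Python
sketch lists each code point with its character name and its NFKC
(compatibility) normalization, using the standard unicodedata module. The
helper name describe and the label strings are my own; the output reflects
whatever Unicode data ships with the Python build that runs it.

```python
import unicodedata

# First code point of each block of three named in the message above.
DIGRAPH_BASES = {
    "DZ WITH CARON": 0x01C4,
    "LJ": 0x01C7,
    "NJ": 0x01CA,
    "DZ": 0x01F1,
}


def describe(base):
    """Return (code point, name, NFKC form) for the three case forms."""
    rows = []
    for cp in range(base, base + 3):
        ch = chr(cp)
        rows.append(
            (f"U+{cp:04X}", unicodedata.name(ch), unicodedata.normalize("NFKC", ch))
        )
    return rows


for label, base in DIGRAPH_BASES.items():
    for code, name, compat in describe(base):
        parts = " ".join(f"U+{ord(c):04X}" for c in compat)
        print(f"{label}: {code} {name} -> {parts}")

# The case mappings tie the three forms of one digraph together.
upper, title, lower = (chr(0x01C4 + i) for i in range(3))
print(lower.upper() == upper, lower.title() == title, upper.lower() == lower)
```

Each digraph normalizes under NFKC to an ordinary two-letter sequence, which
is what makes these compatibility characters. Whether that amounts to
"deprecated" is a separate question that this sketch does not answer.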

-Doug Ewell
 Fullerton, California

P.S. I don't agree that the amount of traffic on this list is a problem.
There are several interesting, on-topic threads going on here, and people are
feeling compelled to participate. Relatively few posts are straight "me
too's" or of the especially annoying form "I have browser X, database Y,
operating system Z, how can my app display my Unicode characters?"


