Doug Ewell wrote:
> Adam mentions the Latin digraphs encoded for DZ at U+01F1/2/3
> and for DZ with caron [...]
>
> This has always puzzled me, because Cyrillic includes lots of other
> characters that transliterate to two or more Latin letters.
> CH, SH, SHCH, and ZH leap to mind; there may be more.
> What was the thought process behind providing these compatibility
> characters only for the Serbo-Croatian
> additions to Cyrillic, but not for the other Cyrillic characters?
I think that those digraphs *only* make sense in the context of
Serbo-Croatian. And not very much sense even in that context.
So you should not consider English romanization vs. Russian Cyrillic, but
rather Serbo-Croatian in Latin orthography (a.k.a. "Croatian") vs.
Serbo-Croatian in Cyrillic orthography (a.k.a. "Serbian").
So the letter "щ" ("shch") is out of the game because it does not exist in
Serbo-Croatian.
In Serbo-Croatian most Latin and Cyrillic letters have a 1-to-1 mapping:
Latin: a,b,c,č,ć,d,e,f,g,h,i,j,k,l,m,n,o,p,r,s,š,t,u,v,z,ž
Cyrillic: а,б,ц,ч,ћ,д,е,ф,г,х,и,ј,к,л,м,н,о,п,р,с,ш,т,у,в,з,ж
The Cyrillic letters "ђ", "љ", "њ" and "џ" are an exception, because they
correspond to digraphs in the Latin script: "dj", "lj", "nj" and "dž"
(dz+caron).
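
To make the mapping concrete, here is a minimal sketch in Python. The two letter lists and the four digraphs are copied from the text above; the names LATIN, CYRILLIC, DIGRAPHS, CYR_TO_LAT and to_latin are made up for this illustration, and only lowercase is handled.

```python
# Letter lists copied from the message; positions correspond 1-to-1.
LATIN = "a,b,c,č,ć,d,e,f,g,h,i,j,k,l,m,n,o,p,r,s,š,t,u,v,z,ž".split(",")
CYRILLIC = "а,б,ц,ч,ћ,д,е,ф,г,х,и,ј,к,л,м,н,о,п,р,с,ш,т,у,в,з,ж".split(",")

# The four exceptions: one Cyrillic letter becomes two Latin letters.
DIGRAPHS = {"ђ": "dj", "љ": "lj", "њ": "nj", "џ": "dž"}

# Hypothetical translation table (lowercase only, for illustration).
CYR_TO_LAT = str.maketrans({**dict(zip(CYRILLIC, LATIN)), **DIGRAPHS})


def to_latin(text: str) -> str:
    """Transliterate lowercase Serbo-Croatian Cyrillic into Latin letters."""
    return text.translate(CYR_TO_LAT)


print(to_latin("џеп"))    # three Cyrillic letters become four Latin ones
print(to_latin("љубав"))  # the first letter expands to the digraph "lj"
```

Going the other way is not a plain table lookup: as far as I know, a Latin sequence such as "nj" can stand for either "њ" or "н" followed by "ј", so a reverse converter needs more than this table.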
So, in order to maintain the 1-to-1 mapping, character sets for former
Yugoslavia introduced the fictitious "characters" dj,lj,nj,dž, and these
were ultimately handed over to Unicode.
This invention opened the can of worms called "titlecase": merely declaring
that dj,lj,nj,dž are "characters" is not enough to change the reality, and
this becomes evident in case conversions.
When "lj" is the first letter of a capitalized word (e.g. a proper name like
"Ljubljana") then it has the form "Lj". But when it is in an all-capitals
word (e.g., again, "LJUBLJANA") then it has the form "LJ".
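
The three case forms can be seen with Python's built-in string methods, which follow the Unicode case mappings. This is only a sketch: the variable names are mine, and it uses the single-code "lj" digraph at U+01C9 rather than the two-letter sequence.

```python
import unicodedata

lj_lower = "\u01c9"         # the single-code lowercase digraph "lj"
lj_upper = lj_lower.upper() # all-capitals form, as in "LJUBLJANA"
lj_title = lj_lower.title() # word-initial form, as in "Ljubljana"

# Print the code point and the official character name of each form.
for ch in (lj_lower, lj_upper, lj_title):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))

# The same word spelled with the digraph characters, in both capitalizations.
word = "\u01c9ub\u01c9ana"
word_title = word.title()
word_upper = word.upper()

# Compatibility normalization turns the titlecase digraph back into two letters.
lj_title_plain = unicodedata.normalize("NFKC", lj_title)
```

With ordinary two-letter spelling, "ljubljana".title() and "ljubljana".upper() need no third case at all; the titlecase form only exists because the digraph was made a single character.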
_ Marco