RE: Language Tagging And Unicode

From: Janko Stamenovic (janko@teletrader.com)
Date: Thu Jan 20 2000 - 13:21:53 EST


> -----Original Message-----
> From: Peter Constable [mailto:peter_constable@sil.org]
> Sent: Thursday, January 20, 2000 4:53 PM
>
> I think it's plain to all of us that there is a categorical
> distinction between saying "English A and French A are the
> same" and "Russian A and French A are the same".

But I really don't see any difference from a programmer's (or even a
typesetter's) perspective. Cyrillic and Latin "shapes" (glyphs) are the main
visual difference, but the principles of writing, printing, etc. are exactly
the same in Cyrillic and Latin (compared with Arabic, Hebrew, or anything
else). If anybody knows of any difference other than what we might now call
"the shape of the glyphs", they are welcome to add it to this discussion!

> Similarly, I think we
> all agree to adopt the convention of saying that Russian and
> Serbian are written with a single script; there is no practical
> reason to say they are different scripts. The differences in the
> writing systems are simply not that great.

To be more precise, I'd say: Serbian is written in TWO scripts, Latin AND
Cyrillic, interchangeably! That's one small point people forget. And as far
as I know, it's not the only language with that property.

But of all the letters known as Cyrillic, only some are used in both Russian
and Serbian. Many Russian letters are not used in Serbian, and vice versa.
Handwritten Serbian Cyrillic and cursive (italic) *printed* Serbian Cyrillic
differ from Russian *even more*. Are Russian and Serbian P the same letter in
that case? It depends on the definition:

Back when I was six years old, I was not taught to write "glyphs" but
"letters". That was exactly when I learned how to *write* "Serbian t", and a
few weeks later "Russian t". They looked completely different. The fact that
they often look the same *in print* always seemed to me just a coincidence.

Especially because a few weeks later I learned the *written* Latin m. I could
not see the fact that Russians write "t" like "m" as anything other than a
coincidence. All the kids in school mixed up these shapes a lot in their
schoolwork before they got used to it all. If the Cyrillic and Latin scripts
were really as different as you suggest, such confusions would not exist.

On the other hand, we always called letters the same if they meant the same
thing, but we had to qualify them as "Cyrillic" (for us, the default was
"Serbian"), "Latin", and "Russian" (only here was the default "Cyrillic").

So in our terms: Russian t looks like Latin m, but not like our t, while
Cyrillic T in both alphabets and Latin T are exactly the same.
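
For reference, a quick check with Python's standard unicodedata module (the
labels below are illustrative, not official names) shows how Unicode encodes
the letters mentioned above: Latin T and Cyrillic T get separate code points
despite identical shapes, while Serbian and Russian t share a single one.

```python
import unicodedata

# Letters discussed above. Serbian and Russian Cyrillic share the same code
# points; only Latin and Cyrillic are encoded separately.
letters = {
    "Latin T": "T",
    "Cyrillic T (Serbian and Russian)": "\u0422",
    "Cyrillic t (Serbian and Russian)": "\u0442",
    "Latin m": "m",
}

for label, ch in letters.items():
    # Print the code point and its official Unicode character name.
    print(f"{label:34} U+{ord(ch):04X}  {unicodedata.name(ch)}")
```

Since there is no separate code point for the Serbian t, plain text alone
cannot tell a renderer which of the two forms to draw.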

And of course, we didn't learn "scripts" but "alphabets"! And looking at
alphabets, Serbian t and Russian t are different letters, because an alphabet
is "how you print AND how you write letters". And in the end, cursive is
"printing made to look like writing". So this information is obviously
needed.

The problem with calling Unicode a "plain text" standard, and refusing to care
about Serbian Cyrillic on that basis, is this: some languages represented in
Unicode are printed entirely AS THEY ARE WRITTEN (which makes printing them
considerably HARDER than printing Latin and Cyrillic). Because of that,
Serbian Cyrillic seems in danger of becoming the only European writing system
that could be rendered correctly only with these much more complex engines.
That's why I considered adding a few characters to Unicode, only to match
current practice. Everybody claims things are the way they are only because
of compatibility, and that's exactly why I analyzed the idea of new
characters: to make Serbian Cyrillic compatible enough with all the
programs/systems/engines that know how to print Latin and Cyrillic but do not
know about the differences between Chinese scripts.

Even the Unicode standard itself does not require every Unicode-compatible
program to be able to display all scripts. Precisely because of that, perhaps
Cyrillic letters (even Serbian ones!) should be considered something that
SHOULD work on systems incapable of displaying the different forms of Chinese
characters.

Can anybody comment on whether this makes sense?


