RE: Language Tagging And Unicode

From: Janko Stamenovic (janko@teletrader.com)
Date: Wed Jan 19 2000 - 09:54:50 EST


> -----Original Message-----
> From: Marco.Cimarosti@icl.com [mailto:Marco.Cimarosti@icl.com]
> Sent: Wednesday, January 19, 2000 2:34 PM
> To: Unicode List
> Subject: RE: Language Tagging And Unicode
>
> If I am honest, I must say that all the objections that I made to your
> proposal are also true for A, C, E, I, J, O, P, S, and X: in Latin *and*
> Cyrillic they could well be the same: there wouldn't even be any problem
> in upper-/lower-casing.

Well, it seems that you didn't get this one. It looks to me as if you only
considered the shape and not the semantics of the letters (to determine which
are semantically the *same* character).

Please look at this picture (no popups, and it's only 5K!):

http://jankojs.tripod.com/lat-cyr.gif

Somebody who knows Greek well could make the same kind of picture for the
relation between Latin and Greek.

The picture shows which characters could have been a single character "with
different glyphs in Cyrillic and Latin". And historically, most of them were
actually implemented like this on terminals that accepted 7-bit characters!

The trick is that both Latin AND Cyrillic are based on the Greek alphabet:
Cyrillic has a lot of letters that still look just like the Greek ones, where
Latin changed them.

So, to be more precise, a lot of characters would condense to one: A in
Latin, A in Cyrillic and A in Greek could be one character code if we used
language information throughout. Or Latin S, Cyrillic C and Greek Sigma could
have been the same character, etc.
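
For comparison, here is a small Python sketch of how Unicode actually
treats the letters mentioned above. It only uses the standard
`unicodedata` module; the `TRIPLES` list and the `describe` helper are
names made up for this illustration. It simply prints the code point and
the official name of each look-alike capital, which shows that each script
gets its own code point:

```python
import unicodedata

# Look-alike capitals named in the message, as (Latin, Cyrillic, Greek).
# Escapes are used so the three scripts cannot be confused in the source.
TRIPLES = [
    ("\u0041", "\u0410", "\u0391"),  # Latin A, Cyrillic A, Greek Alpha
    ("\u0053", "\u0421", "\u03A3"),  # Latin S, Cyrillic Es, Greek Sigma
]


def describe(ch):
    """Return the code point and the Unicode character name of ch."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


for triple in TRIPLES:
    for ch in triple:
        print(describe(ch))
    print()
```

Case mapping also stays inside each script: lowercasing Cyrillic U+0421
gives Cyrillic U+0441, not Latin "c".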

More precisely: the 30 Latin characters of Serbian would be the same
characters as the Cyrillic ones (only one character in the table). A lot of
Greek characters (not shapes!) would overlap with Latin and Cyrillic. Etc.
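
To make the Serbian case concrete, here is a rough Python sketch of the
one-to-one pairing between the 30 Cyrillic letters and their Latin
spellings. The names `CYRILLIC`, `LATIN`, `CYR_TO_LAT` and `to_latin` are
invented for this example, and it assumes that lj, nj and dž are written
as two ordinary letters, which is the common practice. Since Unicode keeps
the two scripts apart, a table like this is needed to get from one to the
other:

```python
# The 30 letters of the Serbian Cyrillic alphabet, in alphabet order.
CYRILLIC = "абвгдђежзијклљмнњопрстћуфхцчџш"

# Their Latin counterparts, in the same order (three are digraphs).
LATIN = [
    "a", "b", "v", "g", "d", "đ", "e", "ž", "z", "i",
    "j", "k", "l", "lj", "m", "n", "nj", "o", "p", "r",
    "s", "t", "ć", "u", "f", "h", "c", "č", "dž", "š",
]

# One table entry per letter: Cyrillic letter -> Latin spelling.
CYR_TO_LAT = dict(zip(CYRILLIC, LATIN))


def to_latin(text):
    """Transliterate lowercase Serbian Cyrillic text to Latin.

    Characters outside the table (spaces, punctuation) pass through.
    """
    return "".join(CYR_TO_LAT.get(ch, ch) for ch in text)


print(to_latin("србија"))
print(to_latin("љубав"))
```

Only lowercase is handled here, to keep the sketch short.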

But that's not how Unicode was designed.

"Use another font" was an answer that was possible even without Unicode.

> My personal answer is that Unicode's architecture is not a "clean" thing,
> and often one discovers that there is no "exact logic" behind many choices.

I agree with you completely. If "language tagging" and "advanced rendering
mechanisms" had been planned from the start, I think the Unicode table would
look very different today.

So I know that adding characters does not fit this advanced (rendering)
logic; I just see that this is how a lot of issues have actually been solved
up to now.


