From: Peter Jacobi (peter_jacobi@gmx.net)
Date: Sat Dec 06 2003 - 13:39:29 EST
Dear All,
I am attempting to transcode Tamil text (in legacy 8-bit encodings, which
are in visual glyph order, being heirs of the Tamil typewriter) into Unicode
(which uses the 'logical' order introduced by ISCII):
http://www.jodelpeter.de/i18n/tamil/xref-uc.htm
When I thought my converter was ready, I had a severe collision
with reality when I tried it on some webpages.
The problem: in the legacy encoding you can style individual characters,
which not only breaks my simple converter, but which may have no
good equivalent in Unicode anyway. See this example:
(all legacy-encoded Tamil is shown using C-style escapes, Unicode Tamil as
NCRs)
Converting unstyled text
from TSCII
lA \xC4\xA1
le \xA7\xC4
lo \xA7\xC4\xA1
to Unicode
lA லா
le லெ
lo லொ
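The reordering in the table above can be sketched roughly as follows. This is only an illustration, not my actual converter: it covers just the three TSCII bytes used in the examples (\xC4, \xA1, \xA7), and the function and table names are made up for this sketch.

```python
# Only the three TSCII bytes from the examples above are covered.
CONSONANTS = {0xC4: "\u0BB2"}      # l  -> TAMIL LETTER LA
PREFIX_SIGNS = {0xA7: "\u0BC6"}    # glyph written BEFORE the consonant -> VOWEL SIGN E
SUFFIX_SIGNS = {0xA1: "\u0BBE"}    # glyph written AFTER the consonant  -> VOWEL SIGN AA
# A prefix glyph plus a suffix glyph around one consonant form a two-part vowel.
TWO_PART = {("\u0BC6", "\u0BBE"): "\u0BCA"}   # e + aa -> VOWEL SIGN O


def tscii_to_unicode(data: bytes) -> str:
    """Convert visual-order TSCII bytes to logical-order Unicode (sketch)."""
    out = []
    pending = None  # prefix vowel sign waiting for its consonant
    i = 0
    while i < len(data):
        b = data[i]
        if b in PREFIX_SIGNS:
            pending = PREFIX_SIGNS[b]
            i += 1
        elif b in CONSONANTS:
            out.append(CONSONANTS[b])
            i += 1
            suffix = None
            if i < len(data) and data[i] in SUFFIX_SIGNS:
                suffix = SUFFIX_SIGNS[data[i]]
                i += 1
            if pending and suffix:
                out.append(TWO_PART[(pending, suffix)])
            elif pending or suffix:
                out.append(pending or suffix)
            pending = None
        else:
            raise ValueError(f"byte 0x{b:02X} is not handled in this sketch")
    if pending:
        raise ValueError("prefix vowel sign without a following consonant")
    return "".join(out)
```

Note that for lo three input bytes become two code points, and for le the byte order is reversed, which is what makes the conversion more than a table lookup.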
Now the consonant l should get a distinct color:
In TSCII:
lA <span style='color:#00f'>\xC4</span>\xA1
le \xA7<span style='color:#00f'>\xC4</span>
lo \xA7<span style='color:#00f'>\xC4</span>\xA1
In Unicode:
lA <span style='color:#00f'>ல</span>ா
le <span style='color:#00f'>ல</span>ெ
lo <span style='color:#00f'>ல</span>ொ
It is easy to see that a simple n:m mapping cannot perform this conversion,
because the reordering has to cross the markup boundaries.
It is less easy to judge whether this is the desired conversion at all,
and what the receiving software should do with it.
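If the styled Unicode markup shown above is accepted as the target, one possible approach is to attach the style to each glyph, convert per syllable, and then regroup into runs. The sketch below assumes the input has already been parsed into (bytes, style) pairs; the names are hypothetical, only the three example bytes are handled, and the rule that a two-part vowel inherits the style of its left part is an arbitrary choice of mine.

```python
from itertools import groupby

LA, SIGN_AA, SIGN_E, SIGN_O = "\u0BB2", "\u0BBE", "\u0BC6", "\u0BCA"
CONSONANTS = {0xC4: LA}
PREFIX_SIGNS = {0xA7: SIGN_E}
SUFFIX_SIGNS = {0xA1: SIGN_AA}
TWO_PART = {(SIGN_E, SIGN_AA): SIGN_O}


def convert_styled(runs):
    """runs: list of (tscii_bytes, style); returns list of (unicode_text, style)."""
    # Flatten to one (byte, style) pair per glyph so a syllable may span runs.
    glyphs = [(b, style) for data, style in runs for b in data]
    out = []  # list of (code point, style)
    i = 0
    while i < len(glyphs):
        prefix = None
        if glyphs[i][0] in PREFIX_SIGNS:
            prefix = (PREFIX_SIGNS[glyphs[i][0]], glyphs[i][1])
            i += 1
        if i >= len(glyphs) or glyphs[i][0] not in CONSONANTS:
            raise ValueError("byte sequence not handled in this sketch")
        out.append((CONSONANTS[glyphs[i][0]], glyphs[i][1]))
        i += 1
        suffix = None
        if i < len(glyphs) and glyphs[i][0] in SUFFIX_SIGNS:
            suffix = (SUFFIX_SIGNS[glyphs[i][0]], glyphs[i][1])
            i += 1
        if prefix and suffix:
            # Assumption: a two-part vowel takes the style of its left part.
            out.append((TWO_PART[(prefix[0], suffix[0])], prefix[1]))
        elif prefix or suffix:
            out.append(prefix or suffix)
    # Merge neighbouring code points with the same style back into runs.
    return [("".join(ch for ch, _ in group), style)
            for style, group in groupby(out, key=lambda pair: pair[1])]
```

For the lo example, the input runs (\xA7 unstyled, \xC4 blue, \xA1 unstyled) come out as (U+0BB2 blue, U+0BCA unstyled): the two unstyled glyphs on either side of the consonant end up as a single code point after it.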
Some tests: In Mozilla 1.4.1 the characters fall apart and in IE5.5 the
style expands to the entire orthographic syllable.
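The IE5.5 result can be imitated with a rough sketch like the one below. I do not know how IE actually implements this; the sketch merely pulls any combining marks at the start of a styled run into the preceding run, using the Unicode general category (Mn/Mc) as a crude stand-in for real syllable or grapheme cluster rules. The function names are hypothetical.

```python
import unicodedata


def is_mark(ch: str) -> bool:
    # Tamil vowel signs such as U+0BBE, U+0BC6 and U+0BCA are category Mc.
    return unicodedata.category(ch) in ("Mn", "Mc")


def expand_style_to_cluster(runs):
    """runs: list of (unicode_text, style). Leading marks join the previous run."""
    out = []
    for text, style in runs:
        n = 0
        # Count combining marks at the start of this run (only if a run precedes it).
        while out and n < len(text) and is_mark(text[n]):
            n += 1
        if n:
            prev_text, prev_style = out[-1]
            out[-1] = (prev_text + text[:n], prev_style)
        if text[n:]:
            out.append((text[n:], style))
    return out
```

Applied to the styled lA example, the blue run containing U+0BB2 absorbs the unstyled U+0BBE, so the whole syllable comes out blue.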
Unicode test page: http://www.jodelpeter.de/i18n/tamil/markup-uc.htm
TSCII test page: http://www.jodelpeter.de/i18n/tamil/markup-tscii.htm
After seeing this effect at its source, it's now clear why you can't style
individual Tamil characters in a word processor when using Unicode (whereas
you can do so in legacy encodings).
It's hard to promote Unicode when things that have worked in the past
stop working.
Any insights?
Regards,
Peter Jacobi