From: Jungshik Shin (jshin@mailaps.org)
Date: Sun Dec 07 2003 - 08:16:38 EST
On Sat, 6 Dec 2003, Doug Ewell wrote:
> Peter Jacobi <peter underscore jacobi at gmx dot net> wrote:
>
> > Some tests: In Mozilla 1.4.1 the characters fall apart and in IE5.5
> > the style expands to the entire orthographic syllable.
> > Unicode test page: http://www.jodelpeter.de/i18n/tamil/markup-uc.htm
> > TSCII test page: http://www.jodelpeter.de/i18n/tamil/markup-tscii.htm
>
> BTW, your "Unicode test page" is marked:
>
> <meta http-equiv="Content-Type"
> content="text/html; charset=ISO-8859-1">
Peter uses NCRs, so it doesn't matter, does it? (Even in that case I
prefer to tag the page as 'UTF-8', though.) Anyway, he should have used
the 'lang' attribute to help browsers pick fonts. In the two pages
above, simply adding 'lang="ta"' to <table ....> would suffice.
In xref-uc.htm, if he wants finer-grained control, he can just globally
replace '<span class="glyph">&#....</span>' with '<span lang="ta"
class="glyph">&#....</span>'.
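A minimal Python sketch of both points. The sample markup and the
helper name add_lang are made up for illustration and are not taken
from Peter's pages; the sketch only shows why NCRs survive either
charset label and what the global replacement amounts to.

```python
import html

# Two NCRs: TAMIL LETTER KA (U+0B95) and TAMIL VOWEL SIGN E (U+0BC6).
# This sample markup is an assumption, not copied from the test pages.
page = '<span class="glyph">&#2965;&#3014;</span>'

# NCRs are plain ASCII, so the bytes decode to the same markup under
# either charset label.
raw = page.encode("ascii")
assert raw.decode("iso-8859-1") == raw.decode("utf-8") == page

# The character references resolve to the same code points regardless.
text = html.unescape("&#2965;&#3014;")


def add_lang(markup: str, lang: str = "ta") -> str:
    """Hypothetical helper: tag every glyph span with a language."""
    return markup.replace('<span class="glyph">',
                          f'<span lang="{lang}" class="glyph">')


tagged = add_lang(page)
```

Running add_lang a second time leaves the markup unchanged, because
the untagged opening tag no longer occurs in it.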
> while your TSCII test page is marked "x-user-defined". I'm not sure
> what either of those declarations accomplishes.
TSCII is not recognized by most browsers (it's not registered with
IANA)[1]. 'x-user-defined' means that to view the page one has
to configure one's browser to use a Tamil 'custom encoded' [2] font
(in TSCII/TAM? encoding) when rendering an 'x-user-defined' page.
Most browsers have an option to set fonts for 'x-user-defined'. It's
certainly better than tagging it as 'iso-8859-1' or 'windows-1252'.
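To illustrate why the label matters, here is a small Python sketch.
It assumes the byte-to-code-point mapping that the WHATWG Encoding
Standard defines for 'x-user-defined' (bytes 0x80-0xFF go to
U+F780-U+F7FF in the Private Use Area); individual browsers may differ
in detail. The byte 0xAB is an arbitrary example, not a claim about
which glyph TSCII assigns to it, and decode_x_user_defined is a
hypothetical helper name.

```python
def decode_x_user_defined(data: bytes) -> str:
    """Decode per the WHATWG 'x-user-defined' mapping (assumed here)."""
    return "".join(
        chr(b) if b < 0x80 else chr(0xF780 + b - 0x80)
        for b in data
    )


sample = bytes([0x41, 0xAB])  # 'A' plus one arbitrary high byte

# Labeled windows-1252, the high byte becomes a real Latin character,
# which search, copy/paste and font fallback will treat as such.
as_cp1252 = sample.decode("windows-1252")

# Labeled x-user-defined, it lands in the Private Use Area, so it makes
# no false claim about being a Latin character.
as_user_defined = decode_x_user_defined(sample)
```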
> > After seeing this effect at its source, it's now clear why you can't
> > style individual Tamil characters in a word processor, when using
> > Unicode (whereas you can do so, in legacy encodings).
>
> This is browser behavior, not word processor behavior, and certainly not
> an inherent defect in the Unicode logical-order model. Display engines
> need to do a better job of applying style to individual reordrant
> glyphs, that's all.
You're right. Anyway, this is an interesting challenge for
layout/rendering engines. In the case of Korean Hangul (as Philippe wrote),
it's even more so because, unlike Indic scripts[3], it has multiple
canonically equivalent (and not canonically equivalent in the Unicode sense
but nonetheless 'equivalent' in a certain sense) representations.
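A short Python sketch of the canonically equivalent case, using the
standard unicodedata module. The syllable chosen (U+AC01) is just an
example; the sketch covers only canonical equivalence, not the looser
kinds of 'equivalence' mentioned above.

```python
import unicodedata

# Three spellings of the same Hangul syllable (U+AC01):
precomposed = "\uac01"               # single precomposed syllable
lv_plus_t = "\uac00\u11a8"           # LV syllable + trailing consonant
conjoining = "\u1100\u1161\u11a8"    # leading + vowel + trailing jamo

forms = [precomposed, lv_plus_t, conjoining]

# All three are different code point sequences ...
all_distinct = len(set(forms)) == 3

# ... but they normalize to one NFC form and one NFD form.
nfc_forms = {unicodedata.normalize("NFC", s) for s in forms}
nfd_forms = {unicodedata.normalize("NFD", s) for s in forms}
```

A renderer that styles part of a syllable therefore has to cope with
whichever of these sequences the text happens to contain.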
Jungshik
[1] http://bugzilla.mozilla.org/show_bug.cgi?id=186463
[2] 'Custom' (or 'hack') encoded: a Windows-1252, Symbol or MacRoman cmap
is used to store Tamil glyphs (or glyphs for other Indic scripts).
Needless to say, we want to leave these fonts behind and move on.
[3] As is well known, there are a few letters for which there are two
canonically equivalent representations in Indic scripts.
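One such case, sketched in Python with the standard unicodedata
module: the Tamil two-part vowel sign O, chosen here only as an
example, is canonically equivalent to the sequence of its two parts.

```python
import unicodedata

single = "\u0bca"        # TAMIL VOWEL SIGN O
pair = "\u0bc6\u0bbe"    # TAMIL VOWEL SIGN E + TAMIL VOWEL SIGN AA

# NFD splits the single code point; NFC recombines the pair.
decomposed = unicodedata.normalize("NFD", single)
composed = unicodedata.normalize("NFC", pair)
```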