Re: Unicode Search Engines

From: Stefan Probst (stefan.probst@opticom.v-nam.net)
Date: Wed Feb 20 2002 - 09:46:31 EST


Hello Doug,

Actually, it seems that IE does what you describe: it tries to normalize
to NFC/NFKC and displays the result. MS Word does not. Viewed at different
sizes, its glyphs look quite ugly, since they really are combined: the dot
below, for example, is only sometimes exactly below the vowel; often it is
too far left or right.

Going by what you write, the renderer in my setup seems really broken for
the word processors (MS Word and OpenOffice), since it cannot display the
combining modifiers.
Regarding IE: "a with horn" and "i with horn" are probably not in use right
now, so failing to display them is acceptable. But its inability to display
"space with modifiers" is less acceptable.
On the other hand, there actually seems to be no need to display non-NFKC
text on the Web, since, as far as I understand, W3C is planning to make
NFKC a requirement for the Web. By trying to normalize the input (the
combining sequences to NFKC), IE might even work against the planned W3C
rules.

Assuming that the renderer is part of the OS and used by most, if not all,
applications, I conclude that Windows ME is not able to handle the
combining modifier characters. Does anybody have experience with other OSs
or other characters?

Stefan

At 21:52 18.02.2002 -0800, Doug Ewell wrote:
-------------------------
>In theory, a fully conformant Unicode renderer is supposed to be able to
>combine an arbitrary base character with arbitrary combining marks. The
>renderer is supposed to look at the glyphs and decide how to combine them
>dynamically so they look reasonable together. So you should be able to
>combine "o with horn," "a with horn," or "q with horn" and get the
>expected result.
>
>In the real world, it doesn't work like that. Renderers detect sequences
>of base+combining characters, look for an equivalent precomposed form, and
>display that instead. For example, they detect U+006F (o) followed by
>U+031B (combining horn), and instead of trying to figure out how to
>combine them, simply generate U+01A1 (o with horn) instead. This results
>in a nice-looking precomposed glyph (if it's in the font) with a lot less
>work. But it means that U+0061 (a) plus U+031B (combining horn) can't be
>displayed properly, since there is no precomposed code point for "a with
>horn."
>
>In the '90s, when UTC and WG2 were more open to encoding precomposed
>forms, this approach was not too problematic, since any legitimate
>diacriticized character in an alphabetic script probably had its own
>precomposed form. Today, because of normalization considerations, we are
>probably not going to see any more precomposed characters that can already
>be formed with combining sequences. So if some language turns out to need
>"a with horn" in the future, its readers will have to cross their fingers
>that rendering engines become capable of displaying U+0061 U+031B
>properly.
>
>-Doug Ewell
> Fullerton, California
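
The precomposed-lookup behaviour described in the quoted text can be
illustrated with a small sketch. It assumes that NFC composition from
Python's standard unicodedata module is a fair stand-in for the lookup a
renderer might perform; it does not show how IE, Word, or any other
renderer is actually implemented. The helper name has_precomposed is
hypothetical and chosen only for this illustration.

```python
import unicodedata

COMBINING_HORN = "\u031b"  # U+031B COMBINING HORN


def has_precomposed(base: str, mark: str) -> bool:
    """Return True if NFC composes base+mark into a single code point.

    Hypothetical helper: NFC composition is used here as a proxy for
    "a precomposed form exists"; real renderers may use other tables.
    """
    composed = unicodedata.normalize("NFC", base + mark)
    return len(composed) == 1


if __name__ == "__main__":
    for base in ("o", "u", "a", "q"):
        composed = unicodedata.normalize("NFC", base + COMBINING_HORN)
        # Show the code points that remain after composition.
        code_points = " ".join(f"U+{ord(ch):04X}" for ch in composed)
        print(f"{base} + horn -> {code_points}")
```

With "o" and "u" the sequence collapses to one code point (U+01A1 and
U+01B0), while "a" and "q" stay as two code points, which is exactly the
case where a lookup-only renderer has nothing precomposed to fall back on.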


