Re: Unicode Search Engines

From: Stefan Probst (stefan.probst@opticom.v-nam.net)
Date: Mon Feb 18 2002 - 23:35:03 EST


At 30 Jan 2002 11:38:37 -0500, John Cowan <jcowan@reutershealth.com> wrote:
-------------------------
>Stefan Probst wrote:
> > And since we are already in Vietnamese.... (to round the things up):
> > I am not sure, how e.g. in the introduction to dictionaries or
> > Vietnamese language books, the tonal mark can be printed "alone". One
> > solution might be to combine them with a "space", but at present, this
> > does not work always.
>
>When does it not? It is the standard Unicode thing to do.

Well, I tried it with:
a) the Vietnamese "tonal marks":
- grave U+0300 combining class: 230
- hook above U+0309 combining class: 230
- tilde U+0303 combining class: 230
- acute U+0301 combining class: 230
- dot below U+0323 combining class: 220

b) the Vietnamese "modifier" characters:
- breve U+0306 combining class: 230
- circumflex U+0302 combining class: 230
- horn U+031B combining class: 216
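
For anyone who wants to double-check the combining classes listed above, here is a minimal sketch using Python's standard unicodedata module. The labels and the helper name combining_classes are my own, chosen only for illustration; the values come from whatever Unicode database ships with the Python build.

```python
import unicodedata

# Marks from lists a) and b): (informal label, code point).
MARKS = [
    ("grave", 0x0300),
    ("hook above", 0x0309),
    ("tilde", 0x0303),
    ("acute", 0x0301),
    ("dot below", 0x0323),
    ("breve", 0x0306),
    ("circumflex", 0x0302),
    ("horn", 0x031B),
]

def combining_classes(marks=MARKS):
    """Map each informal label to its canonical combining class."""
    return {label: unicodedata.combining(chr(cp)) for label, cp in marks}

if __name__ == "__main__":
    for label, cp in MARKS:
        ch = chr(cp)
        # Print code point, formal character name, and combining class.
        print(f"U+{cp:04X} {unicodedata.name(ch)}: {unicodedata.combining(ch)}")
```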

I tried to combine them with the space character and with some vowels.
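
To reproduce the test, the sequences can be generated rather than typed by hand. The following is only a sketch: the names BASES, MARKS and build_sequences are hypothetical, and the output still has to be pasted into whichever application is being checked.

```python
# Bases tried in this message: the space character plus the five vowels.
BASES = [" ", "a", "e", "i", "o", "u"]

# The three "modifier" marks from list b).
MARKS = {"breve": "\u0306", "circumflex": "\u0302", "horn": "\u031B"}

def build_sequences(bases=BASES, marks=MARKS):
    """Return (description, string) pairs, one per base/mark combination."""
    result = []
    for mark_name, mark in marks.items():
        for base in bases:
            seq = base + mark  # the combining mark follows its base
            codepoints = " ".join(f"U+{ord(ch):04X}" for ch in seq)
            result.append((f"{mark_name} on {base!r}: {codepoints}", seq))
    return result

if __name__ == "__main__":
    for description, seq in build_sequences():
        print(description, "->", seq)
```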

The tonal marks usually worked fine, but the modifier characters did not:
Under WinME, they did not work in MS Word 97 or OpenOffice 641.
In IE5.5 they did not work with the space, and worked only with the "right
combination" of vowels and modifiers:
OK: (all vowels a,e,i,o,u) + (any of breve or circumflex)
OK: o + horn, u + horn (which are in fact valid Vietnamese characters)
NOT OK: a + horn, e + horn, i + horn (which actually are not valid
Vietnamese characters)
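
The OK / NOT OK pattern above can be compared with which sequences have a precomposed character. This is a sketch under an assumption of mine, not a confirmed explanation of what IE does: the hypothetical helper has_precomposed simply asks whether NFC normalization folds base + mark into a single code point.

```python
import unicodedata

BREVE = "\u0306"
CIRCUMFLEX = "\u0302"
HORN = "\u031B"

def has_precomposed(base, mark):
    """True if NFC composes base + mark into one precomposed code point."""
    return len(unicodedata.normalize("NFC", base + mark)) == 1

if __name__ == "__main__":
    for mark_name, mark in (("breve", BREVE),
                            ("circumflex", CIRCUMFLEX),
                            ("horn", HORN)):
        for base in " aeiou":
            print(f"{base!r} + {mark_name}: {has_precomposed(base, mark)}")
```

With the Unicode data in current Python builds, all five vowels compose with breve and circumflex, only o and u compose with the horn, and the space composes with none of the three marks. That matches the IE5.5 results, which would fit a renderer that only handles sequences it can map to a precomposed glyph, but that reading is a guess.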

Are the described issues a problem of the OS (e.g. the rendering engine), of
the application (why does IE behave differently from Word?), or of a correct
Unicode implementation (e.g. is the horn not supposed to combine with a, e, i)?

Best Regards,
Stefan



This archive was generated by hypermail 2.1.2 : Mon Feb 18 2002 - 23:07:17 EST