From: Jukka K. Korpela (jkorpela@cs.tut.fi)
Date: Sat Nov 15 2008 - 16:55:36 CST
Karl Pentzlin wrote:
> I am just writing a mail to someone in Russia who suggests to encode a
> "barred o with macron" which is used in the Orok language.
I think it is best to explain realistically that characters with diacritic 
marks will not be added to Unicode as separately encoded, i.e. as code 
points, as a matter of policy. You can say this in different formulations 
and tones, of course. There’s no point in getting into long arguments.
> Trying to explain to him that the encoding of such letters is not
> needed, as sequences like U+04E9 U+0304 are appropriate, I have
> created a little Internet page to prove this:
> http://www.pentzlin.com/Orok.html
Well, the page seems to prove just the opposite, as you say, so it’s not 
useful for your purposes. The point is that even though U+04E9 U+0304 doesn’t 
work universally, or even widely, it’s the only way.
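A minimal sketch of why the sequence is the only way, assuming Python's standard unicodedata module and its bundled Unicode data: normalizing to NFC leaves U+04E9 U+0304 as two code points, since no precomposed "barred o with macron" exists, whereas barred o with diaeresis composes to the already encoded U+04EB. The helper name nfc_codepoints is made up for illustration.

```python
import unicodedata

def nfc_codepoints(text):
    """Return the code points of text after NFC normalization, as U+XXXX strings."""
    return [f"U+{ord(ch):04X}" for ch in unicodedata.normalize("NFC", text)]

# Barred o + combining macron: no precomposed form, so the sequence survives NFC.
macron_seq = nfc_codepoints("\u04e9\u0304")

# Barred o + combining diaeresis: composes to the precomposed letter U+04EB.
diaeresis_seq = nfc_codepoints("\u04e9\u0308")

print(macron_seq)     # ['U+04E9', 'U+0304']
print(diaeresis_seq)  # ['U+04EB']
```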
> I am horrified to see the result, using a computer with the newest
> version of Microsoft Vista and Internet Explorer (see attached
> Orok.png). Firefox does not perform better.
What happens is that your browser has Times New Roman as the default font, 
which contains (in your system, as well as mine) U+04E9 but not U+0304. 
Hence the latter is taken from some other font, such as Arial Unicode MS. It 
is no surprise that a diacritic from one font does not play well with a base 
letter from another font. And if your browser had e.g. Calibri as the 
default font, you might see just a macron with no base character, as I did 
when I first looked at your page.
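The fallback described above can be sketched roughly as follows. This is a deliberately simplified model, assuming a renderer that picks a font per character; real browsers and text engines are more elaborate. The coverage table is toy data that records only what this message states about Times New Roman and Arial Unicode MS, and pick_fonts is a hypothetical helper, not any browser's API.

```python
# Toy coverage table: only the facts stated in the message, not real font data.
COVERAGE = {
    "Times New Roman": {0x04E9},            # has the base letter, lacks the macron
    "Arial Unicode MS": {0x04E9, 0x0304},   # has both
}

def pick_fonts(text, font_stack, coverage=COVERAGE):
    """For each character, pick the first font in font_stack that covers it."""
    result = []
    for ch in text:
        font = next(
            (f for f in font_stack if ord(ch) in coverage.get(f, set())),
            None,  # no font in the stack covers this character
        )
        result.append((f"U+{ord(ch):04X}", font))
    return result

# Default font first: base letter and diacritic end up in different fonts.
default_stack = pick_fonts("\u04e9\u0304", ["Times New Roman", "Arial Unicode MS"])

# A font containing both listed first: both characters come from the same font.
single_font = pick_fonts("\u04e9\u0304", ["Arial Unicode MS", "Times New Roman"])

print(default_stack)
print(single_font)
```

With the default font first, the two characters are drawn from two different fonts, which is the mismatch seen on the test page.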
When creating web pages with more or less special characters, you just need 
to consider font issues. If you want to present U+04E9 U+0304, then you 
should suggest, in your CSS style sheet, fonts that contain both. 
Unavoidably at present, some users won’t have any of those fonts installed. 
The world isn’t perfect quite yet. (In fact, I’m afraid Arial Unicode MS 
would be about the only font that is anywhere near common and has both of 
them.)
Even with Arial Unicode MS for both characters, the visual appearance is 
barely tolerable for U+04E9 U+0304 (the macron isn’t horizontally centered 
over the base character) and completely unacceptable for U+04E8 U+0304, 
even in Microsoft Word 2007: the macron crosses the base character, so the 
diacritic cannot be recognized as such; it looks like some dirt on the base 
letter.
> Thus, sequences like U+04E9 U+0304 are NOT appropriate to fulfil the
> user's needs, as long as leading operating systems behave like this
> more than 10 years after Unicode has decided no longer to accept
> precomposed characters.
I don’t see how this has anything to do with operating systems.
It’s a matter of fonts and a matter of application programs and the 
libraries they use.
Too bad if you really need those characters. But encoding new letters with 
diacritics as code points wouldn’t help. Even if it were possible to add 
them to Unicode, it would take many, many years before they were added 
there and implemented widely in fonts that are available on people’s 
computers. It is much more realistic to hope for (and maybe to fight for) 
better implementation of the existing Unicode characters in fonts and 
rendering systems.
-- Yucca, http://www.cs.tut.fi/~jkorpela/