Re: Unicode surrogates in browsers for the compelling demo

From: James Kass (jameskass@worldnet.att.net)
Date: Sat Nov 17 2001 - 22:48:47 EST


Michael Kaplan wrote,

> When I did have this working, I had the config as shown at the following
> site; further respondent sayeth naught:
>
> http://www.i18nwithvb.com/surrogate_ime/code_charts/
>
> I was at that time running Win2000 SP2, IE 5.5, and a version of WEFT.
>

For what it's worth, this page (all on one line):
http://www.i18nwithvb.com/surrogate_ime/code_charts/05.asp?nofont

...gives a rather bizarre result in MSIE 5.5 on Windows
Millennium Edition.

It doesn't display the Plane Two glyph *unless* it is being
"selected" (as in for copy/paste operation). While it is being
selected, the display for that single character flickers between
the actual character and the dual null box characters.

Depending upon when the mouse is released, the resulting
display will either be the Plane Two character or two null boxes
highlighted for selection. I can't take a screen shot of this,
because as soon as the screen capture software starts up, the
highlighting disappears and the display goes back to two null
boxes. The effect seems to work for only one character at a time.

But, the amazing thing is that a non-BMP character displays in
the browser on Win M.E. at all, even if briefly.

(I fixed up the Win M.E. registry with the Scripts 42 setting and
entered appropriate font names as string values just like the
instructions for W2K.)

This only happens with Plane Two, not Plane One. When I tested
Etruscan with the registry set to Code2001, it looked as if the
browser was trying to use a fixed width font, just as it did under
W2K. Could it be that the browser only tries to use a fixed width
font for non-BMP material? (The Plane Two font *is* fixed width,
Code2001 isn't.)

In MSIE 5.5 on Win M.E., the null boxes aren't drawn from Code2001,
even with the registry set to Code2001 for Scripts 42, the Latin font
set to Code2001 in the browser, and a FONT FACE tag used in the HTML,
all at the same time.

Based on a letter from Lars Marius Garshol in which the Opera 6.0
beta is mentioned as supporting non-BMP ranges, I downloaded the
free version for Windows M.E., but haven't been able to display
any Plane One or Plane Two characters yet. Do note, however, that
the Opera browser offers sophisticated display and font controls,
and possibly I just haven't figured out the right combination. Or
it could be that only Opera for W2K and up supports non-BMP
ranges.

The charts made for Plane Two (links above) are encoded as UTF-8
shortest form, right? In other words, we shouldn't be trying NCRs
for surrogate pairs or anything equally special?
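
For reference, here is a minimal sketch in Python of the forms in
question. It is not taken from the charts or from the test setup
described above. U+20000 is only an assumed example of a Plane Two
character, and the helper names are made up for illustration. As far
as I know, HTML numeric character references are defined in terms of
the code point, so the single NCR for the scalar value is the valid
one; the pair of NCRs for the two surrogates is shown only for
contrast.

```python
# Assumed example: U+20000, the first code point in Plane Two.
EXAMPLE_CP = 0x20000


def plane(code_point: int) -> int:
    """Return the Unicode plane number (0 = BMP, 1 = Plane One, ...)."""
    return code_point >> 16


def surrogate_pair(code_point: int) -> tuple[int, int]:
    """Split a supplementary code point into UTF-16 high/low surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary (non-BMP) code point")
    offset = code_point - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


# Shortest-form UTF-8: one four-byte sequence for the whole character.
utf8_bytes = chr(EXAMPLE_CP).encode("utf-8")

# UTF-16 form: two 16-bit code units (a surrogate pair).
high, low = surrogate_pair(EXAMPLE_CP)

# One NCR for the scalar value, versus one NCR per surrogate.
ncr_scalar = f"&#x{EXAMPLE_CP:X};"
ncr_surrogates = f"&#x{high:X};&#x{low:X};"

print("plane:         ", plane(EXAMPLE_CP))
print("UTF-8 bytes:   ", utf8_bytes.hex(" ").upper())
print("surrogates:    ", f"{high:04X} {low:04X}")
print("NCR (scalar):  ", ncr_scalar)
print("NCR (pair):    ", ncr_surrogates)
```

If a chart page is served as UTF-8, the four-byte sequence is what
should appear in the byte stream for each Plane Two character.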

Best regards,

James Kass.


