Re: &#61623 ?

From: Juliusz Chroboczek (jec@dcs.ed.ac.uk)
Date: Sat Apr 01 2000 - 12:47:45 EST


"Tony Harminc" <tzha1@ibm.net>:

TH> I have recently received a couple of emails from unrelated people
TH> (one at yahoo.com and the other at hotmail.com) containing the string
TH> "&#61623;" apparently as a list item bullet. This is hex F0B7, which
TH> is in the private use area.

TH> Does anyone know what character this is trying to be, and what evil
TH> software is generating such a thing?

Michael Everson:

ME> Tsk. Software making use of the Private Use Area is not evil per
ME> se; the evil creeps in where the sender and receiver have not
ME> agreed what the character is intended to represent.

The actual behaviour is somewhat more interesting. Lend me your ears.

Microsoft TrueType fonts may either contain glyphs indexed by Unicode
codepoints (``Microsoft Unicode encoding''), or glyphs indexed by
``symbol font'' glyph index (``Microsoft symbol encoding'').
Microsoft Symbol fonts contain 224 glyphs, starting, depending on the
font, at index 0x20 or 0xF020. It is not known how Windows
distinguishes between the two cases, but consulting usFirstCharIndex
in the OS/2 table works fine in all the fonts we have checked. (This
was explained to me by Richard Griffith, to whom I am very grateful.)
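
As a rough sketch of the check described above, the following Python
reads the first-index field from the raw bytes of an OS/2 table and
reports which of the two layouts a symbol font appears to use. The
field is named usFirstCharIndex in the TrueType/OpenType specification
and sits at byte offset 64 of the table; the function names and the
0xF000 threshold are my own assumptions, and extracting the OS/2 table
from the font file is left out.

```python
import struct

# Byte offset of usFirstCharIndex (big-endian uint16) inside the OS/2 table,
# following the OS/2 table layout in the TrueType/OpenType specification.
US_FIRST_CHAR_INDEX_OFFSET = 64


def first_char_index(os2_table: bytes) -> int:
    """Return usFirstCharIndex from the raw bytes of an OS/2 table."""
    if len(os2_table) < US_FIRST_CHAR_INDEX_OFFSET + 2:
        raise ValueError("OS/2 table is too short to hold usFirstCharIndex")
    (value,) = struct.unpack_from(">H", os2_table, US_FIRST_CHAR_INDEX_OFFSET)
    return value


def symbol_glyphs_in_private_use_area(os2_table: bytes) -> bool:
    """Heuristic (an assumption): True if the glyphs start at 0xF020, not 0x20."""
    return first_char_index(os2_table) >= 0xF000
```

This only mirrors the heuristic from the paragraph above; it says
nothing about how Windows itself makes the distinction.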

When using symbol fonts in some Windows software, the document
contains the glyph indices. When converting to HTML, to RTF, or to
Unicode plain text, the glyph indices are treated as Unicode
codepoints. They will therefore appear as either Private Use Area
codepoints or Latin-1 codepoints, depending on the internal
organisation of the font used.
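
To make this concrete for the string Tony saw, here is a small Python
sketch that decodes the numeric character reference and maps the
resulting codepoint back to a symbol font glyph index. The function
names are hypothetical, and subtracting 0xF000 is an assumption based
on the two starting indices (0x20 and 0xF020) given above. Which glyph
actually sits at that index depends on the symbol font the sender
used.

```python
import re

# Assumed distance between the two symbol layouts (0xF020 - 0x20).
PUA_SYMBOL_BASE = 0xF000


def decode_numeric_entity(text: str) -> int:
    """Return the codepoint of a decimal reference such as '&#61623;'."""
    match = re.fullmatch(r"&#(\d+);", text)
    if match is None:
        raise ValueError(f"not a decimal character reference: {text!r}")
    return int(match.group(1))


def symbol_font_index(codepoint: int) -> int:
    """Map a leaked 'codepoint' back to a glyph index in the range 0x20..0xFF."""
    if 0xF020 <= codepoint <= 0xF0FF:
        # Font whose 224 glyphs start at 0xF020.
        return codepoint - PUA_SYMBOL_BASE
    if 0x20 <= codepoint <= 0xFF:
        # Font whose 224 glyphs start at 0x20; it shows up as Latin-1.
        return codepoint
    raise ValueError(f"U+{codepoint:04X} is outside both symbol ranges")


codepoint = decode_numeric_entity("&#61623;")
print(hex(codepoint), hex(symbol_font_index(codepoint)))  # prints: 0xf0b7 0xb7
```

Going in this direction only recovers the glyph index; turning it into
a real Unicode character still needs a mapping table for the
particular font.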

Sincerely,

                                        Juliusz Chroboczek


