RE: &#61623 ?

From: Murray Sargent (murrays@microsoft.com)
Date: Sun Apr 02 2000 - 18:52:06 EDT


It's true that characters in SYMBOL_CHARSET fonts are encoded in TrueType
by codes from 0xF020 through 0xF0FF. Microsoft Word uses these codepoints
internally as well, although RichEdit (also used as an email editor in
Outlook) doesn't. 0xF0B7 is a bullet (similar to U+2022) in the Symbol
font, which has been distributed since Windows 1. My guess is that
WordMail leaked the 0xF0B7 code out, but it would be good to have a
reproducible scenario. It should most definitely be fixed...
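For readers who want to recognise such leaked codes, here is a minimal Python sketch. It assumes, as described above, that Symbol-font codes sit at 0xF020 through 0xF0FF. The function names and the one-entry `SYMBOL_TO_UNICODE` table are hypothetical; only the bullet mapping (Symbol code 0xB7 to U+2022) is filled in.

```python
import html
from typing import Optional

# Hypothetical, deliberately partial table: Symbol-font code -> Unicode.
# Only the bullet discussed above is included; a real converter would
# need a mapping for all 224 Symbol-font codes.
SYMBOL_TO_UNICODE = {0xB7: "\u2022"}  # Symbol bullet -> U+2022 BULLET


def symbol_code_from_pua(cp: int) -> Optional[int]:
    """Return the 8-bit Symbol-font code for a codepoint in U+F020..U+F0FF, else None."""
    if 0xF020 <= cp <= 0xF0FF:
        return cp - 0xF000
    return None


def repair_symbol_text(text: str) -> str:
    """Replace leaked Symbol-font PUA characters with Unicode equivalents where known."""
    out = []
    for ch in text:
        code = symbol_code_from_pua(ord(ch))
        if code is not None and code in SYMBOL_TO_UNICODE:
            out.append(SYMBOL_TO_UNICODE[code])
        else:
            # Leave everything else, including unmapped PUA codes, untouched.
            out.append(ch)
    return "".join(out)


# The reference from the original report: decimal 61623 is hex F0B7.
leaked = html.unescape("&#61623; first item")
print(hex(ord(leaked[0])))          # 0xf0b7
print(repair_symbol_text(leaked))   # • first item
```

Unknown PUA characters are passed through rather than dropped, so a partial table never loses text.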

Thanks
Murray

-----Original Message-----
From: Juliusz Chroboczek [mailto:jec@dcs.ed.ac.uk]
Sent: Saturday, April 01, 2000 9:43 AM
To: Unicode List
Subject: Re: &#61623 ?

"Tony Harminc" <tzha1@ibm.net>:

TH> I have recently received a couple of emails from unrelated people
TH> (one at yahoo.com and the other at hotmail.com) containing the string
TH> "&#61623;" apparently as a list item bullet. This is hex F0B7, which
TH> is in the private use area.

TH> Does anyone know what character this is trying to be, and what evil
TH> software is generating such a thing?

Michael Everson:

ME> Tsk. Software making use of the Private Use Area is not evil per
ME> se; the evil creeps in where the sender and receiver have not
ME> agreed what the character is intended to represent.

The actual behaviour is somewhat more interesting. Lend me your ears.

Microsoft TrueType fonts may contain either glyphs indexed by Unicode
codepoints (``Microsoft Unicode encoding''), or glyphs indexed by
``symbol font'' glyph index (``Microsoft symbol encoding'').
Microsoft Symbol fonts contain 224 glyphs, starting, depending on the
font, at index 0x20 or 0xF020. It is not known how Windows
distinguishes between the two cases, but consulting usFirstCharIndex
in the OS/2 table works fine in all the fonts we have checked. (This
was explained to me by Richard Griffith, to whom I am very grateful.)
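The usFirstCharIndex test can be sketched in Python as follows. This assumes the raw bytes of the font's OS/2 table have already been extracted from the font file (not shown), and relies on the OpenType layout in which the big-endian 16-bit usFirstCharIndex field sits at byte offset 64 of that table. The function names and the 0xF000 threshold are illustrative choices, not Windows' own (unknown) logic.

```python
import struct

# Byte offset of usFirstCharIndex within the OS/2 table (OpenType layout).
US_FIRST_CHAR_INDEX_OFFSET = 64


def first_char_index(os2_table: bytes) -> int:
    """Read usFirstCharIndex (big-endian uint16) from raw OS/2 table bytes."""
    if len(os2_table) < US_FIRST_CHAR_INDEX_OFFSET + 2:
        raise ValueError("OS/2 table too short to contain usFirstCharIndex")
    (value,) = struct.unpack_from(">H", os2_table, US_FIRST_CHAR_INDEX_OFFSET)
    return value


def symbol_code_base(os2_table: bytes) -> int:
    """Return 0xF000 if the symbol font starts at 0xF020, otherwise 0 (starts at 0x20).

    Heuristic assumption: any first character index in the U+F000 block
    is taken to mean the 0xF020-based organisation.
    """
    return 0xF000 if first_char_index(os2_table) >= 0xF000 else 0


def fake_os2(first: int) -> bytes:
    """Build a zero-filled stand-in OS/2 table with only usFirstCharIndex set (illustration only)."""
    buf = bytearray(78)
    struct.pack_into(">H", buf, US_FIRST_CHAR_INDEX_OFFSET, first)
    return bytes(buf)


print(hex(symbol_code_base(fake_os2(0xF020))))  # 0xf000
print(hex(symbol_code_base(fake_os2(0x0020))))  # 0x0
```

Adding the returned base to an 8-bit Symbol code gives the character code the font actually uses.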

When symbol fonts are used in some Windows software, the document
stores the glyph indices. When the document is converted to HTML, to
RTF, or to Unicode plain text, these glyph indices are treated as
Unicode codepoints. They will therefore appear as either Private Use
Area codepoints or Latin-1 codepoints, depending on the internal
organisation of the font used.
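A small Python simulation of that leak, under the assumptions above: a converter that writes the font's internal code straight out as a Unicode codepoint. `leak_as_html` is a hypothetical name. The same Symbol bullet (code 0xB7) comes out as the Private Use Area reference `&#61623;` from a font organised at 0xF020, but as U+00B7 MIDDLE DOT, an unrelated Latin-1 character, from a font organised at 0x20.

```python
import unicodedata


def leak_as_html(symbol_code: int, base: int) -> str:
    """Emit a Symbol-font code as an HTML numeric reference, the naive way.

    `base` is 0xF000 for fonts whose codes start at 0xF020, or 0 for fonts
    whose codes start at 0x20. No translation to real Unicode is attempted,
    which mimics the behaviour described above.
    """
    if not 0x20 <= symbol_code <= 0xFF:
        raise ValueError("Symbol-font codes run from 0x20 to 0xFF")
    return f"&#{base + symbol_code};"


BULLET = 0xB7  # the bullet's code in the Symbol font

for base in (0xF000, 0):
    ref = leak_as_html(BULLET, base)
    cp = base + BULLET
    # Private Use Area characters have no Unicode name, so supply a default.
    print(ref, f"U+{cp:04X}", unicodedata.name(chr(cp), "<private use>"))
# &#61623; U+F0B7 <private use>
# &#183; U+00B7 MIDDLE DOT
```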

Sincerely,

                                        Juliusz Chroboczek



This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:00 EDT