Re: Latin w/ diacritics (was Re: benefits of unicode)

From: Peter_Constable@sil.org
Date: Wed Apr 18 2001 - 09:40:49 EDT


>> I've done it numerous times, and I still do it on occasion. I still call it
>> a "hack", though, since that's what it is, in many cases at least: The cmap
>> in TrueType fonts for Windows uses Unicode. People think they're putting
>> their favourite character on an 8-bit codepoint, but in the font they are
>> actually hacking with Unicode, breaking conformance rules C6 and especially
>> C7.
>>
>>
>
>The 'cmap' in TrueType fonts for Windows uses double-byte encoding.

I'm sorry, I oversimplified. We're both partially right: Since Win 3.1,
Windows TrueType cmaps have supported one of a handful of encodings:

- fonts could be "UGL" (i.e. Unicode) encoded using a format 4 cmap (now
they can also use a combination of format 4 plus format 12)

- for CJK, various double-byte encodings were used (encoding IDs 2 to 6)

- fonts could be "symbol" encoded, but in this case they are required to
use a format 4 cmap (i.e. using Unicode) and should have glyphs mapped in
the PUA beginning at 0xF000 (see
http://www.microsoft.com/typography/otspec/recom.htm)
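
To make the list above concrete, here is a rough Python sketch (not part of the original message). The names WINDOWS_ENCODINGS, describe_encoding and symbol_code_to_pua are hypothetical, chosen only for illustration; the encoding ID values are the ones I understand the OpenType 'cmap' documentation to assign to the Windows platform, and the PUA mapping simply assumes the 0xF000 convention mentioned above.

```python
# Sketch only: Windows-platform (platformID 3) encoding IDs as listed in the
# OpenType 'cmap' documentation. The dictionary name is hypothetical.
WINDOWS_PLATFORM_ID = 3

WINDOWS_ENCODINGS = {
    0: "Symbol",
    1: "Unicode BMP (UGL)",
    2: "ShiftJIS",
    3: "PRC",
    4: "Big5",
    5: "Wansung",
    6: "Johab",
    10: "Unicode full repertoire",
}


def describe_encoding(platform_id, encoding_id):
    """Return a human-readable label for a cmap encoding record."""
    if platform_id != WINDOWS_PLATFORM_ID:
        return "not a Windows-platform subtable"
    return WINDOWS_ENCODINGS.get(encoding_id, "reserved or unknown")


def symbol_code_to_pua(code):
    """Map an 8-bit character code to the PUA code point a symbol font
    is expected to use, assuming the 0xF000-based convention."""
    if not 0 <= code <= 0xFF:
        raise ValueError("expected an 8-bit code")
    return 0xF000 + code


print(describe_encoding(3, 1))
print(hex(symbol_code_to_pua(0x41)))
```

Under this convention an application asking a symbol font for byte 0x41 is really asking for U+F041, which stays inside the Private Use Area and so makes no claim about any standard Unicode character.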

>"Hack" has a negative connotation, even if
>it's used to refer to one's own activity. Perhaps we could be more
>respectful to people who are only doing the best they can with the
>tools they have available (and may not even know rules exist).

Well, there's no question they (we) are doing the best we can. We need to
be honest with ourselves, though: If we're using software and related
standards that someone else developed with certain conventions and
protocols assumed and we redefine those conventions and protocols to use
*our* characters in *their* software, we're hacking. (I'm reminded of
Fessick in The Princess Bride who described starting wars as a noble
profession.) In DOS days when we used to load bitmap fonts into our
Hercules video cards and run Word in video mode (or was it character mode?
I forget), we used to see the window frame around our doc drawn using our
custom glyphs rather than box-drawing characters. Word had expectations
about the system on which it was working, which wasn't unreasonable - the
notion that you can make assumptions about what the host system is
providing for you is foundational in all software development. We were
hacking the system, invalidating those expectations. Back then, there
wasn't anything critical that relied on those expectations, and so we could
get away with doing those kinds of things (provided you didn't mind the
frame being messed up). Even in Win 95 days we were still getting away with
it almost all the time. But not any longer. I have colleagues who have
reverted to Word 6 on Win95 so they could do the things they used to. In
newer software, our custom-encoded font practices are having their true
identity revealed. They're hacks.

Only in the case where you create a custom encoding and use it only in
software that either knows about that custom encoding or else makes no
assumptions about encoding is it not a hack.

>Consider the following linked page:
>
>http://www.linguistics.unimelb.edu.au/research/hmong/hmongaustpahawh.html#pahawh

>
>If you want to view the page properly, you'll download their font.
>There is no alternative for web master or visitor.
>(I know about PDFs and GIFs; they aren't HTML.)

Indeed there's no alternative, and so I don't knock them in the slightest.
But there's also no question that their TrueType font is a hack of Unicode,
as the attached GIF makes clear: e.g. U+0031 DIGIT ONE is mapped to glyph
ID 20, which is clearly not a digit one in that font.

>The custom TrueType fonts for Windows are simply legitimate
>descendants of older custom fonts in other formats.

If they had different platform and encoding IDs, I'd agree. But they don't.
They are marked as Windows Unicode fonts. It's either a hack of Unicode, of
MS's encoding IDs, or of the TrueType spec.
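
For anyone who wants to check what a font claims about itself, the following is a minimal Python sketch (not from the original message) that reads the encoding records at the start of a raw 'cmap' table. It assumes the standard layout of a 16-bit version, a 16-bit table count, and then one 8-byte record per subtable (platform ID, encoding ID, 32-bit offset), all big-endian. The function names and the hand-built sample bytes are hypothetical; extracting the 'cmap' table from an actual font file is left out.

```python
import struct


def read_cmap_encoding_records(cmap_bytes):
    """Return [(platformID, encodingID, offset), ...] from a raw 'cmap' table.

    Assumes the usual header: uint16 version, uint16 numTables, followed by
    numTables records of (uint16, uint16, uint32), all big-endian.
    """
    _version, num_tables = struct.unpack_from(">HH", cmap_bytes, 0)
    records = []
    for i in range(num_tables):
        record = struct.unpack_from(">HHL", cmap_bytes, 4 + 8 * i)
        records.append(record)
    return records


def claims_windows_unicode(records):
    """True if any record is marked platform 3 (Windows), encoding 1 (Unicode)."""
    return any(p == 3 and e == 1 for p, e, _offset in records)


# Hand-built header with a single Windows/Unicode record, for illustration.
sample = struct.pack(">HH", 0, 1) + struct.pack(">HHL", 3, 1, 12)
records = read_cmap_encoding_records(sample)
print(records, claims_windows_unicode(records))
```

A font that reports (3, 1) here is declaring that its character codes are Unicode code points, which is why remapping U+0031 to something other than a digit one conflicts with what the font says about itself.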

>> And if the designers of the pyramids hadn't told the people in the
>> quarries, "The blocks shall have exactly these dimensions..."?
>
>What would have happened? (I'm sorry if the punch line is obvious to
>everyone else...)

If the quarriers hadn't conformed to the standards established by the
architects, the pyramids would never have been built.

- Peter

---------------------------------------------------------------------------
Peter Constable

Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
E-mail: <peter_constable@sil.org>

(See attached file: Naadaa_hackedfont.gif)

