Re: Unicode and the digital divide.

From: Doug Ewell (
Date: Fri May 31 2002 - 12:11:20 EDT

I think I'm starting to see part of the problem here.

William Overington <WOverington at ngo dot globalnet dot co dot uk> wrote:

> By there existing a publicly available document which includes within
> it a pairing of ct with the code point U+E707 the possibility exists
> that people might include ct in a TrueType fount and might place it at
> U+E707 within that TrueType fount.

You are assuming that a "cycle of usage" exists that will cause this
private-use allocation of a ct ligature at U+E707 to become entrenched,
along these lines:

1. PUA code point U+E707 is assigned (by William Overington) to LATIN

2. A font designer, following William's suggestion, creates a font with
a ct-ligature glyph at U+E707.

3. One or more users discover (or already know about) the existence of
a ct-ligature in the font and use it in a document, which is then
distributed beyond the original user(s).

4. Other users discover the document and notice the "pairing of ct with
the code point U+E707."

5. Those users influence more font designers to add a ct-ligature at
U+E707 (or do it themselves, if they are font designers).

6. Go to step 3, and repeat until the use of U+E707 is entrenched.

Thus we establish grass-roots, popular-usage assignment of Unicode code
points.

One small problem, though: what about the gap between steps 4 and 5? It
is presumed that users who encounter the initial document are actually
using the font that contains ct-ligature at U+E707 (hereafter "ct"). If
they are, they may not even notice the "new" ligature (ff and fi, at
least, are already well-known and expected in major fonts).

If they are not using that particular font (e.g. because it is not
available), they will see the infamous "black square" and will assume
that either (a) the document is corrupted, (b) their software is not
performing the "correct" automatic ligation, or (c, the correct answer)
their font is missing a glyph. In any case, if they decide to do
anything about it, it will likely be to acquire the original font, not
to influence font designers to add the glyph to other fonts. Not all
users are font designers or "character geeks" like us.

In this scenario, the document becomes tied to the original font, rather
than the characteristics of the original font spreading to more fonts.
In essence, the creator of the document may just as well have created a
read-only PDF file with embedded fonts.

I am not aware of any character assignments, official or PUA, gaining
widespread usage through this approach. AFAIK, one of the reasons for
creating the ConScript Unicode Registry was to give font designers a
semi-standard place to put, say, Tengwar glyphs; but if that practice
has caught on in the case of specialized fonts used by Tolkien fans, it
certainly has *not* caught on in the mainstream. And bear in mind that
ConScript is the best-known of all PUA frameworks.

As John Hudson pointed out, ligation is not supposed to destroy other
aspects of text processing, such as spell-checking, searching, and
sorting. Imagine your search for the word "picture" coming up blank
because the document used the ct-ligature U+E707 instead of U+0063
U+0074. That won't exactly encourage other users to pick up the
precoded ligature. Making ligation work is not just up to the font; the
software must also know about the ligature so it can revert to the
"real" underlying characters and make searching and sorting work. Ask
any Macintosh user about this; their systems have handled ligation in
this way longer than Windows has.
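
To make the search problem concrete, here is a rough sketch in Python. It
is only an illustration: the PUA_LIGATURES table and the expand_ligatures
function are names I made up, standing in for whatever private agreement
the software would need to know about. It shows that a plain substring
search misses "picture", that standard compatibility normalization (NFKC)
folds an encoded ligature such as U+FB01 but can do nothing with a
private-use code point, and that only an application-specific mapping
recovers the underlying letters.

```python
import unicodedata

# Assumption: U+E707 is the private-use code point proposed in this thread
# for the ct ligature. It has no meaning outside that private agreement.
CT_LIGATURE = "\ue707"

# Hypothetical application-level table that maps private-use ligatures
# back to the characters they stand for.
PUA_LIGATURES = {CT_LIGATURE: "ct"}


def expand_ligatures(text: str) -> str:
    """Fold ligatures back to plain letters for searching and sorting."""
    # NFKC handles ligatures that Unicode itself encodes, e.g. U+FB01 (fi).
    text = unicodedata.normalize("NFKC", text)
    # Private-use code points have no decomposition, so they need the table.
    for ligature, letters in PUA_LIGATURES.items():
        text = text.replace(ligature, letters)
    return text


# "picture" written with the PUA ligature, "first" with the encoded fi ligature.
document = "A pi" + CT_LIGATURE + "ure of the \ufb01rst edition"

# Naive search on the stored text.
print("picture" in document)
# Search after standard normalization only.
print("picture" in unicodedata.normalize("NFKC", document))
# Search after the private mapping is applied as well.
print("picture" in expand_ligatures(document))
```

The point of the sketch is that the last step depends on every piece of
software sharing the same private table, which is exactly what a
private-use assignment cannot guarantee.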

For this reason, I really can't get on board with the position that the
character-glyph model, applied to the concept of ligation, is geared
only toward "people with the very latest equipment using expensive
solutions that are only realistically available to rich corporations."
People using Unicode to its fullest extent will already have to purchase
updated software to handle things like normalization, bidirectionality,
etc. properly. Practically all PC hardware sold in the last 3 years is
up to the task; it's the software that needs to be updated.

The user who wants a ct-ligature so he can faithfully transcribe an
18th-century document is most likely a student or researcher of some
sort. While students and researchers are hardly renowned for their
wealth, it seems unlikely that they would not have access to sufficient
hardware and software resources to perform this task.


-Doug Ewell
 Fullerton, California
