Re: Unicode and the digital divide. (derives from Re: Towards some more Private Use Area code points for ligatures.)

From: John H. Jenkins
Date: Fri May 31 2002 - 11:42:45 EDT

On Friday, May 31, 2002, at 03:45 AM, William Overington wrote:

> This discussion seems to be related to the digital divide.
> Suppose that someone has access to a PC which has Windows 95 or Windows 98
> and has Microsoft Word 97 installed. The person wishes to produce a print
> out of a transcription of a piece of text from an 18th Century English
> book, by keying in a copy of the text and then printing it out. The person
> finds that the text has a ct ligature in it. How does the person produce
> the desired finished result?

It depends.

You're assuming that the person wants to *exactly* (or fairly closely)
reproduce the contents of the book. That, however, is typically not the
case. IIRC, the original Declaration of Independence uses a medial long-s,
but when I transcribe it now, I'll write "Congress" and not "Congreſs".
I have a 19th century book on my shelves which uses the ct ligature
throughout. If I were to quote a passage from it, however, I wouldn't
bother to reproduce them.

If I *were* to want to reproduce something like that so accurately that
I'd capture all of its ligatures, I'd also try to match the typeface. In
Latin typography, as a general rule, ligatures are considered a matter of
stylistic preference. Typesetting a document with, say, Courier and lots
of ligatures doesn't make aesthetic sense.

> My suggested solution is to try to obtain a TrueType fount which contains
> a ct ligature, add the fount into the PC, open a Word 97 document, key the
> text, wherever a ct ligature is needed, use Insert | Symbol and choose the
> ct ligature from the dialogue box.

This is not a good solution. It's a kludge. You have to do it on older
systems, true; we all know that. The problem is that it's still a kludge.
The document works visually, but other operations such as search and
replace or spell checking cease to function.
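
To make the breakage concrete, here is a minimal Python sketch. It assumes the ct ligature sits at U+E707, the code point proposed in the quoted list; the function name and sample sentence are made up for illustration only.

```python
# Hypothetical illustration: U+E707 is the Private Use Area code point
# proposed in the quoted list for the ct ligature. Nothing standardizes it.
CT_LIGATURE_PUA = "\ue707"


def encode_ct_as_pua(text: str) -> str:
    """Replace every 'ct' pair with the single private-use code point."""
    return text.replace("ct", CT_LIGATURE_PUA)


original = "The act was a perfect contract."
kludged = encode_ct_as_pua(original)

# A plain search for a word containing "ct" succeeds on the original text
# but fails on the kludged text, because the letters c and t are no longer
# stored there.
found_before = "contract" in original
found_after = "contract" in kludged
print(found_before, found_after)
```

The text may look right with a suitable font, but anything that works on the stored characters (find, replace, spell checking, sorting) sees a different string.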

The fi and fl ligatures are part of MacRoman (for some reason I do not
fathom) and are therefore present in all Mac fonts. I once produced a
document that looked better with them than it would have without, so (in
Word) I did a global replace to get them in there. Note that I didn't get
any of the *other* f-ligatures, just fi and fl. Instantly, Word realized
that half of the words in my document were misspelled. It was a pain.
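
The same effect can be sketched in Python with the Unicode compatibility characters for these two ligatures (U+FB01 and U+FB02). This is only an analogy to what happened in Word with MacRoman; the word list and helper names are invented for the example.

```python
import unicodedata

FI = "\ufb01"  # U+FB01 LATIN SMALL LIGATURE FI
FL = "\ufb02"  # U+FB02 LATIN SMALL LIGATURE FL


def add_f_ligatures(text: str) -> str:
    """Global replace of 'fi' and 'fl' with their ligature characters."""
    return text.replace("fi", FI).replace("fl", FL)


def fold_ligatures(text: str) -> str:
    """Undo the replacement using compatibility normalization (NFKC)."""
    return unicodedata.normalize("NFKC", text)


# A toy "dictionary" standing in for a spell checker (an assumption).
dictionary = {"fish", "flower", "office"}

styled = add_f_ligatures("fish flower office")
misspelled = [w for w in styled.split() if w not in dictionary]
restored = fold_ligatures(styled)
print(misspelled)
print(restored)
```

Note that NFKC folds many other compatibility characters as well, so it is a blunt tool; it is shown here only because it demonstrates that the ligature characters are not the same text as the letters they stand for.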

> By there existing a publicly available document which includes within it
> a pairing of ct with the code point U+E707 the possibility exists that some
> people might include ct in a TrueType fount and might place it at U+E707
> within that TrueType fount.

No, not really. The existence of the entire Deseret Alphabet at its
ConScript code points for years on some of my Web pages (in fact, I
haven't updated them yet) hasn't made Adobe or Monotype or anybody else start
adding it to their fonts.

> However, if that page were saved in HTML format, the code numbers would be
> passed through to the HTML page, so if the list of code points were widely
> used by people, then maybe the document containing the ct ligature could
> be displayed in a web page, interested viewers needing to use a fount with
> a ct ligature available.

No. Wrong, wrong, wrong. HTML is ***NOT*** designed for accurate
typography. You can make ***NO*** guarantees or assumptions as to the
typeface that someone will actually see when they view your Web site.
This is, in fact, an argument in the direction opposite the one you want
to make. You may have designed your Web page to be viewed with Arial, but
I may happen to hate Arial and use something else in a larger point size
in concession to my aging eyes. Making your Web page illegible unless I
have a specialized font (or a set of specific uses for the PUA) is, under
the circumstances, a very bad idea.

People whose Web sites require specialized fonts in order to be viewed
usually make it possible for people to download the specialized fonts at
the same time.

> Also, the very existence of the list might lead to
> someone who is authoring a fount choosing to add a ct ligature and various
> other ligatures into the fount.

Again, no. Adobe, of course, is decoupling ligatures from code points in
its own fonts, but it's at the head of the OpenType bandwagon. Other
foundries will likely follow as time progresses.

> The existence of the list also hopefully makes it more likely that in the
> future \uE707 drawn to the screen of a Java applet will produce a ct
> ligature rather than a rectangular box.

Again, no. Again, ConScript is your counter-example.

Let's face it, it's hard enough to get people to adopt new features of
Unicode when they become standardized. Weird little uses of the PUA are
simply ignored.
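
A small Python sketch of why software has nothing to go on here: as far as the standard is concerned, U+E707 is just a private-use code point with no name and no defined meaning. The helper names below are illustrative, not part of any library.

```python
import unicodedata


def is_private_use(ch: str) -> bool:
    """True if the character's general category is Co (private use)."""
    return unicodedata.category(ch) == "Co"


def describe(ch: str) -> str:
    """Return the standard character name, or a placeholder if none exists."""
    return unicodedata.name(ch, "<no standard name>")


pua_ct = "\ue707"       # the code point proposed in the quoted list
standard_fi = "\ufb01"  # a ligature that does have a standard code point

print(is_private_use(pua_ct), describe(pua_ct))
print(is_private_use(standard_fi), describe(standard_fi))
```

Any meaning attached to U+E707 lives entirely in a private agreement between the document's author and whatever font the reader happens to have.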

> Now, if this list of ligatures were not used, how exactly, precisely would
> the person produce the print out of a transcription of a piece of text
> from an 18th Century English book, by keying in a copy of the text and
> then printing it out?

PDF. That's what it's for. If you *must* have a precise typographic
appearance for your document, you use PDF.

> I am reminded of the posting some time ago in this
> discussion list where someone, I cannot remember who was the author,
> commented on what that author called a great tsu nami (tidal wave) of the
> computing industry, whereby from the moment that a new version of a
> product is launched, businesses presume that customers are using the very
> latest version.

The digital divide is a different issue. People have been arguing for
years that a certain character must be present in Unicode because systems
won't provide support for it any other way. This is a valid point, but at
the same time, it would be improper to shackle the standard to that.

Let's go back a decade, shall we? Not very long ago, major programs (like
Word) made the assumption that all text was displayed using one byte per
character. That meant that you couldn't have versions of the program that
worked for Japanese or Chinese without a lot of rewriting. Did that mean
that you had to force Japanese and Chinese to be encoded using one byte
per character? No, it meant that people who wanted to do Chinese and
Japanese had to live with not being able to do it with all their software,
and it meant that they had to live with buying specialized programs
instead of the generic version.
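
For anyone who hasn't run into the one-byte assumption, a short Python sketch shows where it fails. The sample strings and the choice of encodings (Latin-1, Shift_JIS, UTF-8) are my own picks for illustration.

```python
# Illustration of why "one byte per character" breaks for Japanese text.
# The sample strings and encodings are assumptions chosen for the example.
latin_text = "Word"
japanese_text = "日本語"

latin_bytes = latin_text.encode("latin-1")
sjis_bytes = japanese_text.encode("shift_jis")
utf8_bytes = japanese_text.encode("utf-8")

# For the Latin sample, byte count and character count agree.
print(len(latin_text), len(latin_bytes))

# For the Japanese sample they do not, in either encoding, so code that
# indexes text by byte offset lands in the middle of characters.
print(len(japanese_text), len(sjis_bytes), len(utf8_bytes))
```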

You want to put a ct ligature in your document? The answer is the same.
Even if you put it in the PUA, not all software now is Unicode-savvy, and
not all Unicode-savvy programs handle PUA overrides or let you input
arbitrary code points. Latin doesn't *require* the ct ligature. If you
absolutely *must* have it, then you're simply going to have to limit
yourself to programs like InDesign that let you have it in a way that
works with the existing standard.

For everybody—whether it be Big Faceless Corporation, Ltd. or my
mother-in-law—there is a trade-off. New machines, new OSs, new programs
all give you benefits but cost money and time to learn. I know a lot of
typographers who refuse to move off of the decade-old version 3 of
Fontographer because they like it better than anything that's come since.
My wife won't upgrade to Mac OS X because she's easily confused when her
desktop changes. (*sigh*)

At the same time, it's ludicrous to insist that those of us who innovate
must make all our innovations absolutely backward compatible to the dawn
of time.

> So, I wonder what is the answer to the question of how the person wishing
> to produce the print out of a transcription of a piece of text from an
> 18th Century English book, by keying in a copy of the text and then
> printing it out should proceed? Is it a case of "first buy a new PC" or
> what?

Let's alter the question. I'm a sinologist who wants to reproduce
precisely my twelfth-century copy of the Confucian Analects on my Mac Plus.
You know what? I'm hosed.

I'm an Egyptologist who wants to reproduce precisely my copy of the Book
of the Dead in the email I'm writing my grandmother. You know what? I'm
still hosed.

There are limits built into technology at every stage. They're not
pleasant, but they're there. Ideally, the newer is always an improvement
in some way. People have to decide whether the improvement is sufficient
to justify their spending money on it. That's life.

> The comment has been made "You are trying to find a solution to a problem
> that has already been solved in a better way, and in the process you will
> create more problems for anyone who uses your solution.".
> What, exactly and precisely, is the better way that is claimed?

In Latin typography, ligature formation is usually handled by markup. If
a particular ligature is absolutely required in your document at a
particular point (not because it looks nice, or because it more accurately
reproduces what was originally written), you can do it by inserting a
zero-width joiner.
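
As a rough Python sketch of the zero-width joiner approach: the letters c and t stay in the text, with U+200D between them as a request for a ligature. Whether a ligature actually appears depends on the font and the rendering engine; the helper names here are hypothetical.

```python
import unicodedata

ZWJ = "\u200d"  # U+200D ZERO WIDTH JOINER


def request_ligature(text: str, pair: str) -> str:
    """Insert ZWJ between the two letters of each occurrence of `pair`."""
    return text.replace(pair, pair[0] + ZWJ + pair[1])


def strip_joiners(text: str) -> str:
    """Remove ZWJ so the text can be compared with plain spelling."""
    return text.replace(ZWJ, "")


marked = request_ligature("perfect", "ct")

# The real letters are still stored; ZWJ is a format character (Cf)
# that a search or spell checker can skip over.
print(unicodedata.category(ZWJ))
print("c" in marked and "t" in marked)
print(strip_joiners(marked) == "perfect")
```

A naive substring search on the marked text still needs to ignore the joiner, but that is one well-defined format character rather than a private substitute for the letters themselves.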

Typically, however, you turn ligatures on the same way you select point
size or typeface. In InDesign and other applications, I select a region
of text and turn on rare ligatures in a menu.

Again, think of the standard f-ligatures. I'm seeing them through this
email. I type "fish," and I see an fi ligature. *You* type "fish" and I
see an fi ligature. It's because my system is smart enough to know that
the fi ligature is standard in Latin typography, that it's present in my
font, and that my font is one where "fi" looks better written with the
ligature than it would without it.

If I wanted the ct ligature, I'd have to find a font in my system that had
it, switch to that font and turn the ligature on. It may not come through
on your end—but then, the precise font or point size I'm using wouldn't
come through, either.

> What, exactly and precisely, are the problems which I will allegedly
> create for other people?


1) Non-display operations won't work anymore.

2) Users would be required to manually insert the ct ligature everywhere.

> These are not rhetorical questions, I really would genuinely like to know.
> I am quite happy to accept that perhaps a solution to the problem has been
> found, yet wonder whether that solution is, as of today, only available to
> people who are on one side of a digital divide.

Again, there are many digital divides. There are things that work on
Windows and not on Macs. There are things that work on Macs and not on
Windows. There are things that work with InDesign that don't work with
Word. There are things that work with Word and not with InDesign. There
are things that work with Windows XP that don't work with Windows 98, and
things that work with Windows 98 that don't work with DOS 3.0. There are
things that work with Mac OS X that don't work with Mac OS 9, and things
that work with Mac OS 9 that don't work with Mac OS 6.8.3 of venerable
memory.

Don't put yourself in the position of arguing that it's wrong to innovate.
Innovation in the IT industry always creates a digital divide.

> If there really are
> problems which my list will cause then I will be happy to add a note
> stating of the problem. Yet I am very concerned that I may be in effect
> being told here that Unicode is only really intended for people with the
> very latest equipment using expensive solutions that are only realistically
> available to rich corporations.

*sigh* Unicode has from the beginning been designed with the assumption
that it would require rendering engines capable of complex typesetting.
We've always known that. It's taken longer to get them to market than we
would have thought and liked, but they're showing up now. It's a bar
we've always had to cross, however, if not for Latin ligatures, then for
Arabic and Devanagari, and so on.

The advantage of this is (ideally) that once you get a system capable of
doing Arabic or Devanagari or Latin ligatures, you get a system capable of
doing all of them. That, at least, was the goal.

> My thinking is that the existence of the list, (and hopefully, the list
> having been distributed in this discussion group, many people will be
> aware of its existence, and may perhaps have even filed a copy for
> possible future reference), will hopefully make the availability of such
> ligatures in founts more widespread and will also hopefully influence
> people who make software packages, such as relatively inexpensive
> electronic book publishing packages, to build in a feature so that such
> ligatures may be accessed from a TrueType fount.

1) People who make fonts already know about ligatures.

2) The set of ligatures appropriate for Latin typography is very much
font-specific. Zapfino has dozens of Latin ligatures because it's a
calligraphic font. Courier should have none because it's a monospaced font.

3) People who write book publishing packages already build in features to
access Latin ligatures. Microsoft Word is not a good program to use for
publishing books.

> I feel that the Unicode system should be available for all, not just for
> people who are on the money side of the digital divide.

It's a nice goal. It isn't a realistic one, however.

Now, a question on my part. You're using the term "digital divide," but
you're not defining it very well. Could you tell me:

a) What the "digital divide" really is from your perspective—that is, what
OS is on one side and what OS on the other?

b) What are the relative numbers of people with systems on both sides?

If, say, your divide were to be between Mac OS 6 or earlier and Mac OS 7
or later (the point at which Apple adopted TrueType as its primary font
technology), then likely 99.99% of all Mac users are on the
7-or-later side of the divide. Do you see what I'm asking here?

John H. Jenkins
