From: Adam Twardoch (list.adam@twardoch.com)
Date: Thu May 22 2008 - 13:30:01 CDT
Even though, in principle, it's a nice thought that text encoding should
deal with semantics, it never has entirely, and it never will.
The whole concept of writing is that it blurs the distinction between
form and function. The history of writing knows countless examples of
people just adapting existing visual signs (glyphs) to a fully different
meaning — it has been done with handwriting, and with typesetting. It
will never change.
The text encoding and the visual rendering must be "good enough" but
those standards vary. For some, there is no real difference between the
Latin A and the Cyrillic А (or was it A or maybe Α?). For others, there
is a functional difference between ‒ and –, so they have encoded them
separately, and yet people key in 3–1 even when they mean 3‒1. (Erm, can
you see the difference here?)
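
A minimal sketch of that point, assuming Python and its standard
unicodedata module. The helper name describe is hypothetical. It prints the
code point and Unicode name of each look-alike, since the glyphs alone do
not tell them apart:

```python
import unicodedata


def describe(ch: str) -> str:
    """Return the code point and Unicode name of a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


# Three capital letters that most fonts draw with the same glyph:
# Latin A, Cyrillic A, Greek Alpha.
lookalike_letters = ["\u0041", "\u0410", "\u0391"]

# Two dashes that differ only slightly in width: figure dash and en dash.
lookalike_dashes = ["\u2012", "\u2013"]

for ch in lookalike_letters + lookalike_dashes:
    print(ch, describe(ch))

# Strings that look nearly identical still compare as different text.
print("3\u20131" == "3\u20121")  # prints False
```

Whether those distinctions matter is exactly the "good enough" question:
the encoding keeps them apart by enumeration, while the person typing
usually does not.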
As one friend once put it, "In Unicode, a character is an entity that is
defined by enumeration". John likes to quote Borges and his famous
"animal classification"*. In fact, the classification of writing marks
as "characters" vs. "glyphs" is just a splendid example of that very
problem.
After all, we're all still too lazy to key in the proper quotation marks :)
A.
* Jorge Luis Borges "El idioma analítico de John Wilkins", 1942
http://en.wikipedia.org/wiki/Celestial_Emporium_of_Benevolent_Recognition
John Hudson wrote:
> David Starner wrote:
>
>> On Wed, May 21, 2008 at 10:43 PM, John Hudson <john@tiro.ca> wrote:
>>> The key word here should be *glyph*. Correct cultural norms for spacing
>>> punctuation should not be a text encoding issue at all, any more than
>>> spacing any other glyphs should be an encoding issue. These should be
>>> display issues, handled via font intelligence and language tagging.
>
>> Taken most literally, that's obviously not a common practice at all; I
>> note the spaces after your commas and periods, and the examples I've
>> seen without them have struck me as erroneous and deficient.
>
> A word space is a semantic separator. The space that French
> typographers traditionally place before some punctuation marks is not.
> It doesn't change the meaning of the text whether that space is present
> or not.
>
> JH
>
>
-- Adam Twardoch | Language Typography Unicode Fonts OpenType | twardoch.com | silesian.com | fontlab.net