From: Adam Twardoch (list.adam@twardoch.com)
Date: Thu May 22 2008 - 13:30:01 CDT
Even though, in principle, it's a nice thought that text encoding should 
deal with semantics, it never has done so entirely, and it never will.
The whole concept of writing is that it blurs the distinction between 
form and function. The history of writing knows countless examples of 
people simply adapting existing visual signs (glyphs) to an entirely 
different meaning: it has been done in handwriting and in typesetting. 
That will never change.
The text encoding and the visual rendering must be "good enough", but 
what counts as good enough varies. For some, there is no real difference between the 
Latin A and the Cyrillic А (or was it A or maybe Α?). For others, there 
is a functional difference between ‒ and –, so they have encoded them 
separately, and yet, people key in 3–1 even if they mean 3‒1. (Erm, can 
you see the difference here?)
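
A minimal Python sketch, for illustration only, makes those lookalikes visible by printing the code point and Unicode name of each one. The helper name `describe` and the `SAMPLES` list are assumptions made up for this example. The only library used is Python's standard `unicodedata` module.

```python
import unicodedata


def describe(ch: str) -> str:
    """Return the code point and Unicode name of a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


# Written as escapes so the lookalikes cannot be confused in the source:
# Latin A, Cyrillic A, Greek Alpha, figure dash, en dash.
SAMPLES = ["\u0041", "\u0410", "\u0391", "\u2012", "\u2013"]

for ch in SAMPLES:
    print(describe(ch))

# The two strings may render identically, yet they do not compare equal,
# because one contains an en dash and the other a figure dash.
print("3\u20131" == "3\u20121")
```

The three letters and the two dashes each report a different name, and the final comparison is false, even where a font draws the glyphs identically.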
As one friend once put it, "In Unicode, a character is an entity that is 
defined by enumeration". John likes to quote Borges and his famous 
"animal classification"*. In fact, the classification of writing marks 
as "characters" vs. "glyphs" is just a splendid example of that very 
problem.
After all, we're all still too lazy to key in the proper quotation marks :)
A.
* Jorge Luis Borges, "El idioma analítico de John Wilkins", 1942
http://en.wikipedia.org/wiki/Celestial_Emporium_of_Benevolent_Recognition
John Hudson wrote:
 > David Starner wrote:
 >
 >> On Wed, May 21, 2008 at 10:43 PM, John Hudson <john@tiro.ca> wrote:
 >>> The key word here should be *glyph*. Correct cultural norms for spacing
 >>> punctuation should not be a text encoding issue at all, any more than
 >>> spacing any other glyphs should be an encoding issue. These should be
 >>> display issues, handled via font intelligence and language tagging.
 >
 >> Taken most literally, that's obviously not a common practice at all; I
 >> note the spaces after your commas and periods, and the examples I've
 >> seen without them have struck me as erroneous and deficient.
 >
 > A word space is a semantic separator. The space that French
 > typographers traditionally place before some punctuation marks is not.
 > It doesn't change the meaning of the text whether that space is present
 > or not.
 >
 > JH
 >
 >
-- Adam Twardoch | Language Typography Unicode Fonts OpenType | twardoch.com | silesian.com | fontlab.net