From: John H. Jenkins (jenkins@apple.com)
Date: Fri Oct 26 2007 - 11:04:03 CDT
On Oct 26, 2007, at 8:14 AM, Mark E. Shoulson wrote:
> Yeah, and an "x" in English has a different meaning (sound) than an
> "x" in Spanish (letters "mean" sounds; Chinese graphs mean words.
> More or less). Yet we still encode them the same because they look
> the same. Unicode generally tries to code what's written more than
> what's meant, I thought.
>
Well, not really.
Unicode tries to formalize the informal understanding that users of a
script bring to it. In the case of x, "everybody" knows that it's the
same letter in English as in Spanish. In East Asia, there are a
number of cases where "everybody" knows that two entities are separate
characters even if they look almost the same and in fact may be
indistinguishable in practice.
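A small Python illustration of the contrast, added here and not part of the original post. The letter "x" gets one code point whether the word is English or Spanish. 土 ("earth") and 士 ("scholar") look nearly alike but are encoded as separate characters. These two ideographs were picked only as a familiar look-alike pair; the sketch does not claim they are the Standard's own example of the non-cognate rule. The helper `describe` is hypothetical.

```python
import unicodedata

def describe(ch: str) -> str:
    """Return the code point and Unicode character name for one character (hypothetical helper)."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

# The same letter "x" appears in an English word and a Spanish word:
# it is one and the same code point in both.
english = "box"
spanish = "examen"
assert english[-1] == spanish[1] == "x"

# Two look-alike ideographs with distinct code points.
earth = "\u571f"    # 土
scholar = "\u58eb"  # 士

for ch in ("x", earth, scholar):
    print(describe(ch))
```

Running it prints one line per character. "x" comes out as U+0078 LATIN SMALL LETTER X, and the two ideographs come out as two different CJK UNIFIED IDEOGRAPH code points.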
It gets complicated, of course, because my "everybody" may disagree
with your "everybody," and technical limitations impose themselves,
and so on and so on. But this is largely why Han has the "non-cognate"
rule: in practice, East Asian lexicographers have been applying it
for centuries when preparing dictionaries and other authoritative
character lists.
=====
John H. Jenkins
jenkins@apple.com