From: Kenneth Whistler (kenw@sybase.com)
Date: Fri Mar 26 2004 - 15:10:28 EST
> > (D) None of the above
>
> True.
I would like to add to Jim Allan's excellent explanation
here that the relevant coding domain for these decisions
of same or different for encoding a particular character
is the *script* in question.
The first decision that needs to be taken is whether a
particular writing system mooted for consideration for
encoding of its characters constitutes a distinct script
or not in the sense used by Unicode/10646.
If the answer is no, and the writing system is considered,
for example, to be a stylistic or historic variant of a
script already encoded, then considerations of unification
*within* a script come into effect. If it turns out that
the particular writing system in question contains characters
beyond those that have already been encoded for that script,
then those characters become valid candidates for additional
encoding. Recent examples can be found among the various
Arabic character additions for West African languages written
in the Arabic script.
If the answer is yes, then an entirely separate script will
be encoded. This script must then have *all* of its
characters encoded, even where there might be
considerable overlap in appearance and/or linguistic function
for some subset of those characters, for either historic
reasons or merely by coincidence. An example of this can
be seen in Old Italic, many of whose letterforms are clearly
related to early Greek and to early Latin. Nevertheless, once
Old Italic was distinguished as a script to be encoded, rather
than just another variant alphabet (or set of alphabets, actually)
of archaic Greek, that choice determined the further decisions
about the repertoire to be encoded. It doesn't make any
sense to just pick out those *particular* Old Italic letters
that happen to be distinguishable in shape (U+10307 OLD ITALIC
LETTER HE) or in function (U+1030E OLD ITALIC LETTER ESH)
from Greek letters and to encode only them.
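The two branches above can be sketched as a small decision rule. This is only an illustration, not a procedure the UTC actually runs: the function name `candidates_for_encoding` and the toy repertoires are hypothetical, and the real "distinct script or not" decision is a judgment call that no code captures.

```python
def candidates_for_encoding(repertoire, is_distinct_script, already_encoded=frozenset()):
    """Return the characters that become candidates for encoding.

    Hypothetical helper, for illustration only.
    """
    repertoire = set(repertoire)
    if is_distinct_script:
        # A separate script gets *all* of its characters encoded,
        # including those that look like characters of another script.
        return repertoire
    # A stylistic or historic variant of an encoded script: only the
    # characters not yet encoded for that script are candidates.
    return repertoire - set(already_encoded)


# Toy labels standing in for letterforms (not real repertoires).
existing_script = {"A", "B", "E"}
new_writing_system = {"A", "B", "E", "X", "Y"}

as_variant = candidates_for_encoding(
    new_writing_system, is_distinct_script=False, already_encoded=existing_script
)
as_script = candidates_for_encoding(new_writing_system, is_distinct_script=True)
```

With the toy data, treating the writing system as a variant leaves only `X` and `Y` as candidates, while treating it as a distinct script yields all five letters, overlap and all.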
Where people seem to get most hung up the first time they
encounter UTC decisions about encoding characters (particularly
for scripts, as opposed to symbol sets) is on these lookalike
and/or historical relation questions. Hence the eternal newbie
questions about Latin, Greek, and Cyrillic capital "A", for
example.
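For anyone who wants to see the capital "A" case concretely, here is a minimal sketch using Python's standard `unicodedata` module. The variable names are just for illustration; the code points and character names are those of the standard.

```python
import unicodedata

# Three lookalike capitals, one per script.
latin_a = "\u0041"
greek_alpha = "\u0391"
cyrillic_a = "\u0410"

# Each has its own character name, because each belongs to a different script.
names = [unicodedata.name(ch) for ch in (latin_a, greek_alpha, cyrillic_a)]

# Similar glyphs, but three distinct encoded characters.
all_distinct = len({latin_a, greek_alpha, cyrillic_a}) == 3

# The Old Italic letters cited above are encoded in that script's own block.
old_italic_he = unicodedata.name("\U00010307")
old_italic_esh = unicodedata.name("\U0001030E")
```

The names come back as LATIN CAPITAL LETTER A, GREEK CAPITAL LETTER ALPHA, and CYRILLIC CAPITAL LETTER A: the script is part of each character's identity, whatever the glyphs look like.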
--Ken