Adam:
What you describe here is entirely the same for English "th"
with the exception of sorting behaviour. This sequence
represents a linguistic entity (two, actually) which cannot be
analysed as a sequence of entities, whether reasonably
represented orthographically by "t" and "h" or otherwise. There
are other alphabets that have single symbols for the same
entity/sound, and you would not ever put a hyphen between the
two.
>The fact that it can be constructed from two glyphs, C and H,
>is irrelevant, many other characters can be so constructed
>(e.g. N with caron can be constructed from an N and a caron, yet
>it is a separate character).
This is irrelevant because
- we're not constructing entities (which would suggest that the
meaning represented by "ch" is composed of the meanings
represented by "c" and by "h"; nobody is suggesting that kind
of semantic composition in this case), we're establishing
encoded representations for them
- we're not talking about glyphs, but about characters
The fact that "ch" *can* be given an encoded representation
of C + H is entirely relevant. Since it can already be done
this way, and since any textual process of interest can be made
to work with that representation, there's no need to add
something different.
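
To make the point about textual processes concrete, here is a rough
sketch of sorting that treats the sequence C + H as a single unit
placed after H. The alphabet table and the names `units` and
`sort_key` are my own assumptions for illustration; diacritics are
left out, and this is not how any particular collation library does it.

```python
import re

# Assumed, simplified alphabet order: plain a-z with "ch" placed after "h".
# Real Czech/Slovak collation also orders letters with diacritics; omitted here.
ALPHABET = list("abcdefgh") + ["ch"] + list("ijklmnopqrstuvwxyz")
RANK = {unit: i for i, unit in enumerate(ALPHABET)}

# Try "ch" before any single character so the pair stays together.
UNIT = re.compile(r"ch|.", re.IGNORECASE | re.DOTALL)


def units(text):
    """Split text into sort units, treating C + H as one unit."""
    return UNIT.findall(text)


def sort_key(word):
    """Map each unit to its rank; unknown units sort after the alphabet."""
    return [RANK.get(u.lower(), len(ALPHABET)) for u in units(word)]


words = ["cihla", "chata", "hrad", "ctnost", "chyba", "ikona", "had"]
print(sorted(words, key=sort_key))
```

The text is still stored as two sense-1 characters; only the comparison
step knows that the pair counts as one letter.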
>>But you are wrong. CH is not a _character_ in any language...
>Respectfully, I disagree...
We need to be careful here because (a) there are two senses of
"character" that are easily confused (sense 1: atomic unit of
textual information for encoding purposes, i.e. the Unicode
definition; sense 2: a unit within an orthography/writing
system); and (b) application of the second sense is subject to
attitudes and perceptions on the part of individuals within a
language community, and therefore not necessarily easy to
determine and not necessarily consistent across the community.
Your response to Michael was operating on the second sense.
Whether Michael was wrong or not on this point makes no
difference whatsoever, because we need to focus for this
standard on the first sense.
It is possible to use a sequence of two characters (sense 1) to
encode the (single) linguistic entity "ch" (for sake of
discussion, we'll say that it's one sense-2 character), and to
do so while making it appear to the user that their software
always perceives these (this) as a single character (sense-2,
which is what users are interested in). Since this is possible,
it is better to do so than to introduce a new, single character
(sense 1), which, in cases like these, ends up creating more
problems than simplifications.
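
As a rough illustration of software presenting C + H to the user as
one sense-2 character while storing two sense-1 characters, consider
the sketch below. The function names are hypothetical, and it assumes
that every C followed by H is the digraph, which a real implementation
would need to verify for the language in question.

```python
import re

# Try "ch" before any single character so the pair stays together.
UNIT = re.compile(r"ch|.", re.IGNORECASE | re.DOTALL)


def perceived_units(text):
    """Split stored text into the units a user would count as letters."""
    return UNIT.findall(text)


def perceived_length(text):
    """Number of letters as the user sees them, not stored characters."""
    return len(perceived_units(text))


def delete_last_unit(text):
    """Backspace behaviour: remove the whole final unit, C + H included."""
    return "".join(perceived_units(text)[:-1])


def can_break_at(text, index):
    """True if splitting the stored text at index does not cut a unit."""
    offset = 0
    boundaries = {0}
    for unit in perceived_units(text):
        offset += len(unit)
        boundaries.add(offset)
    return index in boundaries


print(perceived_length("chlieb"))
print(delete_last_unit("vzduch"))
print(can_break_at("chlieb", 1))
```

The same boundary test could be used to keep a line-break or
hyphenation routine from separating the C from the H.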
Peter
From: <adam@whizkidtech.net> AT Internet on 10/21/99 06:49 PM CDT
Received on: 10/21/99
To: Peter Constable/IntlAdmin/WCT, unicode@unicode.org AT Internet@Ccmail
cc:
Subject: Re: Mixed up priorities
At 13:06 21-10-1999 -0700, Michael Everson wrote:
>But you are wrong. CH is not a _character_ in any language. It
>is a set of strings of characters (C-H, C-h, c-h) used (sorted
>etc.) as a _letter_ in languages like Slovak, Czech, Welsh,
>and traditional Spanish.
Respectfully, I disagree. I cannot speak for Welsh and Spanish,
but in Slovak and Czech, CH has all the characteristics of a
character: It denotes a specific sound which cannot be
expressed in any other way. Nor can it be separated into two
sounds.
Many other alphabets have a separate character for this sound,
e.g. the chi in Greek, or the Cyrillic character that looks
like the Roman X.
The fact that it can be constructed from two glyphs, C and H,
is irrelevant, many other characters can be so constructed
(e.g. N with caron can be constructed from an N and a caron, yet
it is a separate character).
It is not simply a string of characters because it cannot be
separated. You cannot, for example, divide a word at the end of
a line by following the C with a - and starting the next line
with an H. It is *not* C-H, C-h, and c-h. It is CH, Ch, and ch.
Also, ask any Slovak to tell you what the alphabet is; he will
inevitably list H, CH, I within the sequence.
And, by the way, I am in no way trying to undermine your effort
to have the Klingon alphabet included in Unicode. I just
wish we treated real languages the way their native speakers
treat them, not how Western experts perceive them.
Adam