From: Edward H. Trager (ehtrager@umich.edu)
Date: Thu Dec 18 2003 - 16:03:40 EST
On Thursday 2003.12.18 04:05:53 -0800, Peter Kirk wrote:
> On 18/12/2003 02:51, Arcane Jill wrote:
>
> >...
> >In fact, until Kenneth Whistler's email about American English - I
> >actually thought the Unicode character names /were/ in American
> >English, because they are certainly not in my native dialect (although
> >I did know that most Americans don't say "full stop"). Rest assured,
> >Kenneth, we in Britain do /not/ refer to slash as "solidus",
> >underscore as "low line", backslash as "reverse solidus", paragraph
> >sign as "pilcrow sign", and so on. I have no idea where these terms
> >came from, but, take it from someone who lives here, they are not in
> >common usage in Britain. (With the exceptions of "full stop" and
> >"anticlockwise"). Curious -- I wonder where those "official" names
> >came from?
>
>
> They are not the names used by British programmers. But they are perhaps
> the names which were used by British typesetters, and maybe American
> ones too, in the old days of hot metal.
>
> >
> >I've never attached any importance to the "proper" names (and I'm also
> >a programmer). In fact, I don't even see why a Unicode character /has/
> >to have a "proper name" at all. ASCII characters never had them. And,
> >hey - the official names for CJK Unified Ideographs Extension A (for
Hopefully most of you will agree that having official names for Unicode
characters in ASCII-only English is very useful when various characters get
discussed on mailing lists such as this one. It saves having to look up hex values
endlessly, since many still don't have (or, as in my case, don't always
have access to) Unicode-enabled email clients.
I personally think that it is an *interesting* omission that the CJK ideographs
do not have meaningful names.
I'm probably just opening up a can of worms by suggesting a meaningful
CJK ideograph naming system (and I fully expect lots of comments back from the
experts to the tune of "Yes, the CJK group considered all manner of things like
this before, but it wouldn't work because of X, Y, and Z..." or "You really don't
know what you are talking about"). But, accepting that risk, I'm going to say it
anyway and give some reasons why I would do it this way. A useful system for
naming CJK ideographs would be to construct names by stringing together:
(1) An indicator of whether the character is simplified (SIMPLIFIED) or traditional
(TRADITIONAL), for ideographs originating in China that come in both traditional and
simplified forms, or an indicator for a variant form (VARIANT) if the character is an
encoded variant of another, more commonly used character. Omit the indicator if a character
of Chinese origin comes in only one form. If the character was "invented" by the Japanese,
use "JAPANESE" as the indicator. If invented by the Koreans, use "KOREAN" as the indicator.
If invented by the Vietnamese, use "VIETNAMESE" as the indicator.
(2) If the character is used in Chinese, its primary pronunciation in modern standard
Mandarin Chinese, written in pinyin and followed by a digit 1-4 indicating the tone of
that pronunciation. If the character does not appear in Chinese but was instead invented
by the Japanese, the Koreans, or (historically) the Vietnamese, give its primary
pronunciation in Japanese if used in Japan, in Korean if used only in Korea, or in
Vietnamese if used historically only in Vietnam.
(3) The primary meaning of the character in English, according to the primary language
in which that character appears.
For example:
爱 u7231 SIMPLIFIED AI4 LOVE
愛 u611B TRADITIONAL AI4 LOVE
戈 u6208 GE1 SPEAR
為 u70BA TRADITIONAL WEI2 TO BE
爲 u7232 VARIANT WEI2 TO BE
圓 u5713 TRADITIONAL YUAN2 CIRCLE
円 u5186 JAPANESE EN YEN
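The three-part scheme is mechanical enough to sketch in a few lines. The following is a minimal Python sketch, assuming a hypothetical `Ideograph` record whose indicator, reading, and English gloss have already been filled in by hand (no Unicode data file carries these fields in this form). For contrast it also prints the official name from Python's standard `unicodedata` module, which for CJK unified ideographs is nothing more than "CJK UNIFIED IDEOGRAPH-" followed by the hex code point.

```python
import unicodedata
from dataclasses import dataclass
from typing import Optional

# Indicators from part (1) of the proposed scheme.
INDICATORS = {"SIMPLIFIED", "TRADITIONAL", "VARIANT", "JAPANESE", "KOREAN", "VIETNAMESE"}


@dataclass
class Ideograph:
    """Hypothetical record; the fields are filled in by hand for illustration."""
    char: str
    indicator: Optional[str]  # None for a Chinese-origin character with only one form
    reading: str              # pinyin plus tone digit ("ai4"), or e.g. a Japanese reading ("en")
    meaning: str              # primary English gloss


def proposed_name(ig: Ideograph) -> str:
    """Join indicator, reading and meaning into an uppercase ASCII name."""
    if ig.indicator is not None and ig.indicator not in INDICATORS:
        raise ValueError(f"unknown indicator: {ig.indicator}")
    parts = [ig.indicator] if ig.indicator else []
    parts += [ig.reading.upper(), ig.meaning.upper()]
    return " ".join(parts)


def describe(ig: Ideograph) -> str:
    """Show the code point, the proposed name and the current official name."""
    code_point = f"U+{ord(ig.char):04X}"
    return f"{code_point} {proposed_name(ig)}  (official: {unicodedata.name(ig.char)})"


if __name__ == "__main__":
    for ig in [
        Ideograph("\u7231", "SIMPLIFIED", "ai4", "love"),
        Ideograph("\u611B", "TRADITIONAL", "ai4", "love"),
        Ideograph("\u6208", None, "ge1", "spear"),
        Ideograph("\u5186", "JAPANESE", "en", "yen"),
    ]:
        print(describe(ig))
```

The first line printed is `U+7231 SIMPLIFIED AI4 LOVE  (official: CJK UNIFIED IDEOGRAPH-7231)`, which puts the proposed name side by side with the current code-point-only one.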
Standardized names such as these, at least for the BMP CJK characters,
would make it pretty clear to most knowledgeable readers which characters were being
discussed, even when they cannot see the glyphs for whatever reason.
Perhaps more importantly, if these names were in the Unihan database, which is
the database that most developers are going to consult first, it would be trivial to
query out various useful subsets of ideographs, such as the TRADITIONAL vs. SIMPLIFIED
(vs. the "doesn't change") subsets, or those that are uniquely JAPANESE, etc. I'm not
saying it would be the complete solution for everything -- of course not. But it would
put this information "at one's fingertips", so to speak, in a prominent database that
many people look at.
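The kind of subset query described here could look something like the following minimal Python sketch. It assumes the proposed names were available as a simple mapping from code point to name; Unihan carries no names in this format, so the table below just holds the examples given earlier. The function names `parse_name` and `group_by_indicator` are illustrative only.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

INDICATORS = ("SIMPLIFIED", "TRADITIONAL", "VARIANT", "JAPANESE", "KOREAN", "VIETNAMESE")

# Hypothetical code point -> proposed name table (the examples from the scheme above).
SAMPLE: Dict[int, str] = {
    0x7231: "SIMPLIFIED AI4 LOVE",
    0x611B: "TRADITIONAL AI4 LOVE",
    0x6208: "GE1 SPEAR",
    0x70BA: "TRADITIONAL WEI2 TO BE",
    0x7232: "VARIANT WEI2 TO BE",
    0x5713: "TRADITIONAL YUAN2 CIRCLE",
    0x5186: "JAPANESE EN YEN",
}


def parse_name(name: str) -> Tuple[Optional[str], str, str]:
    """Split a proposed name into (indicator, reading, meaning).

    Assumes the reading is always exactly one token, so everything after it
    is the (possibly multi-word) English meaning.
    """
    tokens = name.split()
    indicator = tokens[0] if tokens[0] in INDICATORS else None
    rest = tokens[1:] if indicator else tokens
    return indicator, rest[0], " ".join(rest[1:])


def group_by_indicator(names: Dict[int, str]) -> Dict[Optional[str], List[int]]:
    """Group code points by indicator; None collects the "doesn't change" subset."""
    groups: Dict[Optional[str], List[int]] = defaultdict(list)
    for cp, name in sorted(names.items()):
        indicator, _, _ = parse_name(name)
        groups[indicator].append(cp)
    return dict(groups)


if __name__ == "__main__":
    for indicator, cps in group_by_indicator(SAMPLE).items():
        print(indicator, " ".join(f"U+{cp:04X}" for cp in cps))
```

Keeping the reading to a single token is what makes such names machine-parseable at all; if readings could span several words, the boundary between reading and meaning would become ambiguous.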
> >example) tell me nothing more than the script and codepoint anyway. I
> >tend to regard them as "comments".
> >
> Agreed. The names are useful for selecting a character from a drop-down
> list. But they are only useful if they are accurate. I agree with Doug
> that "As a programmer, I can't personally imagine designing a program
> that relies on the Unicode names to identify characters uniquely". I
> suspect that the issue is more that WG2 people who are not programmers
> decided on behalf of programmers, but without asking them, that
> stability of names would be a good thing. And maybe because they want to
> make sure their work lasts 1000 years.
>
> Well, I don't want to be offensive to WG2 again, so I invite WG2 members
> to correct me on this and explain why stability of character names is
> considered so important. Don't just say "we promised stability so we
> must deliver", I want to know why the promise was made and to whom. If
> the people to whom the promise was made don't actually want it, then
> maybe WG2 can be released from its unwise commitment.
>
> --
> Peter Kirk
> peter@qaya.org (personal)
> peterkirk@qaya.org (work)
> http://www.qaya.org/
This archive was generated by hypermail 2.1.5 : Thu Dec 18 2003 - 16:20:32 EST