Re: American English translation of character names

From: Edward H. Trager (ehtrager@umich.edu)
Date: Thu Dec 18 2003 - 16:03:40 EST


    On Thursday 2003.12.18 04:05:53 -0800, Peter Kirk wrote:
    > On 18/12/2003 02:51, Arcane Jill wrote:
    >
    > >...
    > >In fact, until Kenneth Whistler's email about American English - I
    > >actually thought the Unicode character names /were/ in American
    > >English, because they are certainly not in my native dialect (although
    > >I did know that most Americans don't say "full stop"). Rest assured,
    > >Kenneth, we in Britain do /not/ refer to slash as "solidus",
    > >underscore as "low line", backslash as "reverse solidus", paragraph
    > >sign as "pilcrow sign", and so on. I have no idea where these terms
    > >came from, but, take it from someone who lives here, they are not in
    > >common usage in Britain. (With the exceptions of "full stop" and
    > >"anticlockwise"). Curious -- I wonder where those "official" names
    > >came from?
    >
    >
    > They are not the names used by British programmers. But they are perhaps
    > the names which were used by British typesetters, and maybe American
    > ones too, in the old days of hot metal.
    >
    > >
    > >I've never attached any importance to the "proper" names (and I'm also
    > >a programmer). In fact, I don't even see why a Unicode character /has/
    > >to have a "proper name" at all. ASCII characters never had them. And,
    > >hey - the official names for CJK Unified Ideographs Extension A (for

    Hopefully most of you will agree that having official names for Unicode
    characters in ASCII-only English is very useful when various characters get
    discussed on mailing lists such as this one. It saves having to look up hex values
    endlessly, since many still don't have (or, as in my case, don't always
    have access to) Unicode-enabled email clients.

    I personally think that it is an *interesting* omission that the CJK ideographs
    do not have meaningful names.

    I'm probably going to be just opening up a can of worms by suggesting a meaningful
    CJK ideograph naming system (and I fully expect lots of comments back from the
    experts to the tune of "Yes, the CJK group considered all manner of things like
    this before, but it wouldn't work because of X, Y, and Z..." or "You really don't
    know what you are talking about"). But assuming that risk, I'm going to say it
    anyway and give some reasons for why I would do it this way: A useful system for
    naming CJK ideographs would be to construct names by stringing together:

       (1) An indicator of the character's form or origin. For ideographs originating in
       China that come in both traditional and simplified forms, use SIMPLIFIED or
       TRADITIONAL. Use VARIANT if the character is an encoded variant of another, more
       commonly used glyph. Omit the indicator if a character of Chinese origin comes in
       only one form. If the character was "invented" by the Japanese, use "JAPANESE" as
       the indicator. If invented by the Koreans, use "KOREAN" as the indicator. If invented
       by the Vietnamese, use "VIETNAMESE" as the indicator.

       (2) If the character is used in Chinese, then the primary pronunciation of
       the ideograph in modern standard Mandarin Chinese using pinyin, followed by a
       digit 1-4 to indicate the tone of that primary pronunciation. If the character
       does not appear in Chinese but rather was invented by the Japanese, Koreans, or
       (historically) Vietnamese, then provide the primary pronunciation in Japanese if used in
       Japan, Korean if used only in Korea, or Vietnamese if used historically only in Vietnam.

       (3) The primary meaning of the character in English according to the primary language
       in which that character appears.

    For example:

       爱 u7231 SIMPLIFIED AI4 LOVE
       愛 u611B TRADITIONAL AI4 LOVE
       戈 u6208 GE1 SPEAR
       為 u70BA TRADITIONAL WEI2 TO BE
       爲 u7232 VARIANT WEI2 TO BE
       圓 u5713 TRADITIONAL YUAN2 CIRCLE
       円 u5186 JAPANESE EN YEN
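
    The three-part construction above can be sketched in a few lines of Python. This is
    only an illustration of the proposal, not anything that exists in Unicode or Unihan:
    the class name IdeographName, its fields, and the helper listing_line are all
    hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import Optional

# Indicators from part (1) of the proposal; None means "comes in only one form".
ALLOWED_INDICATORS = {
    "SIMPLIFIED", "TRADITIONAL", "VARIANT",
    "JAPANESE", "KOREAN", "VIETNAMESE",
}


@dataclass(frozen=True)
class IdeographName:
    """Hypothetical record holding the three proposed name components."""
    codepoint: int
    indicator: Optional[str]  # part (1), or None when omitted
    reading: str              # part (2), e.g. "AI4" or "EN"
    meaning: str              # part (3), e.g. "LOVE"

    def __post_init__(self) -> None:
        if self.indicator is not None and self.indicator not in ALLOWED_INDICATORS:
            raise ValueError(f"unknown indicator: {self.indicator!r}")

    def name(self) -> str:
        """String the components together, skipping an omitted indicator."""
        parts = [p for p in (self.indicator, self.reading, self.meaning) if p]
        return " ".join(parts).upper()


def listing_line(entry: IdeographName) -> str:
    """Format one entry the way the examples above are written."""
    return f"{chr(entry.codepoint)} u{entry.codepoint:04X} {entry.name()}"


if __name__ == "__main__":
    print(listing_line(IdeographName(0x7231, "SIMPLIFIED", "AI4", "LOVE")))
    print(listing_line(IdeographName(0x6208, None, "GE1", "SPEAR")))
```

    Keeping the indicator optional mirrors the rule that it is omitted when a character of
    Chinese origin has only one form.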

    Standardized names such as these, at least for the BMP CJK characters,
    would make it pretty clear to most knowledgeable readers what characters were being
    discussed even when unable to see the glyphs for whatever reasons.
    Perhaps more importantly, if this were in the Unihan database, which is
    the database that most developers are going to access first, it would be trivial to
    query out various useful subsets of ideographs, such as the TRADITIONAL vs. SIMPLIFIED
    (vs. the "Doesn't change" subset), or those that are uniquely JAPANESE, etc. I'm not
    saying it would be the complete solution for everything -- of course not. But it would
    put this information "at one's fingertips", so to speak, in a prominent database that
    many people look at.
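
    To show how simple such a query could be, here is a rough Python sketch. It assumes
    the proposed names were available as a mapping from code point to name; the
    dictionary PROPOSED_NAMES and the functions indicator_of and subset are hypothetical
    and stand in for a field that the Unihan database does not actually have.

```python
from typing import Dict, List, Optional

INDICATORS = ("SIMPLIFIED", "TRADITIONAL", "VARIANT",
              "JAPANESE", "KOREAN", "VIETNAMESE")

# Hypothetical data: the example names from this message, keyed by code point.
PROPOSED_NAMES: Dict[int, str] = {
    0x7231: "SIMPLIFIED AI4 LOVE",
    0x611B: "TRADITIONAL AI4 LOVE",
    0x6208: "GE1 SPEAR",
    0x70BA: "TRADITIONAL WEI2 TO BE",
    0x7232: "VARIANT WEI2 TO BE",
    0x5713: "TRADITIONAL YUAN2 CIRCLE",
    0x5186: "JAPANESE EN YEN",
}


def indicator_of(name: str) -> Optional[str]:
    """Return the leading indicator, or None for the "doesn't change" case."""
    first = name.split(" ", 1)[0]
    return first if first in INDICATORS else None


def subset(names: Dict[int, str], indicator: Optional[str]) -> List[int]:
    """Code points whose name carries the given indicator, in ascending order."""
    return sorted(cp for cp, name in names.items()
                  if indicator_of(name) == indicator)


if __name__ == "__main__":
    for cp in subset(PROPOSED_NAMES, "TRADITIONAL"):
        print(f"u{cp:04X} {PROPOSED_NAMES[cp]}")
```

    Passing None as the indicator selects the characters that exist in only one form.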

    > >example) tell me nothing more than the script and codepoint anyway. I
    > >tend to regard them as "comments".
    > >
    > Agreed. The names are useful for selecting a character from a drop-down
    > list. But they are only useful if they are accurate. I agree with Doug
    > that "As a programmer, I can't personally imagine designing a program
    > that relies on the Unicode names to identify characters uniquely". I
    > suspect that the issue is more that WG2 people who are not programmers
    > decided on behalf of programmers, but without asking them, that
    > stability of names would be a good thing. And maybe because they want to
    > make sure their work lasts 1000 years.
    >
    > Well, I don't want to be offensive to WG2 again, so I invite WG2 members
    > to correct me on this and explain why stability of character names is
    > considered so important. Don't just say "we promised stability so we
    > must deliver", I want to know why the promise was made and to whom. If
    > the people to whom the promise was made don't actually want it, then
    > maybe WG2 can be released from its unwise commitment.
    >
    > --
    > Peter Kirk
    > peter@qaya.org (personal)
    > peterkirk@qaya.org (work)
    > http://www.qaya.org/
    >
    >
    >


