Re: Braille, CJK and unicode

From: John H. Jenkins (jenkins@apple.com)
Date: Mon Feb 02 2009 - 17:41:11 CST


    On Feb 2, 2009, at 3:10 PM, Samuel Thibault wrote:

    >>
    >> The best way for someone fluent in Chinese to understand what a
    >> character means is to leave it in its context.
    >
    > As said above, the problem is that sighted users have _more_ context:
    > they know precisely which kanjis are used. Blind users only have the
    > pronunciation.
    >

    I expressed myself badly.

    Chinese writing rarely uses single characters in isolation. Even in
    something such as a computer UI, which consists of text fragments with
    only an implicit context, the visible text consists almost entirely of
    *words*, which are generally two characters long.

    So if I switch to the Chinese UI on my Mac and launch TextEdit, I get
    a couple of dozen UI elements with text attached, and in none of these
    is there a character by itself. The majority are two-character words,
    and some longer. If I encounter any of these UI elements in non-
    Chinese transcription (e.g., daa2hoi1 or lit6yan3 or cyun4syun2), I
    can readily tell what the two characters involved are. If I'm dealing
    with an extended text, that will generally supply by itself enough
    context for the user to tell which specific character is meant.

    Spoken Chinese evolved in the direction of two-syllable units largely
    because of the numerous homophones found with single-character units,
    even when tones are taken into account. On those rare occasions when
    someone does need to clarify which particular ideograph is meant,
    they'll say "X as in XY" or write it on their palm. (Lower-case palm,
    not upper-case one.)

    For example: My Chinese surname is extremely rare (as a surname) in
    Chinese, but common as an element of Japanese names and common as a
    character in its own right. When I introduce myself in Cantonese as
    surnamed zeng2 ("well"), most people assume I'm a stupid foreign devil
    who can't even pronounce his own name with the right tone and say
    something like, "Ah, you're surnamed zeng6." (zeng6 is a relatively
    common surname.) To clarify what character is actually meant, I'll
    have to say, "zeng2seoi2 ge3 zeng2" ("'Well' as in 'well of water'").

    (I don't have the same problem with Mandarin speakers as no common
    surname is a homophone for jǐng. Mandarin speakers just tend to
    wonder why I'm trying to pass a Japanese surname off as Chinese. And
    I actually am inclined to say, "'Well' as in 'frog-at-the-bottom-of',"
    because that usually gets a laugh.)

    So in practice, if you are dealing with any kind of transcription of
    Chinese into non-ideographic form, you'll be dealing with multiple-
    character chunks, which will almost always supply enough context for
    the user to tell what characters are meant. And if for some reason
    you *do* need to clarify which ideograph a single character is, the
    simplest way would be to use a common two-character word or
    phrase containing it. For that, you don't need a set of glosses, but
    a word list. CEDICT may be the fastest way to get one of those.
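
    As a rough illustration of the word-list approach, here is a minimal
    Python sketch. It assumes input in the CEDICT line format
    ("Traditional Simplified [pin1 yin1] /gloss/gloss/"). The sample
    entries and the names parse_cedict, build_index and clarify are
    made up for this example and are not part of CEDICT itself.

```python
import re
from collections import defaultdict

# Assumed CEDICT-style line: Traditional Simplified [pin1 yin1] /gloss/gloss/
LINE_RE = re.compile(r"^(\S+) (\S+) \[([^\]]*)\] /(.*)/$")


def parse_cedict(lines):
    """Yield (traditional, simplified, pinyin, glosses) for each entry line."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks and comments
            continue
        m = LINE_RE.match(line)
        if m:
            trad, simp, pinyin, glosses = m.groups()
            yield trad, simp, pinyin, glosses.split("/")


def build_index(entries):
    """Map each character to the two-character words that contain it."""
    index = defaultdict(list)
    for trad, _simp, pinyin, _glosses in entries:
        if len(trad) == 2:
            for ch in set(trad):  # set() avoids double-listing words like XX
                index[ch].append((trad, pinyin))
    return index


def clarify(ch, index):
    """Return an 'X as in XY' hint, or None if no containing word is known."""
    words = index.get(ch)
    if not words:
        return None
    word, pinyin = words[0]  # first match only; no frequency ranking here
    return f"{ch} as in {word} ({pinyin})"


# Hypothetical sample in the assumed line format, for illustration only.
SAMPLE = """\
# comment line
井 井 [jing3] /a well/
井水 井水 [jing3 shui3] /well water/
水果 水果 [shui3 guo3] /fruit/
"""

index = build_index(parse_cedict(SAMPLE.splitlines()))
print(clarify("井", index))
```

    The sketch simply takes the first two-character word it finds. A
    real system would presumably want to prefer the most familiar word
    for each character, which needs frequency data from some other
    source.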

    =====
    John H. Jenkins
    jenkins@apple.com


