Re: VOWEL, CONSONANT, ...: allow recognition of shorter names?

From: Mark Davis (mark.davis@icu-project.org)
Date: Fri Apr 11 2008 - 10:17:47 CDT


    You can file this as a request of the UTC using the online form on the
    Unicode site.

    Mark

    On Fri, Apr 11, 2008 at 2:38 AM, Henrik Theiling <ht@theiling.de> wrote:

    > Hi!
    >
    > TR#34 states that all character and sequence names (except one pair
    > involving HANGUL JUNGSEONG O-E) will always be unique when space,
    > medial dash and the words LETTER, CHARACTER, and DIGIT are ignored.
    >
    > When writing a character name recognition algorithm, I would like to
    > let the user be as concise as possible, yet without violating Unicode
    > rules, and without being in potential conflict with upcoming versions
    > of Unicode. As I understand it, the rule that LETTER, CHARACTER,
    > DIGIT, spaces, and medial dashes can be ignored in comparison tries
    > to address this very idea.
    >
    > I noticed that for some scripts, e.g. Khmer, character names are still
    > a mouthful. I also noticed that when I additionally ignore
    > CONSONANT, VOWEL, and INDEPENDENT, the Unicode names are still unique,
    > which would make writing (at least) Khmer character names much easier.
    >
    > I was wondering whether it would be feasible to tighten the condition
    > in TR#34 so that no upcoming Unicode version has ambiguous names when
    > CONSONANT, VOWEL, and INDEPENDENT are ignored, too.
    >
    > Of course, there may be more ignorable words, so the question is where
    > to stop. 'VOWEL' is in 360 names, which is more than 'CHARACTER',
    > which is in only 106. But CONSONANT and INDEPENDENT are relatively
    > rare. Here are these and a few other frequently occurring words that
    > can currently be ignored without ambiguity:
    >
    > VOWEL in 360 names
    > CONSONANT in 66 names
    > INDEPENDENT in 19 names (rare, but also a mouthful)
    > SYLLABICS in 630 names
    > LIGATURE in 508 names
    > FORM in 798 names
    > PATTERN in 297 names
    >
    > For stability reasons, it would be very nice to know that upcoming
    > Unicode versions will keep the same unambiguity, because then I could
    > officially ignore those words so my users could enjoy more concise
    > character names.
    >
    > Bye,
    > Henrik
    >
    >

    -- 
    Mark
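
The matching rule described in the quoted message can be sketched in Python
roughly as follows. This is only an illustration under stated assumptions,
not the algorithm from TR#34 or from Henrik's implementation: the helper
names (loose_key, find_collisions, installed_names) are hypothetical, a
"medial dash" is assumed to mean a hyphen with a letter or digit on both
sides, and ignored words are assumed to be whole space-separated words.
Python's unicodedata module only covers character names, not named
sequences, and the results depend on the Unicode version bundled with the
interpreter, so the counts from the 2008 message may not be reproduced.

```python
import re
import sys
import unicodedata
from collections import defaultdict

# Words the quoted message says may be ignored when comparing names.
BASE_IGNORED = frozenset({"LETTER", "CHARACTER", "DIGIT"})


def loose_key(name, ignored_words=BASE_IGNORED):
    """Return a comparison key for a (possibly abbreviated) character name.

    Assumption: a medial dash is a hyphen between two letters or digits,
    so the leading hyphen in "TIBETAN LETTER -A" is kept.
    """
    # Upper-case, then delete hyphens that sit between two alphanumerics.
    name = re.sub(r"(?<=[A-Z0-9])-(?=[A-Z0-9])", "", name.upper())
    # Drop ignored words, then drop the spaces by joining what is left.
    words = [w for w in name.split() if w not in ignored_words]
    return "".join(words)


def find_collisions(names, ignored_words=BASE_IGNORED):
    """Map each key shared by more than one name to the sorted names."""
    groups = defaultdict(set)
    for name in names:
        groups[loose_key(name, ignored_words)].add(name)
    return {key: sorted(group) for key, group in groups.items()
            if len(group) > 1}


def installed_names():
    """Yield every character name known to this Python's unicodedata."""
    for cp in range(sys.maxunicode + 1):
        name = unicodedata.name(chr(cp), "")
        if name:
            yield name


if __name__ == "__main__":
    names = list(installed_names())
    extended = BASE_IGNORED | {"CONSONANT", "VOWEL", "INDEPENDENT"}
    for label, ignored in (("base", BASE_IGNORED), ("extended", extended)):
        clashes = find_collisions(names, ignored)
        print(label, "ignore set:", len(clashes), "ambiguous keys")
    # Count names that contain VOWEL as a whole word.
    print(sum("VOWEL" in n.split() for n in names), "names contain VOWEL")
```

With this sketch, "khmer ka" and "KHMER LETTER KA" produce the same key,
while "HANGUL JUNGSEONG O-E" and "HANGUL JUNGSEONG OE" collide, which is the
one exception the quoted message mentions. Passing a larger ignore set to
find_collisions is one way to check whether a candidate word such as VOWEL
can be dropped without ambiguity in a given Unicode version.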
    

