VOWEL, CONSONANT, ...: allow recognition of shorter names?

From: Henrik Theiling (ht@theiling.de)
Date: Fri Apr 11 2008 - 04:38:30 CDT

Next message: Otto Stolz: "Re: VOWEL, CONSONANT, ...: allow recognition of shorter names?"

Previous message: Marion Gunn: "Re: "French+" support by Unicode"
Next in thread: Otto Stolz: "Re: VOWEL, CONSONANT, ...: allow recognition of shorter names?"
Reply: Otto Stolz: "Re: VOWEL, CONSONANT, ...: allow recognition of shorter names?"
Reply: Mark Davis: "Re: VOWEL, CONSONANT, ...: allow recognition of shorter names?"
Maybe reply: Kenneth Whistler: "Re: VOWEL, CONSONANT, ...: allow recognition of shorter names?"

Hi!

TR#34 states that all character and sequence names (except one pair
involving HANGUL JUNGSEONG O-E) will always be unique when space,
medial dash and the words LETTER, CHARACTER, and DIGIT are ignored.

When writing a character name recognition algorithm, I would like to
let the user be as concise as possible, yet without violating Unicode
rules, and without being in potential conflict with upcoming versions
of Unicode. As I understand it, the rule that LETTER, CHARACTER,
DIGIT, spaces, and medial dashes can be ignored in comparison tries to
address this very idea.
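
To make the matching rule concrete, here is a minimal Python sketch of
such a loose comparison key. The names loose_key and IGNORABLE_WORDS are
my own, not anything defined by Unicode, and the sketch assumes that a
"medial" dash is one with a letter or digit on both sides. It does not
special-case the HANGUL JUNGSEONG O-E pair, so those two names map to
the same key.

```python
import re

# Words that may be dropped during comparison (per the rule quoted above).
IGNORABLE_WORDS = frozenset({"LETTER", "CHARACTER", "DIGIT"})

# Assumption: a medial dash has a letter or digit on both sides.
_MEDIAL_HYPHEN = re.compile(r"(?<=[A-Z0-9])-(?=[A-Z0-9])")


def loose_key(name, ignorable=IGNORABLE_WORDS):
    """Return a comparison key: upper-cased, with ignorable words,
    spaces, and medial dashes removed."""
    words = [w for w in name.upper().split() if w not in ignorable]
    return "".join(_MEDIAL_HYPHEN.sub("", w) for w in words)


if __name__ == "__main__":
    # Both spellings yield the same key.
    print(loose_key("KHMER LETTER KA"), loose_key("khmer ka"))
```

A non-medial dash, as in TIBETAN LETTER -A, is kept, so that name stays
distinct from TIBETAN LETTER A.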

I noticed that for some scripts, e.g. Khmer, character names are still
a mouthful. I also noticed that when I additionally ignored
CONSONANT, VOWEL, and INDEPENDENT, the Unicode names are still unique
and it would improve writing (at least) Khmer character names a lot.

I was wondering whether it would be feasible to tighten the condition
in TR#34 so that no upcoming Unicode version has ambiguous names when
CONSONANT, VOWEL, and INDEPENDENT are ignored, too.
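
The kind of check I mean can be sketched in Python roughly as follows.
This is an illustration, not my actual implementation: find_collisions,
loose_key and all_character_names are hypothetical helper names, the
medial-dash test is my own approximation, and the standard unicodedata
module only covers single characters from whatever Unicode version your
Python ships with (no named sequences).

```python
import re
import unicodedata
from collections import defaultdict

BASE_IGNORABLE = frozenset({"LETTER", "CHARACTER", "DIGIT"})
EXTRA_IGNORABLE = frozenset({"CONSONANT", "VOWEL", "INDEPENDENT"})

# Assumption: a medial dash has a letter or digit on both sides.
_MEDIAL_HYPHEN = re.compile(r"(?<=[A-Z0-9])-(?=[A-Z0-9])")


def loose_key(name, ignorable):
    """Drop ignorable words, spaces, and medial dashes from a name."""
    words = [w for w in name.upper().split() if w not in ignorable]
    return "".join(_MEDIAL_HYPHEN.sub("", w) for w in words)


def find_collisions(names, ignorable):
    """Map each loose key shared by two or more names to those names."""
    groups = defaultdict(set)
    for name in names:
        groups[loose_key(name, ignorable)].add(name)
    return {k: sorted(v) for k, v in groups.items() if len(v) > 1}


def all_character_names():
    """Yield every character name known to this Python's unicodedata."""
    for cp in range(0x110000):
        name = unicodedata.name(chr(cp), None)
        if name is not None:
            yield name


if __name__ == "__main__":
    clashes = find_collisions(all_character_names(),
                              BASE_IGNORABLE | EXTRA_IGNORABLE)
    for key, names in sorted(clashes.items()):
        print(key, names)
```

Anything printed beyond the known HANGUL JUNGSEONG pair would be a name
that becomes ambiguous under the extended set of ignorable words in that
Unicode version.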

Of course, there may be more ignorable words, so the question is where
to stop. 'VOWEL' is in 360 names, which is more than 'CHARACTER',
which is in only 106. But CONSONANT and INDEPENDENT are relatively
rare. Here are those words, along with a few others that occur very
frequently, all of which can currently be ignored without ambiguity:

    VOWEL in 360 names
    CONSONANT in 66 names
    INDEPENDENT in 19 names (rare, but also a mouthful)
    SYLLABICS in 630 names
    LIGATURE in 508 names
    FORM in 798 names
    PATTERN in 297 names
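
Counts like these can be reproduced with a short Python sketch along the
following lines. The function names are hypothetical, a "word" is assumed
to be a space-separated token, and the numbers you get will depend on the
Unicode version behind your Python's unicodedata module, so they need not
match the figures above.

```python
import unicodedata
from collections import Counter


def count_names_containing(names):
    """Count, for each word, how many names contain it at least once."""
    counts = Counter()
    for name in names:
        # A set, so that a word repeated within one name counts once.
        counts.update(set(name.split()))
    return counts


def all_character_names():
    """Yield every character name known to this Python's unicodedata."""
    for cp in range(0x110000):
        name = unicodedata.name(chr(cp), None)
        if name is not None:
            yield name


if __name__ == "__main__":
    counts = count_names_containing(all_character_names())
    for word in ("VOWEL", "CONSONANT", "INDEPENDENT", "SYLLABICS",
                 "LIGATURE", "FORM", "PATTERN"):
        print(f"{word} in {counts[word]} names")
```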

For stability reasons, it would be very nice if we knew that upcoming
Unicode versions will keep this unambiguity, because then I could
officially ignore those words so my users could enjoy more concise
character names.

Bye,
Henrik


This archive was generated by hypermail 2.1.5 : Fri Apr 11 2008 - 04:41:42 CDT