From: Mark Davis (mark.davis@icu-project.org)
Date: Fri Apr 11 2008 - 10:17:47 CDT
You can file this as a request to the UTC using the online form on the
Unicode site.
Mark
On Fri, Apr 11, 2008 at 2:38 AM, Henrik Theiling <ht@theiling.de> wrote:
> Hi!
>
> TR#34 states that all character and sequence names (except one pair
> involving HANGUL JUNGSEONG O-E) will always be unique when space,
> medial dash and the words LETTER, CHARACTER, and DIGIT are ignored.
>
> When writing a character name recognition algorithm, I would like to
> let the user be as concise as possible, yet without violating Unicode
> rules, and without being in potential conflict with upcoming versions
> of Unicode. As I understand it, the rule that LETTER, CHARACTER,
> DIGIT, spaces, and medial dashes can be ignored in comparison tries to
> address this very idea.
>
> I noticed that for some scripts, e.g. Khmer, character names are still
> a mouthful. I also noticed that when I additionally ignore
> CONSONANT, VOWEL, and INDEPENDENT, the Unicode names are still unique,
> which would make writing (at least) Khmer character names a lot easier.
>
> I was wondering whether it would be feasible to tighten the condition
> in TR#34 so that no upcoming Unicode version would have ambiguous names
> if CONSONANT, VOWEL, and INDEPENDENT were ignored, too.
>
> Of course, there may be more ignorable words, so the question is where
> to stop. 'VOWEL' occurs in 360 names, which is more than 'CHARACTER',
> which occurs in only 106. CONSONANT and INDEPENDENT are relatively
> rare. Here are these and a few other frequently occurring words that
> can currently be ignored without ambiguity:
>
> VOWEL in 360 names
> CONSONANT in 66 names
> INDEPENDENT in 19 names (rare, but also a mouthful)
> SYLLABICS in 630 names
> LIGATURE in 508 names
> FORM in 798 names
> PATTERN in 297 names
>
> For stability reasons, it would be very nice if we knew that upcoming
> Unicode versions will stay equally unambiguous, because then I could
> officially ignore those words so my users could enjoy more concise
> character names.
>
> Bye,
> Henrik
>
>
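
The matching rule described in the quoted message can be sketched in
Python as follows. This is a minimal illustration, not the TR#34
algorithm itself. The names fold_name, find_collisions and
all_character_names are hypothetical, and treating a medial hyphen as a
word break is an assumption. The name list comes from Python's
unicodedata module, so it covers only single characters (not named
sequences) in the Unicode version bundled with the interpreter.

```python
import re
import sys
import unicodedata
from collections import defaultdict

# Words the quoted message says may be ignored when comparing names.
BASE_IGNORED = frozenset({"LETTER", "CHARACTER", "DIGIT"})


def fold_name(name, extra_ignored=()):
    """Reduce a character name to a comparison key.

    Case, spaces, medial hyphens and the ignorable words are dropped.
    Assumption: a medial hyphen (letter or digit on both sides) is
    treated as a word break before the ignorable words are removed.
    """
    ignored = BASE_IGNORED | {word.upper() for word in extra_ignored}
    text = re.sub(r"(?<=[A-Z0-9])-(?=[A-Z0-9])", " ", name.upper())
    return "".join(word for word in text.split() if word not in ignored)


def find_collisions(names, extra_ignored=()):
    """Return {key: [names]} for every key shared by two or more names."""
    buckets = defaultdict(set)
    for name in names:
        buckets[fold_name(name, extra_ignored)].add(name)
    return {key: sorted(group) for key, group in buckets.items() if len(group) > 1}


def all_character_names():
    """Yield every character name known to this Python's unicodedata."""
    for code_point in range(sys.maxunicode + 1):
        name = unicodedata.name(chr(code_point), "")
        if name:
            yield name
```

Calling find_collisions(all_character_names(), {"CONSONANT", "VOWEL",
"INDEPENDENT"}) lists the names that become ambiguous under the wider
rule for that one Unicode version. It cannot say anything about future
versions, which is why the stability guarantee would have to come from
the UTC.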
-- Mark