RE: VOWEL, CONSONANT, ...: allow recognition of shorter names?

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Fri Apr 11 2008 - 21:16:41 CDT

Next message: Asmus Freytag: "Re: Collection numbers."

Previous message: Andrew Cunningham: "Re: Using combining diacritical marks and non-zero joiners in a name"
In reply to: Michael Everson: "Re: VOWEL, CONSONANT, ...: allow recognition of shorter names?"
Next in thread: Henrik Theiling: "Re: VOWEL, CONSONANT, ...: allow recognition of shorter names?"

Michael Everson wrote:
> At 11:35 -0700 2008-04-11, Kenneth Whistler wrote:
> >National Bodies are (justifiably, I think) concerned and worried about
> >algorithmic constraints on their ability to name things, particularly
> >when the constraints get complicated to the point that they can't
> >remember all the details or envision being able to check manually for
> >uniqueness.
>
> Yep.

There are certainly applications that would benefit from having simplified
character names.

These names could be simplified by an automatic process that drops words
that are not necessary for the uniqueness of character names within a given
version of the standard.
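
As a rough illustration of such a "word remover", here is a minimal Python
sketch. The function name simplify_names and the all-or-nothing policy (a
word is dropped from every name, or from none) are assumptions made for this
example; they are not taken from any existing tool or from the stability
rules.

```python
def simplify_names(names, droppable):
    """Drop each word of `droppable` from all names, but only if the
    whole set of names stays unique after the drop."""
    current = {name: name.split() for name in names}
    for word in droppable:
        # Tentatively remove the word everywhere; a name made only of
        # that word is left untouched so that it never becomes empty.
        trial = {name: [w for w in words if w != word] or words
                 for name, words in current.items()}
        shortened = [" ".join(words) for words in trial.values()]
        if len(set(shortened)) == len(shortened):
            current = trial  # still unique: accept the removal
    return {name: " ".join(words) for name, words in current.items()}


names = ["LATIN SMALL LETTER A", "LATIN CAPITAL LETTER A"]
print(simplify_names(names, ["LETTER", "SMALL", "CAPITAL"]))
# {'LATIN SMALL LETTER A': 'LATIN A',
#  'LATIN CAPITAL LETTER A': 'LATIN CAPITAL A'}
```

"LETTER" and "SMALL" can be dropped, but dropping "CAPITAL" as well would
make both names collapse to "LATIN A", so that removal is rejected. Note
that the result depends on the order in which the candidate words are tried.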

Now suppose a new character is added and, with the same list of removed
words, the simplified character names are no longer unique. How will the
automatic "word remover" be able to tell them apart? One solution is to use
the age of characters, i.e. the property specifying the Unicode version in
which they were introduced: words would still be removed (and implied) for
the older characters, while the newer character would keep the words it
needs to remain distinct. Other candidate words for suppression in
simplified names include "symbol", "with", "accent", "mark", "sign",
"vocalic" (but beware of "R", "RR", "L" and "LL" in Indic scripts, which may
need the distinction between the combining vocalic sign and an alternate
base consonant)...
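
The age-based tie-break could look like the following Python sketch. The
function simplify_by_age, the (major, minor) tuples standing in for the Age
property, and the two "EXAMPLE ..." character names are all made up for
illustration; a real implementation would read names and ages from the
Unicode Character Database.

```python
def simplify_by_age(chars, droppable):
    """chars: iterable of (name, age) pairs, age being a (major, minor)
    tuple. Older characters claim their shortened name first; a newer
    character whose shortened name is already taken keeps its full name."""
    taken = set()
    result = {}
    # Oldest first; the name is a tie-break within one version so that
    # the outcome is deterministic.
    for name, age in sorted(chars, key=lambda c: (c[1], c[0])):
        short = " ".join(w for w in name.split() if w not in droppable) or name
        chosen = short if short not in taken else name
        if chosen in taken:
            # The full name clashes with an older shortened name.
            raise ValueError(f"no unique simplified name for {name!r}")
        taken.add(chosen)
        result[name] = chosen
    return result


# Hypothetical names and versions, not real UCD entries.
chars = [("EXAMPLE SIGN KA", (5, 1)), ("EXAMPLE LETTER KA", (1, 1))]
print(simplify_by_age(chars, {"LETTER", "SIGN"}))
# {'EXAMPLE LETTER KA': 'EXAMPLE KA', 'EXAMPLE SIGN KA': 'EXAMPLE SIGN KA'}
```

Because characters are processed oldest first, the simplified name of an
existing character never changes when newer characters are added, which is
what makes the stable age property useful here.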

Anyway, the default names assigned to characters should remain stable, even
if they are misleading (due to historical errors), so extending the
stability rules would not be useful (they are already a constraint that can
make it difficult to assign standard names to new characters).

Conversely, the removal of all separators (spaces and hyphens) in the
existing stability rules seems overkill, when simply replacing every
sequence of separators with a single space (which can in turn be replaced
by underscores or capitalization in programming-language identifiers) would
have been enough to keep the words separable.
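
A minimal Python sketch of that substitution follows. The helper names
normalize, to_identifier and to_camel_case are invented for this example,
and treating whitespace, underscore and hyphen as the separator set is an
assumption.

```python
import re

# Assumed separator set: whitespace, underscore and hyphen.
_SEPARATORS = re.compile(r"[\s_-]+")


def normalize(name):
    """Collapse every run of separators into a single space."""
    return _SEPARATORS.sub(" ", name).strip().upper()


def to_identifier(name):
    """Underscore form, usable as an identifier in most languages."""
    return normalize(name).replace(" ", "_")


def to_camel_case(name):
    """Capitalization form: word boundaries are kept as case changes."""
    return "".join(word.capitalize() for word in normalize(name).split(" "))


print(normalize("hangul  jungseong O-E"))       # HANGUL JUNGSEONG O E
print(normalize("HANGUL JUNGSEONG OE"))         # HANGUL JUNGSEONG OE
print(to_identifier("HANGUL JUNGSEONG O-E"))    # HANGUL_JUNGSEONG_O_E
print(to_camel_case("HANGUL JUNGSEONG O-E"))    # HangulJungseongOE
```

With the words kept separable, HANGUL JUNGSEONG O-E and HANGUL JUNGSEONG OE
stay distinct without a special exception. One caveat of this particular
sketch: it would also turn a leading hyphen into a separator, so a pair such
as TIBETAN LETTER -A and TIBETAN LETTER A would still need separate
handling.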

But nothing prevents an application from using alternate lists of character
names (notably localized or transliterated lists, or the original character
names in the script for which the character is defined). For example, the
Hangul O-E exception in the (standard) default names comes from the
restriction to Basic Latin letters only, even though the "OE" name would
probably be better presented using the Latin "open" O letter or a
diacritic, for languages where that is meaningful. And nothing prevents the
same application from maintaining its own stability rules on this
simplified list (here too the age/version property of characters will be
useful, as this property is guaranteed to be stable).


This archive was generated by hypermail 2.1.5 : Sat Apr 12 2008 - 11:01:20 CDT