From: Andrew West (andrewcwest@gmail.com)
Date: Fri Mar 19 2010 - 15:40:35 CST
On 19 March 2010 17:41, Asmus Freytag <asmusf@ix.netcom.com> wrote:
>
>> 24.2 Name formation
>> An entity names shall consist only of the following characters
>> • LATIN CAPITAL LETTER A through LATIN CAPITAL LETTER Z,
>> • DIGIT ZERO through DIGIT NINE,
>> • SPACE,
>> • HYPHEN-MINUS, and
>> • FULL STOP if the entity being named is a collection
>> The first character in an entity name shall be a Latin capital letter.
>
> The actual rules for *character* names have always been that the
> first character in a *word* (i.e. following a space, or start of name)
> must be a Latin capital letter, except that hyphen-minus may start any
> word but the first.
ISO/IEC 10646:2003 Annex L Rule 1 states that "Names of characters
may also include digits 0 to 9 (provided that a digit is not the first
character in a word)", which agrees with what you say.
> I've not been aware that this was changed deliberately, so to me, the
> above statement of the rules seem to contain an editing mistake resulting
> from their recent reformulation.
I think you are right. There seems to be an omission here, with the
result that these rules do not agree with R2 in the TUS text:
  R2 Digits do not occur as the first character of a character name, nor
  immediately following a space character.
Luckily there is still time to fix that in the FCD ballot.
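To make the gap concrete, here is a minimal sketch of the two formulations
side by side. It is only an illustration: the function names are made up,
FULL STOP for collections is left out, and rejecting empty words (doubled or
trailing spaces) is my own assumption rather than anything stated in the
quoted text. "EXAMPLE LETTER 2A" is a hypothetical name, not a real character.

```python
import re

# Characters allowed by the quoted 24.2 draft (FULL STOP for collections omitted).
ALLOWED = re.compile(r"[A-Z0-9 \-]+")


def draft_rule_ok(name: str) -> bool:
    """Quoted 24.2 draft: allowed characters only, first character is A-Z."""
    return bool(ALLOWED.fullmatch(name)) and "A" <= name[0] <= "Z"


def word_rule_ok(name: str) -> bool:
    """Per-word rule as described above: every word starts with A-Z,
    except that HYPHEN-MINUS may start any word but the first."""
    if not ALLOWED.fullmatch(name):
        return False
    for index, word in enumerate(name.split(" ")):
        if not word:
            # Assumption: empty words (leading, trailing or doubled spaces) are invalid.
            return False
        first = word[0]
        if "A" <= first <= "Z":
            continue
        if first == "-" and index > 0:
            continue
        return False  # a digit, or a hyphen-minus in the first word
    return True


# Hypothetical name with a digit right after a space:
# the draft wording accepts it, the per-word rule (and R2) does not.
print(draft_rule_ok("EXAMPLE LETTER 2A"), word_rule_ok("EXAMPLE LETTER 2A"))
```

Existing names such as TIBETAN MARK TSA -PHRU and CJK UNIFIED IDEOGRAPH-4E00
pass both checks; the two only differ on digit-initial words.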
>> The last character in an entity name shall be either a Latin capital
>> letter or a Digit.
>
> This seems to needlessly rule out a hypothetical
>
> TIBETAN LETTER A-
>
> While this may not occur as a part of Tibetan characters
> as far as they have been encompassed, it looks like an
> unnecessary restriction in the face of future naming
> requirements for this and other scripts.
A character name such as TIBETAN LETTER A- is theoretically possible,
and prohibiting final hyphen-minus may be unnecessarily restrictive if
someone wants to propose a character naming convention for some as yet
unencoded script that uses hyphen-minus to represent an orthographic
glottal stop (this is something that has been discussed recently for
one particular script, although in that case glottal stop only occurs
initially). On the other hand, if this restriction has been around for
a long time, and implementations expect only A-Z/0-9 at the end of a
character name, then it is probably best to live with this
restriction. I personally don't think it is necessary to change this
rule on the off chance that final hyphen-minus may be useful at some
future date.
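For reference, the last-character restriction is trivial to state in code,
which is roughly what an implementation relying on it might assume. This is
a hypothetical check written for illustration, not taken from any existing
implementation.

```python
import string

# Characters the quoted rule permits in final position: A-Z and 0-9.
FINAL_CHARS = set(string.ascii_uppercase + string.digits)


def last_char_ok(name: str) -> bool:
    """True if the name is non-empty and ends in a Latin capital letter or digit."""
    return bool(name) and name[-1] in FINAL_CHARS


print(last_char_ok("TIBETAN LETTER A"))   # ends in a capital letter
print(last_char_ok("TIBETAN LETTER A-"))  # the hypothetical name with final hyphen-minus
```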
Andrew