From: Ken Whistler (kenw@sybase.com)
Date: Mon Jan 31 2011 - 21:00:10 CST
On 1/31/2011 12:41 PM, Asmus Freytag wrote:
> I think that there's one good benefit to marking these characters as
> Lm - it would further cement the notion that these are not styled
> versions of the regular letters.
>
> Also, it would reduce the number of Ll characters that do not have a
> case partner.
>
> Given the precedent cited by Ben Scarborough for the superscript
> characters, this would further regularize the assignment of the GC.
All true, but...
>
> A counter argument could be if some of these characters are never used
> to "modify" another letter. If so, that fact and its importance (and
> therefore the importance of making the distinction in the gc) really
> ought to be discussed in the block descriptions and/or annotated in
> the character nameslist, it seems.
That concern isn't relevant to these particular subscript characters,
which were all encoded as modifier letters, but couldn't be *named*
"MODIFIER LETTER XYZ" because of other consistency issues.
>
> As it stands, there's an apparent inconsistency with no apparent purpose.
>
> The best way to start on the path of a remedy for this situation would
> be if you were to file a proposal to the UTC to make these changes.
> That way, this can be discussed and resolved.
Correct, but...
>
> Might as well add the list of Greek characters, submitted by Kent, for
> the record, so they can be resolved as well. (By resolved I here mean
> either have their GC changed or their documentation improved).
I agree, but...
Here is the problem:
Changing these particular gc=Ll subscript modifier letters to gc=Lm
impacts the derived property Lowercase. In order to keep the repertoire
of Lowercase=True stable, they would then have to be *added* to the
Other_Lowercase property. So an exception in one place will end up
moving to an exception in another place. True, the resulting
exceptionality of the exceptions is a bit more uniform, but the overall
improvement may be marginal.
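
To make the trade-off concrete, here is a minimal Python sketch of the
derivation, assuming the simplified rule Lowercase = (gc=Ll) or
Other_Lowercase. The `CharProps` record and `is_lowercase` helper are
hypothetical names invented for illustration; they are not taken from
the UCD or from any library, and no real character data is consulted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CharProps:
    """Hypothetical per-character record (illustration only)."""
    gc: str                        # General_Category value, e.g. "Ll" or "Lm"
    other_lowercase: bool = False  # membership in Other_Lowercase


def is_lowercase(p: CharProps) -> bool:
    # Assumed simplified derivation: Lowercase = Ll + Other_Lowercase.
    return p.gc == "Ll" or p.other_lowercase


# Three states for one of the subscript letters under discussion.
current = CharProps(gc="Ll")                                   # as encoded today
regc_only = CharProps(gc="Lm")                                 # gc changed, nothing else
regc_plus_exception = CharProps(gc="Lm", other_lowercase=True)  # gc changed + exception

for label, props in [
    ("current", current),
    ("gc change only", regc_only),
    ("gc change + Other_Lowercase", regc_plus_exception),
]:
    print(f"{label}: Lowercase={is_lowercase(props)}")
```

Under that assumption, the gc change on its own drops the character out
of Lowercase, and only the extra Other_Lowercase entry restores it.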
But wait, there's more. These kinds of modifier letters are also Cased
(see definition D135), by virtue of their being Lowercase. And they are
not Case_Ignorable (see definition D136). Moving them from gc=Ll to
gc=Lm would make them Cased (by virtue of their Lowercase value) and
Case_Ignorable (because they are Lm). I know that is a bit of a
head-bender, but that is how those properties are defined. Now, it may
not actually matter that the derived Case_Ignorable property changes
for these few subscript modifier letters, because Case_Ignorable is
really a very narrow use property, just involved in the specification
of the casing context for Greek final sigma. (See Table 3-15.) Nobody
in the real world is going to notice or care that a few obscure UPA
modifier letters could change a casing context for Greek final sigma,
because nobody uses them together. But software test engineers don't
live in the real world, and it is conceivable that test cases could
break and somebody complain. Right now there is no provision for
keeping Case_Ignorable stable for these kinds of one-off general
category property changes -- presumably because for other than
characters actually used with ordinary Greek letters, it doesn't really
matter that much.
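
The head-bender can be spelled out in a few lines of Python. This is
only a sketch under simplifying assumptions: Cased is taken as
Lowercase or Uppercase or gc=Lt, and Case_Ignorable is reduced to its
General_Category part (Mn, Me, Cf, Lm, Sk), with the Word_Break
conditions of the definition deliberately left out. The `Props` record
and helper functions are hypothetical names for illustration, not part
of any real API.

```python
from dataclasses import dataclass

# General_Category values that (in this simplified model) imply Case_Ignorable.
CASE_IGNORABLE_GCS = frozenset({"Mn", "Me", "Cf", "Lm", "Sk"})


@dataclass(frozen=True)
class Props:
    """Hypothetical per-character record (illustration only)."""
    gc: str
    other_lowercase: bool = False
    other_uppercase: bool = False


def is_lowercase(p: Props) -> bool:
    return p.gc == "Ll" or p.other_lowercase


def is_uppercase(p: Props) -> bool:
    return p.gc == "Lu" or p.other_uppercase


def is_cased(p: Props) -> bool:
    # Assumed: Cased = Lowercase or Uppercase or titlecase letter.
    return is_lowercase(p) or is_uppercase(p) or p.gc == "Lt"


def is_case_ignorable(p: Props) -> bool:
    # Assumed: gc-based part only; Word_Break conditions are omitted here.
    return p.gc in CASE_IGNORABLE_GCS


before = Props(gc="Ll")                       # subscript letter as gc=Ll
after = Props(gc="Lm", other_lowercase=True)  # gc=Lm, kept in Lowercase

for label, p in [("before", before), ("after", after)]:
    print(f"{label}: Cased={is_cased(p)} Case_Ignorable={is_case_ignorable(p)}")
```

In this model the character stays Cased across the change, while
Case_Ignorable flips from False to True, which is the combination
described above.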
But you have been warned. Tread carefully. ;-)
--Ken