From: karl williamson (public@khwilliamson.com)
Date: Thu Nov 12 2009 - 11:32:25 CST
I'm thinking about upgrading the Perl 5 language to handle Unicode
context-sensitive, language-independent casing. Currently the only
character in this class that is not also language-dependent is the Greek
final sigma.
I'm writing to this list because it has people on it who hopefully have
some insights that I haven't considered.
What I'm concerned about is what to do when the last non-case-ignorable
character of the string is a Greek sigma. The string may actually
be part of a larger context unavailable to the code at this level.
This can happen when the program that does this is assembling a larger
string, of which this is a non-terminal component, or more directly,
with the Perl regular expression syntax 's/(something)/abc\U$1\Edef/'.
This means that 'something' is to be replaced by 'abcSOMETHINGdef'. The
text between the \U and \E is to be upper cased. The casing function
only sees 'something', and currently has no knowledge that there is a
larger context; it may be hard to change that.
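
To make the ambiguity concrete, here is a minimal sketch in Python rather
than Perl, chosen only because Python's built-in str.lower() applies the
Unicode Final_Sigma rule. In SpecialCasing.txt that rule is attached to
lower casing the capital sigma (U+03A3), so the sketch lower cases. The
fragment and suffix strings are made-up illustrations, not from any real
program.

```python
# Greek capitals written as escapes; the comments show the letters.
FRAGMENT = "\u039f\u0394\u039f\u03a3"  # ΟΔΟΣ, ends in capital sigma
SUFFIX = "\u039f\u0399"                # ΟΙ, text that follows the fragment

# Lower casing the assembled string: the sigma is followed by a cased
# letter, so it is not word-final and becomes σ (U+03C3).
whole = (FRAGMENT + SUFFIX).lower()

# Lower casing the fragment alone, as a casing function without access to
# the larger context would: the sigma looks word-final and becomes ς (U+03C2).
pieces = FRAGMENT.lower() + SUFFIX.lower()

print(whole)   # οδοσοι
print(pieces)  # οδοςοι
```

The two results differ only in the sigma, which is exactly the case where
the casing function cannot know whether more text will follow.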
I can see several possibilities:
1) Just don't do context sensitive casing, meaning no change from
current behavior.
2) Do the context sensitive casing only when there is no ambiguity;
meaning do nothing at the end of a string.
3) Assume that the end of a string means that there is no context to
follow, and go ahead and use the final sigma.
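
As a rough illustration of how the three possibilities differ, the sketch
below uses Python, whose str.lower() applies the Final_Sigma rule when
lower casing. The function names are hypothetical. Option 2 is simplified
to test only the literal last character of the string, not the last
non-case-ignorable character, because I am not aware of a standard-library
test for the Case_Ignorable property.

```python
SMALL_SIGMA = "\u03c3"    # σ
FINAL_SIGMA = "\u03c2"    # ς
CAPITAL_SIGMA = "\u03a3"  # Σ


def lower_option1(s):
    """Option 1: no context-sensitive casing; each character is mapped alone."""
    return "".join(ch.lower() for ch in s)


def lower_option2(s):
    """Option 2: use the final sigma except at the very end of the string.

    Simplification: only the literal last character is examined.
    """
    out = s.lower()
    if s.endswith(CAPITAL_SIGMA) and out.endswith(FINAL_SIGMA):
        out = out[:-1] + SMALL_SIGMA
    return out


def lower_option3(s):
    """Option 3: treat the end of the string as the end of the context."""
    return s.lower()


WORD = "\u039f\u0394\u039f\u03a3"  # ΟΔΟΣ
TWO_WORDS = WORD + " " + WORD      # ΟΔΟΣ ΟΔΟΣ

print(lower_option1(TWO_WORDS))  # οδοσ οδοσ
print(lower_option2(TWO_WORDS))  # οδος οδοσ
print(lower_option3(TWO_WORDS))  # οδος οδος
```

Only the sigma that ends the whole string is treated differently by
options 2 and 3. The sigma before the space is unambiguous, so both give
it the final form.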
So, I'm wondering if anyone here has insights, or knows what other
languages have or haven't done with this.
Thank you