From: karl williamson (public@khwilliamson.com)
Date: Thu Nov 12 2009 - 12:04:45 CST
I realized as I hit send on this that I meant to say lower casing, not
upper casing. Here it is revised:
> I'm thinking about upgrading the Perl 5 language to handle Unicode
> context-sensitive, language-independent lower casing. Currently the only
> character in this class that is not also language-dependent is the Greek
> capital sigma, which may be lower cased to a small sigma or a final
> sigma, depending on context.
>
> I'm writing to this list because it has people on it who hopefully have
> some insights that I haven't considered.
>
> What I'm concerned about is what to do when the last non-case-ignorable
> character of the string is a Greek capital sigma. The string may actually
> be part of a larger context unavailable to the code at this level.
>
> This can happen when the program that does this is assembling a larger
> string, of which this is a non-terminal component, or more directly,
> with the Perl regular expression syntax 's/(SOMETHING)/ABC\L$1\Edef/'.
> This means that 'SOMETHING' is to be replaced by 'ABCsomethingdef'. The
> text between the \L and \E is to be lower cased. The casing function
> only sees 'SOMETHING', and currently has no knowledge that there is a
> larger context; it may be hard to change that.
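A minimal sketch of the situation described above, using only core Perl. The variable names and the sample strings are made up for illustration; the Greek word is written with \x{...} escapes so the source stays ASCII. It shows that the text lower cased by \L ends up in the middle of the result, even though the casing step was only handed the captured text.

```perl
use strict;
use warnings;

# The casing step only ever sees the captured text 'SOMETHING';
# the literal 'def' is joined on afterwards.
my $text = 'xx SOMETHING yy';
(my $result = $text) =~ s/(SOMETHING)/ABC\L$1\Edef/;
# $result is now 'xx ABCsomethingdef yy'

# Same substitution on a Greek word that ends in capital sigma (U+03A3).
# Seen in isolation, the sigma is the last character of $1, yet in the
# finished string it is followed by the letter 'd'.
my $greek = "\x{39F}\x{394}\x{39F}\x{3A3}";
(my $greek_result = $greek) =~ s/(\w+)/ABC\L$1\Edef/;

# Position of the lower-cased capture's last character in the result:
# three characters ('def') still follow it.
my $chars_after_sigma = length($greek_result) - (3 + length($greek));
```

Here a casing function that treated the end of its input as the end of a word would pick the final sigma even though more letters follow in the assembled string.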
>
> I can see several possibilities:
> 1) Just don't do context-sensitive casing, meaning no change from
> current behavior.
>
> 2) Do the context-sensitive casing only when there is no ambiguity,
> meaning do nothing special at the end of a string.
>
> 3) Assume that the end of a string means that there is no context to
> follow, and go ahead and use the final sigma.
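The three possibilities above can be sketched as a single routine with a policy switch. This is only an illustration under stated assumptions: lc_with_sigma and its policy names ('never', 'unambiguous', 'assume_end') are hypothetical and not part of Perl, and the test for a final sigma is my reading of the Unicode Final_Sigma condition, expressed with the \p{Cased} and \p{Case_Ignorable} regex properties.

```perl
use strict;
use warnings;

# Hypothetical helper: lower-case $str, choosing the sigma form by $policy.
#   'never'       - option 1: always the regular small sigma
#   'unambiguous' - option 2: final sigma only when following text rules it in
#   'assume_end'  - option 3: end of string counts as end of context
sub lc_with_sigma {
    my ($str, $policy) = @_;

    my $use_final = sub {
        my ($before, $after) = @_;
        return 0 if $policy eq 'never';
        # Must be preceded by a cased letter, then any case-ignorables.
        return 0 unless $before =~ /\p{Cased}\p{Case_Ignorable}*\z/;
        # A cased letter follows (after case-ignorables): not final.
        return 0 if $after =~ /\A\p{Case_Ignorable}*\p{Cased}/;
        # Only case-ignorables up to the end of the string: ambiguous.
        if ($after =~ /\A\p{Case_Ignorable}*\z/) {
            return $policy eq 'assume_end' ? 1 : 0;
        }
        return 1;    # e.g. followed by a space or punctuation
    };

    my $out = '';
    for my $i (0 .. length($str) - 1) {
        my $ch = substr($str, $i, 1);
        if ($ch eq "\x{3A3}"
            && $use_final->(substr($str, 0, $i), substr($str, $i + 1))) {
            $out .= "\x{3C2}";    # GREEK SMALL LETTER FINAL SIGMA
        }
        else {
            $out .= lc $ch;
        }
    }
    return $out;
}

my $word = "\x{39F}\x{394}\x{39F}\x{3A3}";    # ends in capital sigma
my $opt1 = lc_with_sigma($word, 'never');        # ends in U+03C3
my $opt2 = lc_with_sigma($word, 'unambiguous');  # ends in U+03C3
my $opt3 = lc_with_sigma($word, 'assume_end');   # ends in U+03C2
```

Under option 2, a sigma followed by a space inside the string still becomes a final sigma; only a sigma at the very end of the string is left alone, since that is the one case where unseen context could change the answer.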
>
> So, I'm wondering if anyone here has insights, or knows what other
> languages have or haven't done with this.
>
> Thank you
>