At 13:07 97-02-03 -0800, Mark Davis wrote:
>I realize as you do that filtering unknown characters is a problem.
And filtering may be due to all kinds of things, including hardwired range
processing.
>However, I think you are missing my point. In regular expressions, you
>are producing a pattern that will match certain characters. Rather than
>list them all, there is a shorthand that people use, which is to list
>ranges of code points. My point is:
>For this *particular* application, usually when people list "a-z", they
>really mean "Latin Letters", or often, just "Letters".
If that is what they mean, they might miss the thorns (þÞ), whence my point. If
they mean letters, then they should not use a hard-wired "range" function.
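
To make the point above concrete, here is a minimal sketch in Python (my choice of language for illustration, not anything from this thread). It contrasts a hard-wired "a-z" range with a test based on the Unicode letter property. The function names are hypothetical, and `str.isalpha()` stands in for "Letters" in general; a real regular-expression engine may expose this differently.

```python
import re

# Hard-wired range: exactly the 26 unaccented lowercase letters.
ASCII_RANGE = re.compile(r"[a-z]+")


def in_hardwired_range(word: str) -> bool:
    """True if every character of word falls inside the a-z range."""
    return ASCII_RANGE.fullmatch(word) is not None


def is_all_letters(word: str) -> bool:
    """True if word is non-empty and every character is a Unicode letter."""
    return word.isalpha()


# Show both tests side by side for a plain word and two words with thorns.
for sample in ("thorn", "þorn", "Þórr"):
    print(sample, in_hardwired_range(sample), is_all_letters(sample))
```

The range test rejects any word containing a thorn, while the property test accepts it without naming any particular code points.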
>The latter is
>actually usually BETTER for the problems that you list than restricting
>it to a particular range for a particular language.
We are on the same wavelength on this.
>Even better would be to look at common practice and separate out *more*
>higher level divisions, such as "Vowel" which often arise in regular
>expressions.
Even vowels may vary from language to language: for instance, y is *always* a
vowel in French and w is *never* one, so that has to be localized too... (;
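
A rough sketch of what a localized "Vowel" class could look like, again in Python. The table is hand-written for illustration only and is an assumption, not real locale data: it ignores accented vowels, and the Welsh entry (where w does count as a vowel) is added here purely as a contrast with French. The names `VOWELS_BY_LANGUAGE` and `is_vowel` are hypothetical.

```python
# Illustrative, hand-written table (an assumption, not real locale data).
# Accented vowels are deliberately left out to keep the sketch short.
VOWELS_BY_LANGUAGE = {
    "fr": frozenset("aeiouy"),   # French: y is treated as a vowel, w is not
    "cy": frozenset("aeiouwy"),  # Welsh: both w and y are vowels
}


def is_vowel(ch: str, language: str) -> bool:
    """True if ch is listed as a vowel for the given language code."""
    return ch.lower() in VOWELS_BY_LANGUAGE[language]


print(is_vowel("y", "fr"), is_vowel("w", "fr"), is_vowel("w", "cy"))
```

The same letter gets a different answer depending on the language code passed in, which is the sense in which the class has to be localized.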
>However, there are times where software does only recognize certain
>letters, and has to be able to do so. A C compiler, unlike Java, doesn't
>allow accented letters in identifiers. If you have to mimic that
>behavior, then you want to use a precise description of the characters.
Even this practice of programming languages is questionable and was made for
English-speaking programmers only... but that is another debate... I don't
want to enter into antediluvian debates... Let's just conclude that new
programming languages should not reproduce those bad-taste and parochial
flaws... (;
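
For the identifier case discussed above, a small sketch of the two behaviours, assuming the traditional ASCII-only rule (a letter or underscore, followed by letters, digits, or underscores). The pattern and function names are hypothetical; Python's own `str.isidentifier()` is used as an example of a language that accepts accented letters. Neither check excludes reserved keywords.

```python
import re

# Precise, hard-wired description of a traditional ASCII-only identifier.
ASCII_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_ascii_identifier(name: str) -> bool:
    """True if name fits the ASCII-only identifier rule."""
    return ASCII_IDENTIFIER.fullmatch(name) is not None


def is_unicode_identifier(name: str) -> bool:
    """True if name is an identifier under Python's Unicode-aware rule."""
    return name.isidentifier()


# Compare the two rules on an ASCII name, an accented name, and a bad name.
for name in ("total_2", "café", "2total"):
    print(name, is_ascii_identifier(name), is_unicode_identifier(name))
```

Here the explicit range is the right tool only because the goal is to mimic an existing restriction exactly.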
Alain LaBonté
Québec