RE: Question about \uxxxx etc. for 21-bit code points - need advice

From: jarkko.hietaniemi@nokia.com
Date: Tue May 23 2000 - 15:58:27 EDT


> -----Original Message-----
> From: EXT Markus Scherer [mailto:markus.scherer@jtcsv.com]
> Sent: Tuesday, May 23, 2000 3:36 PM
> To: Unicode List
> Subject: Re: Question about \uxxxx etc. for 21-bit code points - need
> advice
>
>
> jarkko.hietaniemi@nokia.com wrote:
> > Uhhh...in what context do you propose this? At least in Perl context
> > it would conflict really badly as \w is a reserved regular expression
> > notation: a character class matching alphanumerics plus underscore.
> >
> > --
> > Jarkko Hietaniemi <jarkko.hietaniemi@nokia.com>
>
> Well, the context would be to have test strings and resource
> bundle entries with such numeric character references.
> Example:
> resourceTag "NBSP \xa0, Euro \u20ac, and some plane 1 code
> point \w010330."
>
> What does Perl do?
>
> markus

In Perl, regular expressions are (most often) in "double-quoted context",
meaning that variables are expanded.

$a = 'x\w20acz';
print "yes\n" if "fooxy20aczbar" =~ /foo${a}bar/;

This will output "yes", because the \w matches the 'y', even though there
is no euro sign after the 'x' in the constant string being matched...
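
To make the conflict a bit more concrete, here is a minimal sketch that
contrasts the plain interpolation above with the same variable wrapped in
\Q...\E, which escapes regex metacharacters. The variable names ($pat,
$loose, $strict_miss, $strict_hit) are made up for this illustration only.

```perl
use strict;
use warnings;

# Single quotes: the backslash and the 'w' stay as two literal characters.
my $pat = 'x\w20acz';

# Interpolated as-is, \w is the word-character class, so the 'y' matches it.
my $loose = ("fooxy20aczbar" =~ /foo${pat}bar/) ? 1 : 0;

# Inside \Q...\E the backslash is escaped, so only a literal "\w" can match.
my $strict_miss = ("fooxy20aczbar"  =~ /foo\Q${pat}\Ebar/) ? 1 : 0;
my $strict_hit  = ('foox\w20aczbar' =~ /foo\Q${pat}\Ebar/) ? 1 : 0;

print "loose=$loose strict_miss=$strict_miss strict_hit=$strict_hit\n";
```

Either way the \w never turns into a euro sign: it is read as a character
class, or as a literal backslash followed by 'w'.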

If the context is C, the suggested \x{yyyy} would be the nicest: because it
would be easily extensible, because it's unambiguous, and because Perl
already does the same :-)
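
As a rough sketch of what that looks like on the Perl side, assuming a Perl
that supports the \x{...} notation in double-quoted strings, the test string
from the quoted example could be written as below. The variable names ($s,
@cps) are invented for the illustration.

```perl
use strict;
use warnings;

# The braces delimit the hex digits, so any number of digits is unambiguous.
my $s = "NBSP \x{a0}, Euro \x{20ac}, and some plane 1 code point \x{10330}.";

# Collect the non-ASCII characters and show their code points.
my @cps = map  { sprintf "U+%04X", ord($_) }
          grep { ord($_) > 0x7f }
          split //, $s;

print join(" ", @cps), "\n";
```

The same braces work for two, four, or five hex digits, which is the
extensibility meant above.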

-- 
Jarkko Hietaniemi <jarkko.hietaniemi@nokia.com>


