Re: <<NONCHAR>> for flex

From: Gregg Reynolds (unicode@arabink.com)
Date: Tue Jan 25 2005 - 06:51:00 CST

In reply to: Martin Duerst: "<<NONCHAR>> for flex (was: Re: 32'nd bit & UTF-8)"

Martin Duerst wrote:
>
> What I would expect such a Unicode-enabled version of flex to do
> is to have something similar to <<EOF>>, let's call it <<NONCHAR>>
> for the moment. <<NONCHAR>> would match shortest non-UTF-8 byte
> sequences. The typical use would be for a grammar to have a single
> rule matching <<NONCHAR>>, e.g. like so:
>
> <<NONCHAR>> fprintf(stderr, "Illegal UTF-8 input.\n"); exit(1);
>

Yes, and to go with this I would expect all regex operators to be
defined in terms of characters, so that '.' means 'any well-formed
character' and does not match ill-formed byte sequences. The usual
introductory flex example would then include two catch-all rules: '.'
for characters and '<<NONCHAR>>' for ill-formed input.
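The split between the two catch-all rules can be sketched in plain C, outside flex. This is a minimal sketch under stated assumptions, not an existing flex feature: the names utf8_char_len and scan_utf8 are hypothetical, the well-formedness test follows the Unicode Standard's table of well-formed UTF-8 byte sequences, and an ill-formed sequence is taken to be a single byte, which is one possible reading of "shortest non-UTF-8 byte sequence".

```c
#include <stddef.h>
#include <stdio.h>

/* Length (1..4) of the well-formed UTF-8 character at s, or 0 if the
   bytes at s do not begin one. Byte ranges follow the Unicode table of
   well-formed sequences: no overlongs, no surrogates, nothing above
   U+10FFFF, and no truncated sequences. */
size_t utf8_char_len(const unsigned char *s, size_t n)
{
    unsigned char b0, lo = 0x80, hi = 0xBF; /* allowed range of 2nd byte */

    if (n == 0)
        return 0;
    b0 = s[0];
    if (b0 <= 0x7F)
        return 1;
    if (b0 >= 0xC2 && b0 <= 0xDF)
        return (n >= 2 && (s[1] & 0xC0) == 0x80) ? 2 : 0;
    if (b0 >= 0xE0 && b0 <= 0xEF) {
        if (b0 == 0xE0) lo = 0xA0;           /* reject overlongs */
        if (b0 == 0xED) hi = 0x9F;           /* reject surrogates */
        if (n >= 3 && s[1] >= lo && s[1] <= hi && (s[2] & 0xC0) == 0x80)
            return 3;
        return 0;
    }
    if (b0 >= 0xF0 && b0 <= 0xF4) {
        if (b0 == 0xF0) lo = 0x90;           /* reject overlongs */
        if (b0 == 0xF4) hi = 0x8F;           /* reject > U+10FFFF */
        if (n >= 4 && s[1] >= lo && s[1] <= hi
            && (s[2] & 0xC0) == 0x80 && (s[3] & 0xC0) == 0x80)
            return 4;
        return 0;
    }
    return 0; /* 0x80-0xC1 and 0xF5-0xFF never start a character */
}

/* Mimic the two catch-all rules: echo each well-formed character to out
   (the '.' rule) and stop at the first byte that cannot start one (where
   a <<NONCHAR>> rule would fire). Returns the offset of that byte, or n
   if the whole input is well-formed. out may be NULL. */
size_t scan_utf8(const unsigned char *s, size_t n, FILE *out)
{
    size_t i = 0;

    while (i < n) {
        size_t len = utf8_char_len(s + i, n - i);
        if (len == 0)
            return i;                        /* <<NONCHAR>> */
        if (out != NULL)
            fwrite(s + i, 1, len, out);      /* '.': one whole character */
        i += len;
    }
    return n;
}
```

For example, on the bytes "ab" followed by 0xFF, scan_utf8 echoes "ab" and returns 2, the offset where the <<NONCHAR>> rule would take over.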

For <<NONCHAR>> I nominate ☠ (U+2620 SKULL AND CROSSBONES), so the
last lines of a flex spec would read:

.    ECHO;    /* copy one well-formed character to stdout */
☠    fprintf(stderr, "Illegal UTF-8 input.\n"); exit(1);

-gregg

This archive was generated by hypermail 2.1.5 : Tue Jan 25 2005 - 09:59:09 CST