Re: <<NONCHAR>> for flex

From: Gregg Reynolds (unicode@arabink.com)
Date: Tue Jan 25 2005 - 06:51:00 CST

    Martin Duerst wrote:
    >
    > What I would expect such a Unicode-enabled version of flex to do
    > is to have something similar to <<EOF>>, let's call it <<NONCHAR>>
    > for the moment. <<NONCHAR>> would match the shortest non-UTF-8 byte
    > sequences. The typical use would be for a grammar to have a single
    > rule matching <<NONCHAR>>, e.g. like so:
    >
    > <<NONCHAR>> fprintf(stderr, "Illegal UTF-8 input.\n"); exit(1);
    >

    Yes; and to go with this I would expect any regex operators to be
    defined in terms of characters, so '.' means 'any (well-formed)
    character' and does not match ill-formed byte seqs. So the usual
    introductory example for flex would include two catch-all rules, one for
    chars '.' and one for non-chars '<<NONCHAR>>'.
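
    To make the split between the two catch-all rules concrete, here is a
    rough C sketch of the test a scanner would need at each input position.
    It is not part of flex: utf8_char_len and scan are hypothetical names,
    and reading "shortest non-UTF-8 byte sequence" as "one byte that cannot
    start a well-formed character here" is an assumption. The byte ranges
    follow the Unicode Standard's table of well-formed UTF-8 sequences.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: length in bytes (1..4) of the well-formed UTF-8
 * character starting at s[0], or 0 if no well-formed character starts
 * there (stray continuation byte, overlong form, surrogate, value above
 * U+10FFFF, or truncated sequence). */
size_t utf8_char_len(const unsigned char *s, size_t n)
{
    unsigned char lo = 0x80, hi = 0xBF;  /* allowed range for byte 2 */
    unsigned char b0;
    size_t len, i;

    if (n == 0) return 0;
    b0 = s[0];

    if (b0 <= 0x7F) {
        return 1;                         /* ASCII */
    } else if (b0 >= 0xC2 && b0 <= 0xDF) {
        len = 2;
    } else if (b0 >= 0xE0 && b0 <= 0xEF) {
        len = 3;
        if (b0 == 0xE0) lo = 0xA0;        /* rejects overlong forms */
        if (b0 == 0xED) hi = 0x9F;        /* rejects surrogates */
    } else if (b0 >= 0xF0 && b0 <= 0xF4) {
        len = 4;
        if (b0 == 0xF0) lo = 0x90;        /* rejects overlong forms */
        if (b0 == 0xF4) hi = 0x8F;        /* rejects values > U+10FFFF */
    } else {
        return 0;                         /* 80..C1 and F5..FF never lead */
    }

    if (n < len) return 0;                /* truncated at end of input */
    if (s[1] < lo || s[1] > hi) return 0;
    for (i = 2; i < len; i++)
        if (s[i] < 0x80 || s[i] > 0xBF) return 0;
    return len;
}

/* Mimics the two catch-all rules. The '.' branch copies one well-formed
 * character to out (skipped when out is NULL); the NONCHAR branch
 * consumes a single byte (the assumption stated above) and counts it.
 * Returns the number of NONCHAR matches. */
size_t scan(const unsigned char *s, size_t n, FILE *out)
{
    size_t bad = 0, i = 0;

    while (i < n) {
        size_t len = utf8_char_len(s + i, n - i);
        if (len > 0) {                    /* rule: .           */
            if (out) fwrite(s + i, 1, len, out);
            i += len;
        } else {                          /* rule: <<NONCHAR>> */
            bad++;
            i++;
        }
    }
    return bad;
}
```

    With this reading, a grammar that treats any NONCHAR as fatal would
    simply stop at the first position where utf8_char_len returns 0, as in
    the rule Martin quotes above.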

    For <<NONCHAR>> I nominate ☠ (\u2620, skull and crossbones). So the
    last lines of the flex spec read:

    . copy to stdout
    ☠ fprintf(stderr...(as above)...

    -gregg


