Re: Specification for XID_Start and XID_Continue

From: Mark Davis (mark.davis@icu-project.org)
Date: Wed Aug 15 2007 - 11:02:18 CDT


    Middle dot wasn't mentioned because the UTC has decided to add it to
    ID_Continue in Unicode 5.1; see the proposed update at
    http://www.unicode.org/reports/tr31/tr31-8.html. (Middle dot was handled
    specially: instead of removing the character in step #1, the character
    causing the problem in its decomposition was added.)

    The differences can be seen by looking at
    http://unicode.org/cldr/utility/unicodeset.jsp?a=[:id_continue:]&b=[:xid_continue:]
    or
    http://unicode.org/cldr/utility/list-unicodeset.jsp?a=[[[:id_continue:]-[:xid_continue:]][[:xid_continue:]-[:id_continue:]]]

    I think it would be useful to add a more detailed description of the
    derivation; I'll propose that to the editorial committee.

    Mark

    On 8/15/07, "Martin v. Löwis" <martin@v.loewis.de> wrote:
    >
    > > I glean this as the algorithm:
    > >
    > > Add middle dot to ID_CONTINUE
    > >
    > > If an ID_START or ID_CONTINUE character has a decomposition containing a
    > > character other than middle dot that's not in ID_CONTINUE, then remove
    > > that character from ID_START or ID_CONTINUE.
    > >
    > > If an ID_START has a decomposition that begins with a character that's
    > > not an ID_START, remove it from ID_START.
    >
    > Thanks, this is exactly what I was looking for - at least for Unicode
    > 4.1, this algorithm produces an outcome equal to the published tables.
    >
    > Could that be added to UAX#31?
    >
    > Regards,
    > Martin
    >
    >
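
    For anyone who wants to experiment, the three steps quoted above can be
    sketched roughly as follows. This is only an illustration, not the
    derivation used to produce the published tables. The names derive_xid and
    approx_id_sets are made up for this sketch; the ID_Start and ID_Continue
    sets are approximated from general categories only (the Other_ID_* and
    Pattern_* adjustments are ignored), and "decomposition" is assumed to mean
    the full compatibility decomposition (NFKD). Results also depend on the
    Unicode version shipped with your Python.

```python
import unicodedata

MIDDLE_DOT = "\u00b7"


def derive_xid(id_start, id_continue, decompose):
    """Apply the three quoted steps to sets of single characters.

    `decompose` maps a character to its decomposition string
    (the character itself if it has none).
    """
    # Step 1: add middle dot to ID_Continue.
    cont = set(id_continue) | {MIDDLE_DOT}
    start = set(id_start)

    # Step 2: remove any character whose decomposition contains a
    # character (other than middle dot) that is not in ID_Continue.
    bad = {
        c for c in start | cont
        if any(d != MIDDLE_DOT and d not in cont for d in decompose(c))
    }
    xid_start = start - bad
    xid_continue = cont - bad

    # Step 3: remove from ID_Start any character whose decomposition
    # begins with a non-ID_Start character. Assumption: the check is
    # made against the original ID_Start set.
    xid_start = {c for c in xid_start if decompose(c)[0] in start}
    return xid_start, xid_continue


def approx_id_sets(limit=0x3000):
    """Rough ID_Start / ID_Continue for code points below `limit`.

    Assumption: general categories only; Other_ID_Start,
    Other_ID_Continue and the Pattern_* exclusions are ignored.
    """
    start_cats = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}
    cont_cats = start_cats | {"Mn", "Mc", "Nd", "Pc"}
    chars = [chr(cp) for cp in range(limit)]
    start = {c for c in chars if unicodedata.category(c) in start_cats}
    cont = {c for c in chars if unicodedata.category(c) in cont_cats}
    return start, cont


def nfkd(c):
    # Assumed reading of "decomposition": full compatibility decomposition.
    return unicodedata.normalize("NFKD", c)


id_start, id_continue = approx_id_sets()
xid_start, xid_continue = derive_xid(id_start, id_continue, nfkd)

# Characters dropped from the start set by steps 2 and 3.
removed_from_start = sorted(id_start - xid_start)
```

    Under these assumptions U+037A drops out of both sets, because its
    decomposition contains a space, while U+0E33 drops out of the start set
    only, because its decomposition begins with a combining mark. U+013F
    survives because middle dot was added in the first step.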


