Re: Specification for XID_Start and XID_Continue

From: Asmus Freytag (asmusf@ix.netcom.com)
Date: Tue Aug 14 2007 - 14:30:30 CDT

Next message: Martin v. Löwis: "Re: Specification for XID_Start and XID_Continue"

Previous message: Mike: "Re: Specification for XID_Start and XID_Continue"
In reply to: Martin v. Löwis: "Specification for XID_Start and XID_Continue"
Next in thread: Martin v. Löwis: "Re: Specification for XID_Start and XID_Continue"
Reply: Martin v. Löwis: "Re: Specification for XID_Start and XID_Continue"

On 8/14/2007 9:50 AM, Martin v. Löwis wrote:
> I'm trying to locate the precise specification for the
> XID_Start and XID_Continue properties. According to
>
> http://unicode.org/Public/UNIDATA/UCD.html
>
> they are derived properties, so there should be an
> algorithm somewhere describing how they are computed
> (given other properties). The UCD says that the
> specification is in UAX#31, which says I should
> read
>
> http://unicode.org/reports/tr31/#NFKC_Modifications
>
> However, looking at 5.1, I cannot find a precise
> specification of these properties. For example,
> 5.1.2 says "Certain characters...", but does not
> seem to provide a complete list of such characters.
> It ends with "In particular, the following four
> characters...". Again, that reads like an example -
> is it meant as a complete specification?
>
> Likewise, 5.1.3 talks about "certain Arabic presentation
> forms", without giving a complete list which precisely
> are excluded from XID_Start and XID_Continue.
>
> Any insights appreciated,
>
I think the algorithm you are looking for follows from the requirement that

IsIdentifier(S) == IsIdentifier(NFKx(S))

combined with the desire not to add characters to ID_CONTINUE that affect
the processing of identifiers, except where really necessary (middle dot).
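
As a rough illustration of that closure requirement, here is a minimal
Python sketch. It leans on Python 3's str.isidentifier(), which is
documented as being based on XID_Start and XID_Continue, and it picks NFKC
as the "NFKx" form. The function name closure_holds and the sample strings
are my own, chosen only for illustration.

```python
import unicodedata


def closure_holds(s: str) -> bool:
    """Check IsIdentifier(S) == IsIdentifier(NFKC(S)) for one string.

    Assumption: str.isidentifier() stands in for IsIdentifier, and NFKC
    stands in for "NFKx".
    """
    normalized = unicodedata.normalize("NFKC", s)
    return s.isidentifier() == normalized.isidentifier()


# Hypothetical sample strings, not taken from the original message.
samples = [
    "\ufb01le",   # starts with the "fi" ligature, normalizes to "file"
    "a\u00b7b",   # contains middle dot
    "\u037a",     # GREEK YPOGEGRAMMENI, decomposes to space + U+0345
    "\u0e33",     # THAI SARA AM on its own
    "x\u0e33",    # THAI SARA AM after a start character
]

for s in samples:
    print(ascii(s), closure_holds(s))
```

A check like this only samples a few strings; it does not prove the
property for the whole repertoire.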

From that I glean the following algorithm:

Add middle dot to ID_CONTINUE.

If an ID_START or ID_CONTINUE character has a decomposition containing a
character, other than middle dot, that is not in ID_CONTINUE, then remove
the original character from ID_START or ID_CONTINUE.

If an ID_START character has a decomposition that begins with a character
that is not in ID_START, remove the original character from ID_START.
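
The three steps above can be sketched in Python roughly as follows. This is
only an approximation under stated assumptions: ID_START and ID_CONTINUE are
guessed from general categories alone (the real properties also involve
Other_ID_Start, Other_ID_Continue, and the pattern-syntax exclusions), and
"decomposition" is taken to mean the full NFKD decomposition. All function
names here are hypothetical.

```python
import unicodedata

MIDDLE_DOT = "\u00b7"

# Assumption: category-only approximation of ID_START / ID_CONTINUE.
START_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}
CONTINUE_CATEGORIES = START_CATEGORIES | {"Mn", "Mc", "Nd", "Pc"}


def is_id_start(ch: str) -> bool:
    return unicodedata.category(ch) in START_CATEGORIES


def is_id_continue(ch: str) -> bool:
    # Step 1: middle dot is added to ID_CONTINUE.
    return ch == MIDDLE_DOT or unicodedata.category(ch) in CONTINUE_CATEGORIES


def decomposition(ch: str) -> str:
    # Assumption: "decomposition" means the full compatibility decomposition.
    return unicodedata.normalize("NFKD", ch)


def is_xid_continue(ch: str) -> bool:
    # Step 2: drop the character if any part of its decomposition
    # falls outside ID_CONTINUE (middle dot is already allowed there).
    return is_id_continue(ch) and all(
        is_id_continue(c) for c in decomposition(ch)
    )


def is_xid_start(ch: str) -> bool:
    # Step 2 applies to start characters as well.
    if not is_id_start(ch):
        return False
    parts = decomposition(ch)
    if not all(is_id_continue(c) for c in parts):
        return False
    # Step 3: the decomposition must itself begin with an ID_START character.
    return is_id_start(parts[0])
```

For example, under these assumptions U+037A (which decomposes to a space
followed by U+0345) drops out of both sets, while U+0E33 (which decomposes
to a nonspacing mark followed by a letter) drops out of the start set only.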

A./

> Martin


This archive was generated by hypermail 2.1.5 : Tue Aug 14 2007 - 14:32:50 CDT