From: Asmus Freytag (asmusf@ix.netcom.com)
Date: Tue Aug 14 2007 - 14:30:30 CDT
On 8/14/2007 9:50 AM, Martin v. Löwis wrote:
> I'm trying to locate the precise specification for the
> XID_Start and XID_Continue properties. According to
>
> http://unicode.org/Public/UNIDATA/UCD.html
>
> they are derived properties, so there should be an
> algorithm somewhere describing how they are computed
> (given other properties). The UCD says that the
> specification is in UAX#31, which says I should
> read
>
> http://unicode.org/reports/tr31/#NFKC_Modifications
>
> However, looking at 5.1, I cannot find a precise
> specification of these properties. For example,
> 5.1.2 says "Certain characters...", but does not
> seem to provide a complete list of such characters.
> It ends with "In particular, the following four
> characters...". Again, that reads like an example -
> is it meant as a complete specification?
>
> Likewise, 5.1.3 talks about "certain Arabic presentation
> forms", without giving a complete list which precisely
> are excluded from XID_Start and XID_Continue.
>
> Any insights appreciated,
>
I think the algorithm you are looking for follows from the requirement that
IsIdentifier(S) == IsIdentifier(NFKx(S))
together with the desire not to add characters to ID_CONTINUE that affect
the processing of identifiers, except where really necessary (middle dot).
From that I glean the following algorithm:
1. Add middle dot to ID_CONTINUE.
2. If an ID_START or ID_CONTINUE character has a decomposition containing a
   character other than middle dot that is not in ID_CONTINUE, remove the
   original (decomposable) character from ID_START or ID_CONTINUE.
3. If an ID_START character has a decomposition that begins with a character
   that is not in ID_START, remove it from ID_START.
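The steps above can be sketched roughly as follows. This is only an
illustration of the algorithm as I read it, not the normative derivation.
The function name derive_xid is made up, the input sets are toy stand-ins
for the real ID_START and ID_CONTINUE, "decomposition" is assumed to mean
the full NFKD form, and the checks are assumed to run against the sets as
they stand after the middle dot has been added.

```python
import unicodedata

MIDDLE_DOT = "\u00b7"

def derive_xid(id_start, id_continue):
    """Hypothetical helper: apply the three steps to the given sets."""
    start = set(id_start)
    # Step 1: add middle dot to ID_CONTINUE.
    cont = set(id_continue) | {MIDDLE_DOT}

    def nfkd(ch):
        # Assumption: "decomposition" means the full NFKD decomposition.
        return unicodedata.normalize("NFKD", ch)

    # Step 2: drop characters whose decomposition contains something
    # (other than middle dot) that is not in ID_CONTINUE.
    bad = {ch for ch in start | cont
           if any(c != MIDDLE_DOT and c not in cont for c in nfkd(ch))}
    xid_start = start - bad
    xid_continue = cont - bad

    # Step 3: drop from the start set any character whose decomposition
    # begins with a character that is not in ID_START.
    xid_start = {ch for ch in xid_start if nfkd(ch)[0] in start}
    return xid_start, xid_continue

# Toy input, not the real property data:
# U+037A decomposes to SPACE + U+0345; U+0E33 decomposes to U+0E4D + U+0E32.
toy_start = {"a", "\u0e32", "\u0e33", "\u037a"}
toy_continue = toy_start | {"\u0e4d"}
xid_start, xid_continue = derive_xid(toy_start, toy_continue)
```

With this toy input, U+037A is dropped from both sets by step 2 (SPACE is
not in the continue set), while U+0E33 stays in the continue set but is
dropped from the start set by step 3.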
A./
> Martin