L2/03-107
Re: Regex Changes
From: Mark Davis
Date: 2003-03-05
Here is a document for discussion at the UTC meeting, with (draft) proposed changes to the Regex TR (#18), for:
[93-A14] Action Item for Mark Davis: Look at Unicode Technical Report #18 Unicode Regular Expression Guidelines and propose changes that will make it more referenceable, possibly as a UTS.
1. Post as a proposed UTS and solicit public feedback, adding a conformance section containing the following.
C1. An implementation claiming conformance to Level 1 of this specification shall meet the requirements described in the following sections:
C2. An implementation claiming conformance to Level 2 of this specification shall satisfy C1, and meet the requirements described in the following sections:
C3. An implementation claiming conformance to Level 3 of this specification shall satisfy C1 and C2, and meet the requirements described in the following sections:
C4. An implementation claiming partial conformance to a Level of this specification shall clearly describe all of the criteria for that Level that it does not satisfy.
2. Make the following changes:
A. List a minimal set of character properties for Level 1. (For comparison, see table below.) The following is a suggested list in the proposed update; in the public review we should solicit comments on this area in particular.
B. List the recommended Unicode equivalents for the common character class names used in older, non-Unicode regular expressions. See Table 5-9 from Programming Perl, 3rd Edition (the O'Reilly book). These are only guidelines.
C. Section 2.7 Surrogates requires supplementary character support, but it should be better integrated into the text. In particular, its notation should be unified with that of 2.1 Hex Notation. The language in a few places also needs to be adjusted to the latest glossary usage, in particular the use of "supplementary characters".
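To illustrate item C, the following is a minimal sketch of what treating supplementary characters as whole code points can look like, rather than as pairs of surrogate code units. It is written in Python and is not taken from the TR; the names to_surrogates and SUPPLEMENTARY are hypothetical. The first part shows the standard UTF-16 arithmetic that maps a supplementary code point to its surrogate pair. The second part shows a character class written with a single hex notation that spans the whole supplementary range (here Python's own \UXXXXXXXX string escape, which is an assumption about notation and not the syntax the TR would recommend).

```python
import re
from typing import Tuple


def to_surrogates(cp: int) -> Tuple[int, int]:
    """Split a supplementary code point (U+10000..U+10FFFF)
    into its UTF-16 (high, low) surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary code point: %X" % cp)
    offset = cp - 0x10000            # 20-bit value
    high = 0xD800 + (offset >> 10)   # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)  # bottom 10 bits
    return high, low


# A class covering every supplementary character, written as one
# code point range instead of a pattern over surrogate code units.
SUPPLEMENTARY = re.compile("[\U00010000-\U0010FFFF]")

if __name__ == "__main__":
    g_clef = "\U0001D11E"  # MUSICAL SYMBOL G CLEF
    print([hex(u) for u in to_surrogates(ord(g_clef))])
    print(bool(SUPPLEMENTARY.fullmatch(g_clef)))
```

An engine that matches on UTF-16 code units would need the pair from to_surrogates internally, but the user-visible pattern can stay in terms of the single code point, which is the kind of unification with hex notation that item C asks for.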
Background