L2/07-029
Source: Mark Davis
Date: 2007-01-21
Subject: UTS #18 update
I had the following action item:
103-A62 (Mark Davis): Issue a proposed update to UTS #18 (document revision 12) that addresses the issue in L2/05-121.
That document raises two points:
1. Enabling the 'any character' to be newline-sequence aware, when it is not in a mode that requires it to stop at newline sequences (what I refer to as the default mode), should be optional in the standard.
2. The document could propose some means to match 'any character' while treating newline sequences as a single character but advises against using the standard 'any character' meta character (dot) to achieve this purpose and suggests (mandates?) introducing a new meta character.
I believe that this is already done in the existing version of #18, with the following two paragraphs:
It is strongly recommended that there be a regular expression
meta-character, such as "\R", for matching all line ending characters
and sequences listed above (e.g. in #1). It would thus be shorthand
for
([\u000A\u000B\u000C\u000D\u0085\u2028\u2029] | \u000D\u000A).
Note: For some implementations, there may be a performance impact in recognizing CRLF as a single entity, such as with an arbitrary pattern character ("."). To account for that, an implementation may satisfy R1.6 if there is a mechanism available for converting the sequence CRLF to a single line boundary character before regex processing.
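As a rough illustration of the two paragraphs above, here is a minimal Python sketch. It assumes Python's built-in re module, which has no "\R" escape, so the pattern is spelled out by hand. The names NEWLINE_SEQUENCE, split_lines and normalize_crlf are hypothetical and are not taken from UTS #18. The choice of LF as the single line boundary character is also an assumption.

```python
import re

# Hypothetical stand-in for "\R": Python's built-in re module has no such escape.
# CRLF comes first because re tries alternatives left to right; with the
# character class first, CR would match on its own and split the pair.
NEWLINE_SEQUENCE = r"(?:\r\n|[\n\x0b\x0c\r\x85\u2028\u2029])"

def split_lines(text):
    """Split text at every line ending, treating CRLF as one boundary."""
    return re.split(NEWLINE_SEQUENCE, text)

def normalize_crlf(text):
    """Convert each CRLF pair to a single LF before regex processing."""
    return text.replace("\r\n", "\n")

sample = "ab\r\ncd"
print(split_lines(sample))                        # ['ab', 'cd']
print(re.findall(r".+", sample))                  # ['ab\r', 'cd']
print(re.findall(r".+", normalize_crlf(sample)))  # ['ab', 'cd']
```

The second print shows the problem the note describes: in this engine "." excludes only LF, so it consumes the CR of a CRLF pair. After the preprocessing step, the unmodified "." stops cleanly at the line boundary.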
Thus I believe this action should count as done. However, I wanted to bring it to the UTC's attention in case there is disagreement.
(Note: I have a draft update of #18, but it currently only has minor editing fixes for items noted by Julie and Asmus, so I wouldn't recommend issuing a new version until there is something more substantial.)