L2/05-197
Re: | Suggested changes in the Character Property Model (#23) |
From: | Mark Davis |
Date: | 2007.07.28 |
I suggest the following changes in the Character Property Model (http://www.unicode.org/reports/tr23/).
1. To make the definitions more accessible, supply an example for each one. This can be very short, just a sentence.
2. Make the following additional changes. Some of these are corrections, some clarifications, and some additional definitions that are useful in discussing Unicode properties and implementations.
Current | Suggested addition |
|
For a given boolean property P, the phrase "the P code points" denotes the set of all code points whose property value for P is 'true'. For example, the Pattern_Whitespace code points are those with the Pattern_Whitespace property value 'true'. Similarly, for a given property P and value V, the phrase "the P V code points" denotes the set of all code points whose property value for P is V. For example, the Line Break Alphabetic code points are all code points for which the Line Break property value is Alphabetic. |
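The two phrasings above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: Python's `unicodedata` module does not expose Pattern_Whitespace or Line Break, so the sketch substitutes Bidi_Mirrored (a boolean property) and General_Category with the value `Zs`; the helper name `code_points_with` is hypothetical, and the resulting sets depend on the Unicode version bundled with the interpreter.

```python
import sys
import unicodedata

# Every code point, U+0000..U+10FFFF.
ALL_CODE_POINTS = range(sys.maxunicode + 1)


def code_points_with(prop, value=True):
    """Return the set of code points whose value for `prop` equals `value`.

    `prop` is any function mapping a one-character string to a property
    value (hypothetical helper, for illustration only).
    """
    return {cp for cp in ALL_CODE_POINTS if prop(chr(cp)) == value}


# "The Bidi_Mirrored code points": boolean property, value 'true'.
bidi_mirrored = code_points_with(lambda ch: bool(unicodedata.mirrored(ch)))

# "The General_Category Space_Separator code points": property P, value V.
gc_space_separator = code_points_with(unicodedata.category, "Zs")
```

With these sets, `ord("(") in bidi_mirrored` and `0x0020 in gc_space_separator` both hold, while the code point for 'a' belongs to neither set.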
Current | Suggested replacement |
|
For example, the character property Final_Sigma used in Table 3-13 depends on characters before and after the character in question.
|
Current | Suggested addition |
|
[Add note:]
|
Current | Suggested addition |
|
|
Current | Suggested addition |
3.7 Classification of String Functions |
|
Current | Suggested replacement [plus reordering, as above] |
|
|
Current | Suggested replacement |
|
|
Suggested additions (not necessarily in this order) |
Under certain conditions, strings and boundaries are "inert" with respect to a given transform. Implementations can often exploit this property to optimize code, by skipping over inert characters or by detecting conditions where fast paths can be taken. Examples: with respect to NFD, the character 'a' is inert. The <combining diaeresis> is not, since toNFD(<combining diaeresis>, <combining cedilla>) = <combining cedilla>, <combining diaeresis>.
PDx2. Inert Boundary
For example, these properties can be used for an optimized normalization concatenation. Normal string concatenation does not preserve normalization. Thus the concatenation of two normalized strings A and B is not guaranteed to be normalized. However, it is easy to write an optimized normalized concatenation by breaking A into two parts A' and A" (where A' ends with the last final-inert character in A), and breaking B into two parts B' and B" (where B" starts with the first initial-inert character in B), then returning A' + normalize(A" + B') + B".
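The concatenation described above can be sketched in Python for the NFD case. This is a minimal sketch, not a definitive implementation. The function name `concat_nfd` is hypothetical, and both inputs are assumed to be in NFD already. The sketch also assumes that, within NFD text, a character with Canonical_Combining_Class 0 can be treated as both final-inert and initial-inert, because canonical reordering never moves a mark across such a character. Other normalization forms would need a stricter inertness test.

```python
import unicodedata


def concat_nfd(a: str, b: str) -> str:
    """Concatenate two NFD strings so that the result is also in NFD.

    Assumption: `a` and `b` are already in NFD. Only the combining marks
    on either side of the join are renormalized.
    """
    # Split A into A' + A": A' ends with the last character in A whose
    # combining class is 0; A" is the run of trailing combining marks.
    i = len(a)
    while i > 0 and unicodedata.combining(a[i - 1]) != 0:
        i -= 1

    # Split B into B' + B": B" starts with the first character in B whose
    # combining class is 0; B' is the run of leading combining marks.
    j = 0
    while j < len(b) and unicodedata.combining(b[j]) != 0:
        j += 1

    # A' + normalize(A" + B') + B"
    return a[:i] + unicodedata.normalize("NFD", a[i:] + b[:j]) + b[j:]


# Plain concatenation puts the diaeresis before the cedilla, which is not
# in canonical order; concat_nfd reorders only the marks at the join.
left = "a\u0308"    # 'a' + combining diaeresis
right = "\u0327x"   # combining cedilla + 'x'
joined = concat_nfd(left, right)
```

Here `joined` equals `unicodedata.normalize("NFD", left + right)`, while `left + right` does not. When the strings are long and the marks at the join are few, the call to `normalize` touches only a handful of characters.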
|