L2/01-317
From: Michel Suignard [michelsu@microsoft.com]
Sent: Tuesday, August 14, 2001 1:23 PM
To: Winkler, Arnold F
Subject: Bracket Disunification & Normalization
August 14th 2001
This paper on bracket disunification complements WG2 N2345, which was presented at the last WG2 meeting, and clarifies the issues so brilliantly presented by Ken Whistler in his email titled 'Bracket Disunification & Normalization Hell'.
The issue arises mostly because the characters encoded in the range 3000-303F (CJK Symbols and Punctuation) have historically been used in CJK text processing, mainly for parenthetical notation. Their origin goes back to terminal displays with fixed cell widths; to match the surrounding CJK characters, their advance width was made 'wide'. This has resulted in very precise typographic guidelines for these characters, which are followed by the major fonts available in these markets.
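The 'wide' advance width described above is what the East_Asian_Width (EAW) property records. The sketch below assumes a simple fixed-cell display that gives one cell to narrow characters and two to wide ones, and follows the UAX #11 guidance that Ambiguous (A) characters are wide only in an East Asian context. `cell_width` is a hypothetical helper, not a library function, and it ignores combining marks and control characters. It illustrates why the change from EAW = A to EAW = W listed below matters: with A, the bracket width depends on context; with W, it is always two cells.

```python
import unicodedata

def cell_width(ch: str, east_asian_context: bool = True) -> int:
    """Approximate fixed-cell width of one character from its East_Asian_Width.

    Hypothetical, simplified heuristic: Wide (W) and Fullwidth (F) take two
    cells; Ambiguous (A) takes two cells only in an East Asian context;
    everything else takes one. Combining marks and controls are not handled.
    """
    eaw = unicodedata.east_asian_width(ch)
    if eaw in ("W", "F"):
        return 2
    if eaw == "A":
        return 2 if east_asian_context else 1
    return 1

# CJK LEFT/RIGHT ANGLE BRACKET, using the interpreter's Unicode data.
text = "\u3008\u3009"
print(sum(cell_width(c) for c in text))
print(sum(cell_width(c, east_asian_context=False) for c in text))
```

Note that `unicodedata` reflects the Unicode version bundled with the running Python, not the 2001 data discussed in this paper.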
Although these characters have some glyph similarities with mathematical characters, they are not intended to be used for that purpose. Their character metrics are fundamentally different:
1) They are typically full-width characters (em size).
2) They have either a preceding or a following blank space (to emphasize their parenthetical nature), and that blank space can be adjusted during text compression or expansion, or even by simple kerning (without text justification).
3) They use a centered baseline (instead of the low alphabetic baseline used for most other symbols).
4) They participate in vertical writing in a very precise way. Unlike most CJK characters, they do not stay upright but are set 'sideways', using an alternate glyph with the blank space located appropriately to allow typical blank space management.
For these reasons, these characters are not compatibility characters: mathematical characters and CJK symbols cannot be canonicalized into each other, because they address different needs. Furthermore, as Ken pointed out, the logical wide-to-narrow decomposition (which would parallel the full-width character decompositions) can no longer be used, as it would break normalization.
(It could be argued that some existing wide-to-narrow decompositions are used out of bounds, since the usage of the two forms is very different, but that debate is too late to open!)
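The normalization constraint can be observed with Python's standard `unicodedata` module. This is an illustration only: it uses the Unicode data shipped with the running interpreter, not the 2001 data, and `show_forms` is a hypothetical helper. It shows that U+2329 carries a canonical singleton decomposition to U+3008, so every normalization form folds the mathematical bracket into the CJK one, while U+3008 itself has no decomposition at all (no wide-to-narrow mapping).

```python
import unicodedata

def show_forms(ch: str) -> dict:
    """Return the raw decomposition mapping of ch and its four normalization forms."""
    forms = {"decomposition": unicodedata.decomposition(ch)}
    for form in ("NFC", "NFD", "NFKC", "NFKD"):
        # Record the normalized result as U+XXXX code points for readability.
        forms[form] = " ".join(f"U+{ord(c):04X}" for c in unicodedata.normalize(form, ch))
    return forms

for ch in ("\u2329", "\u232A", "\u3008", "\u3009"):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {show_forms(ch)}")
```

An empty `decomposition` string means the character has no decomposition mapping in the interpreter's data.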
The author has no strong preference regarding the addition of extra CJK symbols for the 'double left/right parenthesis'. One set has to be encoded in the Miscellaneous Mathematical block (29xx range) for mathematical use. A CJK version is probably required, but the expected typographic behavior of the JIS X 0213 CJK white left/right parenthesis should be studied further before a final decision. This can still be done before the disposition of comments on FPDAM1.
The following is the list of affected characters with their old and new properties:
Old: GCat = Ps, EAW = A, Other_Math = Y
New: GCat = Ps, EAW = W, Other_Math = N
----
2329 LEFT-POINTING ANGLE BRACKET ==> 3008
3008 LEFT ANGLE BRACKET
301A LEFT WHITE SQUARE BRACKET
Old: GCat = Ps, EAW = A, Other_Math = N
New: GCat = Ps, EAW = W, Other_Math = N
----
300A LEFT DOUBLE ANGLE BRACKET
3014 LEFT TORTOISE SHELL BRACKET
3018 LEFT WHITE TORTOISE SHELL BRACKET
Old: GCat = Pe, EAW = A, Other_Math = Y
New: GCat = Pe, EAW = W, Other_Math = N
----
232A RIGHT-POINTING ANGLE BRACKET ==> 3009
3009 RIGHT ANGLE BRACKET
301B RIGHT WHITE SQUARE BRACKET
Old: GCat = Pe, EAW = A, Other_Math = N
New: GCat = Pe, EAW = W, Other_Math = N
----
300B RIGHT DOUBLE ANGLE BRACKET
3015 RIGHT TORTOISE SHELL BRACKET
3019 RIGHT WHITE TORTOISE SHELL BRACKET
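The General_Category and East_Asian_Width values in the 'New' rows above can be compared against the Unicode Character Database bundled with Python. This is a sketch under stated assumptions: the `PROPOSED` dictionary is transcribed by hand from the lists above, `mismatches` is a hypothetical helper, `unicodedata` reflects whichever Unicode version the interpreter ships, and Other_Math is omitted because `unicodedata` does not expose it.

```python
import unicodedata

# (General_Category, East_Asian_Width) from the "New" rows above.
# Other_Math is not available from unicodedata, so it is not checked here.
PROPOSED = {
    0x2329: ("Ps", "W"), 0x3008: ("Ps", "W"), 0x301A: ("Ps", "W"),
    0x300A: ("Ps", "W"), 0x3014: ("Ps", "W"), 0x3018: ("Ps", "W"),
    0x232A: ("Pe", "W"), 0x3009: ("Pe", "W"), 0x301B: ("Pe", "W"),
    0x300B: ("Pe", "W"), 0x3015: ("Pe", "W"), 0x3019: ("Pe", "W"),
}

def mismatches(table: dict) -> list:
    """Return (code point, proposed, actual) for each entry that differs from current data."""
    result = []
    for cp, proposed in table.items():
        ch = chr(cp)
        actual = (unicodedata.category(ch), unicodedata.east_asian_width(ch))
        if actual != proposed:
            result.append((f"U+{cp:04X}", proposed, actual))
    return result

print(mismatches(PROPOSED))
```

An empty list means the interpreter's data agrees with the proposed values for both properties.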
The new characters added would have the following properties:
GCat = Ps, EAW = Na, Other_Math = Y
----
2B00 MATHEMATICAL LEFT WHITE SQUARE BRACKET
2B02 MATHEMATICAL LEFT ANGLE BRACKET
2B04 MATHEMATICAL LEFT DOUBLE ANGLE BRACKET
2985 MATHEMATICAL WHITE LEFT PARENTHESIS (already part of Amendment 1)
GCat = Pe, EAW = Na, Other_Math = Y
----
2B01 MATHEMATICAL RIGHT WHITE SQUARE BRACKET
2B03 MATHEMATICAL RIGHT ANGLE BRACKET
2B05 MATHEMATICAL RIGHT DOUBLE ANGLE BRACKET
2986 MATHEMATICAL WHITE RIGHT PARENTHESIS (already part of Amendment 1)
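The contrast between the narrow mathematical brackets and the wide CJK brackets can be seen directly in the property data. A minimal sketch, assuming Python's bundled Unicode data: it queries only 2985 and 2986 from the list above, plus 3008 and 3009 for comparison, and deliberately skips the 2B00-2B05 code points, which are proposed allocations in this paper and need not correspond to anything at those positions in a given Unicode version. `describe` is a hypothetical helper, and the names it prints come from the interpreter's data, so they may differ from the draft names used here.

```python
import unicodedata

def describe(cp: int) -> tuple:
    """Return (code point, name, General_Category, East_Asian_Width) from Python's Unicode data."""
    ch = chr(cp)
    return (f"U+{cp:04X}", unicodedata.name(ch, "<unassigned>"),
            unicodedata.category(ch), unicodedata.east_asian_width(ch))

# 2985/2986 are the mathematical white parentheses listed above (expected narrow);
# 3008/3009 are CJK angle brackets included only for contrast (expected wide).
for cp in (0x2985, 0x2986, 0x3008, 0x3009):
    print(describe(cp))
```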
If the CJK symbols were added, they would have the following properties:
GCat = Ps, EAW = W, Other_Math = N
----
33DE WHITE LEFT PARENTHESIS
GCat = Pe, EAW = W, Other_Math = N
----
33DF WHITE RIGHT PARENTHESIS
----