L2/01-317

From: Michel Suignard [michelsu@microsoft.com]
Sent: Tuesday, August 14, 2001 1:23 PM
To: Winkler, Arnold F
Subject: Bracket Disunification & Normalization

August 14th 2001

This is my paper on the bracket disunification. It complements paper WG2 N2345, which was presented at the last WG2 meeting, and also clarifies the issues that were brilliantly presented by Ken Whistler in his email titled 'Bracket Disunification & Normalization Hell'.

The issue arises mostly from the fact that characters encoded in the range 3000-303F (CJK Symbols and Punctuation) have been used historically for CJK processing, mainly for parenthetical notation. Their origin goes back to terminal displays with fixed cell width, and to match the surrounding CJK characters their advance width was made 'wide'. This has resulted in very precise typographic guidelines for these characters, which are followed by the major fonts available in these markets.

Although these characters have some glyphic similarities with mathematical characters, they are not intended to be used for that purpose. Their character metrics are fundamentally different:

1) They are typically full-width characters (EM size).
2) They have either a preceding or a following blank space (to emphasize their parenthetical nature), and that blank space can be adjusted in text compression/expansion or even simple kerning (without text justification).
3) They use a centered baseline (instead of the low alphabetical baseline used for most other symbols).
4) They participate in a very precise way in vertical writing. Unlike most CJK characters, they don't stay upright, but go 'sideways', using an alternate glyph with the blank space located appropriately to allow typical blank space management.

For these reasons, these characters are not compatibility characters. Mathematical characters and CJK symbols cannot be canonicalized into each other; they address basically different needs. Furthermore, as pointed out by Ken, the logical wide to narrow decomposition (which would parallel the full width character decomposition) cannot be used anymore, as it would break normalization.

(It could be argued that some existing wide to narrow decompositions are sometimes used out of bounds, as the usage of the two forms is very different, but this is a debate that is too late to open!)
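
The normalization constraint described above can be observed with a small sketch. It assumes a Python interpreter whose bundled `unicodedata` module reflects a released Unicode version (not the draft properties proposed in this paper), and the variable and function names are chosen here for illustration only. It shows that 2329 already carries a canonical decomposition to 3008, so normalization replaces it, while an existing fullwidth form such as FF08 carries only a `<wide>` compatibility mapping.

```python
import unicodedata

# U+2329 has a canonical (singleton) decomposition to U+3008, so both
# NFD and NFC replace it with the CJK angle bracket.
left_pointing = "\u2329"
cjk_left = "\u3008"


def decomposition_field(ch: str) -> str:
    """Return the raw decomposition mapping recorded for ch (may be empty)."""
    return unicodedata.decomposition(ch)


nfc = unicodedata.normalize("NFC", left_pointing)
nfd = unicodedata.normalize("NFD", left_pointing)

# A fullwidth form has only a compatibility (<wide>) mapping: NFC leaves it
# untouched, and only NFKC folds it to the narrow character.
fullwidth_paren = "\uff08"
nfc_fw = unicodedata.normalize("NFC", fullwidth_paren)
nfkc_fw = unicodedata.normalize("NFKC", fullwidth_paren)

print(decomposition_field(left_pointing))
print(nfc == cjk_left, nfd == cjk_left)
print(decomposition_field(fullwidth_paren))
print(nfc_fw == fullwidth_paren, nfkc_fw == "(")
```

Because decomposition mappings of already encoded characters are frozen for normalization stability, the 2329/3008 link cannot be removed, and no new wide to narrow mapping can be attached to 3008 afterwards.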

The author has no strong preference regarding the addition of extra CJK symbols for the 'double left/right parenthesis'. One set has to be encoded in the Miscellaneous Mathematical Block (29xx range) for mathematical use. A CJK version is probably required, but further study of the expected typographic behavior of the JIS 213 CJK white left/right parenthesis should be done before a final decision. This can still be done before the disposition of comments on the FPDAM1.

Following is the list of affected characters and their old and new properties:

Old: GCat = Ps, EAW = A,  Other_Math = Y
New: GCat = Ps, EAW = W,  Other_Math = N
----
2329    LEFT-POINTING ANGLE BRACKET     ==> 3008
3008    LEFT ANGLE BRACKET
301A    LEFT WHITE SQUARE BRACKET

Old: GCat = Ps, EAW = A,  Other_Math = N
New: GCat = Ps, EAW = W,  Other_Math = N
----
300A    LEFT DOUBLE ANGLE BRACKET
3014    LEFT TORTOISE SHELL BRACKET
3018    LEFT WHITE TORTOISE SHELL BRACKET

Old: GCat = Pe, EAW = A,  Other_Math = Y
New: GCat = Pe, EAW = W,  Other_Math = N
----
232A    RIGHT-POINTING ANGLE BRACKET    ==> 3009
3009    RIGHT ANGLE BRACKET
301B    RIGHT WHITE SQUARE BRACKET

Old: GCat = Pe, EAW = A,  Other_Math = N
New: GCat = Pe, EAW = W,  Other_Math = N
----
300B    RIGHT DOUBLE ANGLE BRACKET
3015    RIGHT TORTOISE SHELL BRACKET
3019    RIGHT WHITE TORTOISE SHELL BRACKET
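
As a rough way to cross-check a listing like the one above, the following sketch prints the General Category and East Asian Width that a Python installation reports for the affected code points. It is only an illustration: the `AFFECTED` list and the `properties` helper are names made up here, the values come from whatever Unicode version the interpreter ships with rather than from this proposal, and `unicodedata` does not expose Other_Math, so that column is omitted.

```python
import unicodedata

# Code points taken from the four groups listed above.
AFFECTED = [
    0x2329, 0x3008, 0x301A, 0x300A, 0x3014, 0x3018,  # opening brackets
    0x232A, 0x3009, 0x301B, 0x300B, 0x3015, 0x3019,  # closing brackets
]


def properties(cp: int) -> tuple:
    """Return (hex code point, GCat, EAW, name) for one code point."""
    ch = chr(cp)
    return (
        f"{cp:04X}",
        unicodedata.category(ch),
        unicodedata.east_asian_width(ch),
        unicodedata.name(ch),
    )


for row in map(properties, AFFECTED):
    print("  ".join(row))
```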

 

The new characters added would have the following properties:

GCat = Ps, EAW = Na,  Other_Math = Y
----
2B00    MATHEMATICAL LEFT WHITE SQUARE BRACKET
2B02    MATHEMATICAL LEFT ANGLE BRACKET
2B04    MATHEMATICAL LEFT DOUBLE ANGLE BRACKET
2985    MATHEMATICAL WHITE LEFT PARENTHESIS (already part of amend. 1)

GCat = Pe, EAW = Na,  Other_Math = Y
----
2B01    MATHEMATICAL RIGHT WHITE SQUARE BRACKET
2B03    MATHEMATICAL RIGHT ANGLE BRACKET
2B05    MATHEMATICAL RIGHT DOUBLE ANGLE BRACKET
2986    MATHEMATICAL WHITE RIGHT PARENTHESIS (already part of amend. 1)

 

If the CJK symbols were added they would have the following properties:

GCat = Ps, EAW = W,  Other_Math = N
----
33DE    WHITE LEFT PARENTHESIS

GCat = Pe, EAW = W,  Other_Math = N
----
33DF    WHITE RIGHT PARENTHESIS


GCat = Ps, EAW = F,  Other_Math = N
----
FF5F    FULLWIDTH LEFT WHITE PARENTHESIS

GCat = Pe, EAW = F,  Other_Math = N
----
FF60    FULLWIDTH RIGHT WHITE PARENTHESIS
----