L2/09-313
From: Mark Davis
Date: August 20, 2009
Subject: BN Values
====
While fixing the UBA (UAX #9) for Mati's comments, we fixed the textual description for BN. As it turns out, the BN characters almost match the set

[\p{di}\p{nchar}\p{cc}-\p{M}-\p{Bidi_C}-\p{alpha}-\p{wspace}]

In English, that is {default-ignorables, noncharacters, and controls, minus marks, bidi controls, alphabetics, and whitespace}.
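Read as set algebra, the expression above is the set S below. In UnicodeSet syntax, adjacent items are unioned and each '-' subtracts what follows, evaluated left to right, so the four subtractions collapse into one difference. Here DI, NChar, Cc, M, Bidi_C, Alpha, and WSpace stand for Default_Ignorable_Code_Point, Noncharacter_Code_Point, General_Category=Control, General_Category=Mark, Bidi_Control, Alphabetic, and White_Space; BN is Bidi_Class=BN. "Almost match" then means both differences are small, with the counts given in this note:

$$
S = \bigl(\mathrm{DI} \cup \mathrm{NChar} \cup \mathrm{Cc}\bigr) \setminus \bigl(\mathrm{M} \cup \mathrm{Bidi\_C} \cup \mathrm{Alpha} \cup \mathrm{WSpace}\bigr),
\qquad |\mathrm{BN} \setminus S| = 1, \quad |S \setminus \mathrm{BN}| = 6
$$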
The one character that is BN but not in the above set is:
U+070F SYRIAC ABBREVIATION MARK
I suspect that this is an oversight, and that I should propose changing it (for Unicode 6.0, of course). The issue is that the bidi algorithm specifically excludes positioning of BN (rule X9 removes BN characters before levels are resolved), so formally this character could end up anywhere in the string. Does anyone see an obvious reason why it shouldn't be changed from BN?
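To make the positioning point concrete, here is a minimal Python sketch of the removal step in rule X9 of UAX #9, which drops explicit embedding/override codes and BN characters before any levels are resolved. It relies only on the standard unicodedata module, whose bundled Unicode data is much newer than the data discussed here, so results for individual characters may differ. The function name apply_x9 is hypothetical, and the sketch covers only this one step, not the full algorithm.

```python
import unicodedata

# Bidi classes whose characters rule X9 of UAX #9 removes
X9_REMOVED = {"RLE", "LRE", "RLO", "LRO", "PDF", "BN"}

def apply_x9(text):
    """Drop X9-removed characters. Return the remaining text and, for each
    kept character, its index in the original string."""
    kept_chars = []
    kept_indices = []
    for i, ch in enumerate(text):
        # unicodedata.bidirectional returns the Bidi_Class short alias
        if unicodedata.bidirectional(ch) not in X9_REMOVED:
            kept_chars.append(ch)
            kept_indices.append(i)
    return "".join(kept_chars), kept_indices
```

For example, apply_x9("a\u200Bb") returns ("ab", [0, 2]): the ZERO WIDTH SPACE (a BN character) at index 1 has no slot in the result, so the algorithm proper says nothing about where it ends up after reordering.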
The 6 characters in the above set but not in BN are the following. I suspect that U+001C..U+001F are old holdovers. The Khmer characters are odd, but deprecated. At this point I don't see a reason to change any of these, but thought I'd pass them around for comments.
Basic Latin - C0 controls
U+001C INFORMATION SEPARATOR FOUR
U+001D INFORMATION SEPARATOR THREE
U+001E INFORMATION SEPARATOR TWO
U+001F INFORMATION SEPARATOR ONE
Khmer - Inherent vowels
U+17B4 KHMER VOWEL INHERENT AQ
U+17B5 KHMER VOWEL INHERENT AA
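These six characters fall in the set but carry some other bidi class. One way to see which is to query the character database directly. The sketch below uses Python's unicodedata and assumes its bundled Unicode version, which is newer than the data discussed here, so the Khmer entries in particular may not reflect their properties at the time. The helper name describe is hypothetical.

```python
import unicodedata

# Code points listed above as being in the set but not BN
CANDIDATES = (0x001C, 0x001D, 0x001E, 0x001F, 0x17B4, 0x17B5)

def describe(cp):
    """Return the general category and bidi class that this Python's
    bundled Unicode Character Database assigns to code point cp."""
    ch = chr(cp)
    return {
        "code_point": f"U+{cp:04X}",
        "general_category": unicodedata.category(ch),
        "bidi_class": unicodedata.bidirectional(ch),
    }

# Report the UCD version in use, since the results depend on it
print("UCD", unicodedata.unidata_version)
for cp in CANDIDATES:
    print(describe(cp))
```

With current data, U+001C..U+001E report bidi class B (Paragraph_Separator) and U+001F reports S (Segment_Separator).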