L2/09-313
From: Mark Davis
Date: August 20, 2009
Subject: BN Values

====

While revising the UBA (UAX #9) in response to Mati's comments, we fixed the textual description of BN. As it turns out, the characters with Bidi_Class=BN almost exactly match this set:

[\p{di}\p{nchar}\p{cc}-\p{M}-\p{Bidi_C}-\p{alpha}-\p{wspace}].

In English, that is: {default ignorables, noncharacters, and controls}, minus {marks, bidi controls, alphabetics, and whitespace}.
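The set expression above can be read as a simple per-code-point predicate: a code point qualifies if it has any of the three included properties and none of the four excluded ones. The Python sketch below spells this out. Only General_Category comes from the standard `unicodedata` module. The noncharacter test is computed from its definition. The Default_Ignorable_Code_Point, Alphabetic, Bidi_Control and White_Space sets are assumed to be supplied by the caller (for example, parsed from the UCD files), because `unicodedata` does not expose them. The function names are hypothetical.

```python
import unicodedata
from typing import AbstractSet


def is_noncharacter(cp: int) -> bool:
    """Noncharacter_Code_Point: U+FDD0..U+FDEF plus U+xxFFFE and U+xxFFFF in every plane."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def matches_bn_expression(
    cp: int,
    default_ignorable: AbstractSet[int],
    alphabetic: AbstractSet[int],
    bidi_control: AbstractSet[int],
    white_space: AbstractSet[int],
) -> bool:
    r"""Evaluate [\p{di}\p{nchar}\p{cc}-\p{M}-\p{Bidi_C}-\p{alpha}-\p{wspace}] for one code point.

    The four property sets are caller-supplied assumptions (e.g. loaded from
    DerivedCoreProperties.txt and PropList.txt); unicodedata does not provide them.
    """
    gc = unicodedata.category(chr(cp))
    # Union of the included properties: di, nchar, and gc=Cc.
    included = cp in default_ignorable or is_noncharacter(cp) or gc == "Cc"
    # Each "-" in the expression removes one more property.
    excluded = (
        gc.startswith("M")
        or cp in bidi_control
        or cp in alphabetic
        or cp in white_space
    )
    return included and not excluded
```

Running this predicate over all code points with real UCD data and comparing the result against `unicodedata.bidirectional(ch) == "BN"` is one way to reproduce the two difference lists in this memo.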

The one character that is BN but not in the above set is:

U+070F ( ܏ ) SYRIAC ABBREVIATION MARK

I suspect that this is an oversight and that I should propose changing it (for Unicode 6.0, of course). The issue is that the bidi algorithm does not position BN characters at all (rule X9 removes them), so formally this character could end up anywhere in the string. Does anyone see an obvious reason why it shouldn't be changed from BN?
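To make the positioning problem concrete, here is a minimal Python sketch of the removal step in rule X9, using the standard `unicodedata` module. `apply_x9` is a hypothetical helper. A real implementation may instead retain these characters and handle them specially, but the effect on ordering is the same: once removed, a BN character takes no part in resolving levels or reordering, so it has no position of its own.

```python
import unicodedata

# Bidi classes that rule X9 of UAX #9 removes before later rules run.
X9_REMOVED = {"RLE", "LRE", "RLO", "LRO", "PDF", "BN"}


def apply_x9(text: str) -> str:
    """Drop every character whose Bidi_Class is removed by X9."""
    return "".join(
        ch for ch in text if unicodedata.bidirectional(ch) not in X9_REMOVED
    )
```

For example, `apply_x9("a\u200bb")` drops U+200B ZERO WIDTH SPACE (a BN character) and returns `"ab"`. Any character classified as BN gets the same treatment.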

The six characters that are in the above set but not BN are the following. I suspect that U+001C..U+001F are old holdovers. The Khmer characters are odd, but they are deprecated. At this point I don't see a reason to change any of these, but I thought I'd pass them around for comment.

Basic Latin - C0 controls

U+001C INFORMATION SEPARATOR FOUR
U+001D INFORMATION SEPARATOR THREE
U+001E INFORMATION SEPARATOR TWO
U+001F INFORMATION SEPARATOR ONE

Khmer - Inherent vowels

U+17B4 (  ) KHMER VOWEL INHERENT AQ
U+17B5 (  ) KHMER VOWEL INHERENT AA
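To check the Bidi_Class and General_Category values of the characters discussed above, Python's standard `unicodedata` module is enough. This sketch reports values from the Unicode version bundled with the interpreter, which is likely newer than the data discussed in this memo, so some results may differ from the 2009 values. `bidi_report` and `CANDIDATES` are hypothetical names.

```python
import unicodedata

# Code points discussed in this memo.
CANDIDATES = [0x070F, 0x001C, 0x001D, 0x001E, 0x001F, 0x17B4, 0x17B5]


def bidi_report(cps):
    """Return (label, Bidi_Class, General_Category) for each code point."""
    return [
        (f"U+{cp:04X}", unicodedata.bidirectional(chr(cp)), unicodedata.category(chr(cp)))
        for cp in cps
    ]


if __name__ == "__main__":
    print("Unicode data version:", unicodedata.unidata_version)
    for label, bidi, gc in bidi_report(CANDIDATES):
        print(label, bidi, gc)
```

The report explains why the information separators are not BN. U+001C..U+001E have Bidi_Class B (paragraph separator) and U+001F has S (segment separator).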