Public Review Issue #28

BIDI Boundary_Neutral Property Value.

The BIDI property value BN (Boundary_Neutral) is currently aligned with the General_Category value Format_Character (Cf), minus the BIDI-specific format characters (LRM, RLM, RLE, LRE, RLO, LRO, PDF). The intent of the BN property is to let the BIDI algorithm ignore invisible, irrelevant characters when determining the ordering of the visible characters.

The Default_Ignorable_Code_Point property (DICP), which was developed after the original BIDI algorithm, is designed to capture that intent much more accurately than the Cf property value. It also specifically includes ranges of unassigned code points that are reserved for future format characters, which improves forward compatibility. The proposal is to align the BN property with DICP instead of Cf, again minus the BIDI-specific characters.
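
As a rough illustration of the proposed derivation, the sketch below computes BN as DICP minus the BIDI-specific characters using plain Python sets. The range tables are a small hand-picked sample drawn from this issue, not the full property data, and the names `DICP_SAMPLE`, `BIDI_SPECIFICS`, `expand`, and `proposed_bn` are hypothetical.

```python
# Illustrative sample only: a few ranges taken from this issue,
# NOT the full Default_Ignorable_Code_Point data.
DICP_SAMPLE = [
    (0x00AD, 0x00AD),  # SOFT HYPHEN
    (0x200B, 0x200D),  # ZERO WIDTH SPACE..ZERO WIDTH JOINER
    (0x200E, 0x200F),  # LRM, RLM (assumed in DICP, as the subtraction implies)
    (0x202A, 0x202E),  # LRE..RLO (assumed in DICP, as the subtraction implies)
    (0x2060, 0x2063),  # WORD JOINER..INVISIBLE SEPARATOR
    (0xFE00, 0xFE0F),  # VARIATION SELECTOR-1..VARIATION SELECTOR-16
]

# BIDI-specific format characters that keep their own Bidi_Class values.
BIDI_SPECIFICS = [
    (0x200E, 0x200F),  # LRM, RLM
    (0x202A, 0x202E),  # LRE, RLE, PDF, LRO, RLO
]


def expand(ranges):
    """Turn a list of inclusive (first, last) ranges into a set of code points."""
    return {cp for first, last in ranges for cp in range(first, last + 1)}


def proposed_bn(dicp_ranges, bidi_ranges):
    """Proposed BN set: everything in DICP except the BIDI-specific characters."""
    return expand(dicp_ranges) - expand(bidi_ranges)


bn = proposed_bn(DICP_SAMPLE, BIDI_SPECIFICS)
print([f"U+{cp:04X}" for cp in sorted(bn)])
```

With this sample, SOFT HYPHEN and the variation selectors end up in the proposed BN set, while LRM, RLM, and the embedding and override controls are left out.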

For comparison, here is a listing of the differences between the two properties.

In Bidi_Class=Boundary_Neutral, but not in Default_Ignorable_Code_Point:

U+FFF9..U+FFFB # INTERLINEAR ANNOTATION ANCHOR..INTERLINEAR ANNOTATION TERMINATOR

Total: 3
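
The totals in these listings can be checked mechanically. The following is a minimal sketch that parses lines in the `U+XXXX..U+YYYY # name` layout used in this issue and counts the code points they cover. The regular expression and the function names are assumptions made for illustration only.

```python
import re

# Matches "U+XXXX # name" or "U+XXXX..U+YYYY # name" as used in the listings.
LISTING_LINE = re.compile(
    r"^U\+([0-9A-F]{4,6})(?:\.\.U\+([0-9A-F]{4,6}))?\s*#\s*(.*)$"
)


def parse_listing_line(line):
    """Return (first, last, comment) for one listing line; ranges are inclusive."""
    m = LISTING_LINE.match(line.strip())
    if m is None:
        raise ValueError(f"not a listing line: {line!r}")
    first = int(m.group(1), 16)
    last = int(m.group(2), 16) if m.group(2) else first
    return first, last, m.group(3)


def count_code_points(lines):
    """Sum the sizes of all ranges in the given listing lines."""
    return sum(last - first + 1
               for first, last, _ in map(parse_listing_line, lines))


lines = [
    "U+FFF9..U+FFFB # INTERLINEAR ANNOTATION ANCHOR..INTERLINEAR ANNOTATION TERMINATOR",
]
print(count_code_points(lines))  # 3
```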

Not in Bidi_Class=Boundary_Neutral, but in (Default_Ignorable_Code_Point - BIDI_Specifics):

U+001C..U+001F # <INFORMATION SEPARATOR FOUR>..<INFORMATION SEPARATOR ONE>
U+00AD # SOFT HYPHEN
U+034F # COMBINING GRAPHEME JOINER
U+115F..U+1160 # HANGUL CHOSEONG FILLER..HANGUL JUNGSEONG FILLER
U+17B4..U+17B5 # KHMER VOWEL INHERENT AQ..KHMER VOWEL INHERENT AA
U+180B..U+180D # MONGOLIAN FREE VARIATION SELECTOR ONE..MONGOLIAN FREE VARIATION SELECTOR THREE
U+2064..U+2069 # <unassigned-2064>..<unassigned-2069>
U+3164 # HANGUL FILLER
U+FDD0..U+FDEF # <noncharacter-FDD0>..<noncharacter-FDEF>
U+FE00..U+FE0F # VARIATION SELECTOR-1..VARIATION SELECTOR-16
U+FFA0 # HALFWIDTH HANGUL FILLER
U+FFF0..U+FFF8 # <unassigned-FFF0>..<unassigned-FFF8>
U+FFFE..U+FFFF # <noncharacter-FFFE>..<noncharacter-FFFF>
U+1FFFE..U+1FFFF # <noncharacter-1FFFE>..<noncharacter-1FFFF>
U+2FFFE..U+2FFFF # <noncharacter-2FFFE>..<noncharacter-2FFFF>
U+3FFFE..U+3FFFF # <noncharacter-3FFFE>..<noncharacter-3FFFF>
U+4FFFE..U+4FFFF # <noncharacter-4FFFE>..<noncharacter-4FFFF>
U+5FFFE..U+5FFFF # <noncharacter-5FFFE>..<noncharacter-5FFFF>
U+6FFFE..U+6FFFF # <noncharacter-6FFFE>..<noncharacter-6FFFF>
U+7FFFE..U+7FFFF # <noncharacter-7FFFE>..<noncharacter-7FFFF>
U+8FFFE..U+8FFFF # <noncharacter-8FFFE>..<noncharacter-8FFFF>
U+9FFFE..U+9FFFF # <noncharacter-9FFFE>..<noncharacter-9FFFF>
U+AFFFE..U+AFFFF # <noncharacter-AFFFE>..<noncharacter-AFFFF>
U+BFFFE..U+BFFFF # <noncharacter-BFFFE>..<noncharacter-BFFFF>
U+CFFFE..U+CFFFF # <noncharacter-CFFFE>..<noncharacter-CFFFF>
U+DFFFE..U+E0000 # <noncharacter-DFFFE>..<unassigned-E0000>
U+E0002..U+E001F # <unassigned-E0002>..<unassigned-E001F>
U+E0080..U+E0FFF # <unassigned-E0080>..<unassigned-E0FFF>
U+EFFFE..U+EFFFF # <noncharacter-EFFFE>..<noncharacter-EFFFF>
U+FFFFE..U+FFFFF # <noncharacter-FFFFE>..<noncharacter-FFFFF>
U+10FFFE..U+10FFFF # <noncharacter-10FFFE>..<noncharacter-10FFFF>

Total: 6,171

In both Bidi_Class=Boundary_Neutral and Default_Ignorable_Code_Point:

[\u0000-\u0008\u000E-\u001B\u007F-\u0084\u0086-\u009F\u070F\u200B-\u200D\u2060-\u2063\u206A-\u206F\uFEFF\U0001D173-\U0001D17A\U000E0001\U000E0020-\U000E007F]
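
The set above is written as a bracketed character class. As a sketch, and assuming Python's `re` module (which accepts `\uXXXX` and `\UXXXXXXXX` escapes in string patterns), the same class can be reused for a quick membership test. The names `IN_BOTH` and `in_both` are hypothetical.

```python
import re

# The character class from the listing above, split across lines for readability.
IN_BOTH = re.compile(
    r"[\u0000-\u0008\u000E-\u001B\u007F-\u0084\u0086-\u009F\u070F"
    r"\u200B-\u200D\u2060-\u2063\u206A-\u206F\uFEFF"
    r"\U0001D173-\U0001D17A\U000E0001\U000E0020-\U000E007F]"
)


def in_both(cp):
    """True if the code point is in the 'both BN and DICP' set listed above."""
    return IN_BOTH.fullmatch(chr(cp)) is not None


print(in_both(0x200B))  # ZERO WIDTH SPACE: in the set
print(in_both(0x00AD))  # SOFT HYPHEN: listed as DICP only, so not in this set
```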

Bidi_Specifics:

U+200E..U+200F # LEFT-TO-RIGHT MARK..RIGHT-TO-LEFT MARK
U+202A..U+202E # LEFT-TO-RIGHT EMBEDDING..RIGHT-TO-LEFT OVERRIDE
U+0600..U+0603 # ARABIC NUMBER SIGN..ARABIC SIGN SAFHA
U+06DD # ARABIC END OF AYAH

Note:

The following characters appear to have an anomalous BIDI property value: there seems to be no good reason for them to be Bidi_Class=L instead of BN. Their use is discouraged anyway, so the difference should not matter in practice.

17B4..17B5    ; L # Cf   [2] KHMER VOWEL INHERENT AQ..KHMER VOWEL INHERENT AA
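
Entries like the one above could be found by scanning a property data file for Cf characters whose Bidi_Class is not BN. The sketch below parses a single line in the layout shown and applies that check. The regular expression and the function names are assumptions for illustration, and a real scan would also need to skip the BIDI-specific characters, which are intentionally not BN.

```python
import re

# Matches "XXXX[..YYYY] ; <Bidi_Class> # <General_Category> ..." as in the line above.
UCD_LINE = re.compile(
    r"^([0-9A-F]{4,6})(?:\.\.([0-9A-F]{4,6}))?\s*;\s*(\w+)\s*#\s*(\w+)"
)


def parse_ucd_line(line):
    """Return (first, last, bidi_class, general_category) for one data line."""
    m = UCD_LINE.match(line)
    if m is None:
        raise ValueError(f"unrecognized line: {line!r}")
    first = int(m.group(1), 16)
    last = int(m.group(2), 16) if m.group(2) else first
    return first, last, m.group(3), m.group(4)


def is_anomalous(line):
    """True for a Cf entry whose Bidi_Class is not BN.

    Note: this does not exclude the BIDI-specific characters.
    """
    _, _, bidi_class, general_category = parse_ucd_line(line)
    return general_category == "Cf" and bidi_class != "BN"


line = "17B4..17B5    ; L # Cf   [2] KHMER VOWEL INHERENT AQ..KHMER VOWEL INHERENT AA"
print(parse_ucd_line(line), is_anomalous(line))
```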