L2/14-168

 

Linebreaking before IN

Eric Muller, Adobe

July 26, 2014

 

 

In Unicode 7.0, there is a linebreak opportunity between ? or ! and U+2026 … HORIZONTAL ELLIPSIS. These sequences are not uncommon, and the break opportunity is highly undesirable.

Both ? and ! have the lb property value EX (exclamation), and the other characters with that value are very similar (see below).

… has the lb property value IN (inseparable), and the other characters with that value are very similar (see below).

So it seems that preventing breaks between EX and IN is a good way to solve the problem (as opposed to giving other property values to ?, ! or …).

Rule LB22 is:

Do not break between two ellipses, or between letters or numbers and ellipsis.

(AL | HL) × IN
ID × IN
IN × IN
NU × IN

I propose to extend this rule by adding EX × IN to it, and by changing "letters or numbers" to "letters, numbers or exclamations".
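The effect of the proposed change can be illustrated with a minimal Python sketch. It models only rule LB22, not the full line breaking algorithm. The `LB_CLASS` table, the `lb_class` fallback and the function name `lb22_prohibits_break` are hypothetical, introduced here for illustration; a real implementation would take the lb values from the Unicode Character Database rather than from a hand-written subset.

```python
# Hypothetical, hand-written subset of the lb property values.
# A real implementation would read them from the Unicode Character Database.
LB_CLASS = {
    "?": "EX",
    "!": "EX",
    "\u2024": "IN",  # ONE DOT LEADER
    "\u2025": "IN",  # TWO DOT LEADER
    "\u2026": "IN",  # HORIZONTAL ELLIPSIS
}

# Classes that may not be separated from a following IN under LB22.
LB22_LEFT_UNICODE_7 = {"AL", "HL", "ID", "IN", "NU"}
# The same set with the proposed addition of EX.
LB22_LEFT_PROPOSED = LB22_LEFT_UNICODE_7 | {"EX"}


def lb_class(ch):
    """Return a rough lb value for ch (illustrative simplification only)."""
    if ch in LB_CLASS:
        return LB_CLASS[ch]
    if ch.isdigit():
        return "NU"
    if ch.isalpha():
        return "AL"  # simplification: ignores HL, ID and other classes
    return "XX"


def lb22_prohibits_break(before, after, left_classes):
    """True if LB22, with the given left-hand classes, forbids a break."""
    return lb_class(after) == "IN" and lb_class(before) in left_classes


for name, classes in (("Unicode 7.0", LB22_LEFT_UNICODE_7),
                      ("proposed", LB22_LEFT_PROPOSED)):
    for before in ("a", "7", "\u2026", "?", "!"):
        prohibited = lb22_prohibits_break(before, "\u2026", classes)
        print(name, repr(before), "x IN" if prohibited else "not covered by LB22")
```

With the Unicode 7.0 set, the pairs ? … and ! … are not covered by LB22; with the proposed set they are, in the same way as letters and numbers followed by an ellipsis.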

I suspect that breaking before IN should be the exception rather than the norm, and that many more categories should be included here, but the proposal is only to include EX.

Characters with the lb property value IN:

General Punctuation — General punctuation
items: 3
U+2024 ( ․ ) ONE DOT LEADER
U+2025 ( ‥ ) TWO DOT LEADER
U+2026 ( … ) HORIZONTAL ELLIPSIS
Vertical Forms — Glyphs for vertical variants
items: 1
U+FE19 ( ︙ ) PRESENTATION FORM FOR VERTICAL HORIZONTAL ELLIPSIS
Manichaean — Punctuation
items: 1
U+10AF6 ( ‎𐫶‎ ) MANICHAEAN PUNCTUATION LINE FILLER

Characters with the lb property value EX:

Basic Latin — ASCII punctuation and symbols
items: 2
U+0021 ( ! ) EXCLAMATION MARK
U+003F ( ? ) QUESTION MARK
Hebrew — Points and punctuation
items: 1
U+05C6 ( ‎׆‎ ) HEBREW PUNCTUATION NUN HAFUKHA
Arabic — Punctuation
items: 4
U+061B ( ‎؛‎ ) ARABIC SEMICOLON
U+061E ( ‎؞‎ ) ARABIC TRIPLE DOT PUNCTUATION MARK
U+061F ( ‎؟‎ ) ARABIC QUESTION MARK
U+06D4 ( ‎۔‎ ) ARABIC FULL STOP
NKo — Punctuation
items: 1
U+07F9 ( ߹ ) NKO EXCLAMATION MARK
Tibetan — Sign
items: 6
U+0F0D ( ། ) TIBETAN MARK SHAD
U+0F0E ( ༎ ) TIBETAN MARK NYIS SHAD
U+0F0F ( ༏ ) TIBETAN MARK TSHEG SHAD
U+0F10 ( ༐ ) TIBETAN MARK NYIS TSHEG SHAD
U+0F11 ( ༑ ) TIBETAN MARK RIN CHEN SPUNGS SHAD
U+0F14 ( ༔ ) TIBETAN MARK GTER TSHEG
Mongolian — Punctuation
items: 4
U+1802 ( ᠂ ) MONGOLIAN COMMA
U+1803 ( ᠃ ) MONGOLIAN FULL STOP
U+1808 ( ᠈ ) MONGOLIAN MANCHU COMMA
U+1809 ( ᠉ ) MONGOLIAN MANCHU FULL STOP
Limbu — Various signs
items: 2
U+1944 ( ᥄ ) LIMBU EXCLAMATION MARK
U+1945 ( ᥅ ) LIMBU QUESTION MARK
Dingbats — Punctuation mark ornaments
items: 2
U+2762 ( ❢ ) HEAVY EXCLAMATION MARK ORNAMENT
U+2763 ( ❣ ) HEAVY HEART EXCLAMATION MARK ORNAMENT
Coptic — Old Nubian punctuation
items: 1
U+2CF9 ( ⳹ ) COPTIC OLD NUBIAN FULL STOP
Coptic — Punctuation
items: 1
U+2CFE ( ⳾ ) COPTIC FULL STOP
Supplemental Punctuation — Archaic punctuation
items: 1
U+2E2E ( ⸮ ) REVERSED QUESTION MARK
Vai — Punctuation
items: 1
U+A60E ( ꘎ ) VAI FULL STOP
Phags Pa — Punctuation for Tibetan
items: 2
U+A876 ( ꡶ ) PHAGS-PA MARK SHAD
U+A877 ( ꡷ ) PHAGS-PA MARK DOUBLE SHAD
Vertical Forms — Glyphs for vertical variants
items: 2
U+FE15 ( ︕ ) PRESENTATION FORM FOR VERTICAL EXCLAMATION MARK
U+FE16 ( ︖ ) PRESENTATION FORM FOR VERTICAL QUESTION MARK
Small Form Variants — Small form variants
items: 2
U+FE56 ( ﹖ ) SMALL QUESTION MARK
U+FE57 ( ﹗ ) SMALL EXCLAMATION MARK
Halfwidth And Fullwidth Forms — Fullwidth ASCII variants
items: 2
U+FF01 ( ! ) FULLWIDTH EXCLAMATION MARK
U+FF1F ( ? ) FULLWIDTH QUESTION MARK
Siddham — Punctuation
items: 2
U+115C4 ( 𑗄 ) SIDDHAM SEPARATOR DOT
U+115C5 ( 𑗅 ) SIDDHAM SEPARATOR BAR