Eric Muller, Adobe
July 26, 2014
In Unicode 7.0, there is a line break opportunity between ? or ! and a following U+2026 … HORIZONTAL ELLIPSIS. These sequences are not uncommon, and the break opportunity is highly undesirable.
Both ? and ! have the lb property value EX (exclamation), and the other characters with that property are very similar (see below).
… has the lb property value IN (inseparable), and the other characters with that property are very similar (see below).
So it seems that preventing breaks between EX and IN is a good way to solve the problem (as opposed to giving other property values to ?, ! or …).
Rule LB22 is:
Do not break between two ellipses, or between letters or numbers and ellipsis.
  (AL | HL) × IN
  ID × IN
  IN × IN
  NU × IN
I propose to extend this rule by adding EX × IN to it, and to change "letters or numbers" to "letters, numbers or exclamations".
I suspect that breaking before IN should be the exception rather than the norm, and that many more categories should be included here, but the proposal is only to include EX.
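The effect of the proposed change can be sketched in a few lines of Python. This is only an illustration of LB22 in isolation, not an implementation of the full line breaking algorithm. The names LB_CLASS, LB22_CURRENT, LB22_PROPOSED and lb22_prohibits_break are hypothetical, and the class table is hard-coded for a handful of characters rather than read from the Unicode Character Database.

```python
# Line_Break classes for a few sample characters, hard-coded for illustration.
# (Assumption: a real implementation would load these from LineBreak.txt.)
LB_CLASS = {
    "?": "EX",
    "!": "EX",
    "\u2026": "IN",  # HORIZONTAL ELLIPSIS
    "a": "AL",
    "1": "NU",
}

# Classes that LB22 keeps together with a following IN in Unicode 7.0.
LB22_CURRENT = {"AL", "HL", "ID", "IN", "NU"}

# The proposal adds EX to that set.
LB22_PROPOSED = LB22_CURRENT | {"EX"}


def lb22_prohibits_break(before, after, left_classes):
    """Return True if LB22, with the given left-hand classes, forbids a
    break between the characters `before` and `after`."""
    return LB_CLASS[after] == "IN" and LB_CLASS[before] in left_classes


if __name__ == "__main__":
    ellipsis = "\u2026"
    for ch in "a1?!":
        current = lb22_prohibits_break(ch, ellipsis, LB22_CURRENT)
        proposed = lb22_prohibits_break(ch, ellipsis, LB22_PROPOSED)
        print(f"{ch!r} before ellipsis: current={current}, proposed={proposed}")
```

With the Unicode 7.0 set, the function reports no prohibition for ? or ! followed by …, so the break opportunity remains. With EX added, the break is prohibited, while the results for letters and digits are unchanged.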
Other characters with IN:
General Punctuation: General punctuation (3 items)
  U+2024 ( ․ ) ONE DOT LEADER
  U+2025 ( ‥ ) TWO DOT LEADER
  U+2026 ( … ) HORIZONTAL ELLIPSIS
Vertical Forms: Glyphs for vertical variants (1 item)
  U+FE19 ( ︙ ) PRESENTATION FORM FOR VERTICAL HORIZONTAL ELLIPSIS
Manichaean: Punctuation (1 item)
  U+10AF6 ( 𐫶 ) MANICHAEAN PUNCTUATION LINE FILLER
Other characters with EX:
Basic Latin: ASCII punctuation and symbols (2 items)
  U+0021 ( ! ) EXCLAMATION MARK
  U+003F ( ? ) QUESTION MARK
Hebrew: Points and punctuation (1 item)
  U+05C6 ( ׆ ) HEBREW PUNCTUATION NUN HAFUKHA
Arabic: Punctuation (4 items)
  U+061B ( ؛ ) ARABIC SEMICOLON
  U+061E ( ؞ ) ARABIC TRIPLE DOT PUNCTUATION MARK
  U+061F ( ؟ ) ARABIC QUESTION MARK
  U+06D4 ( ۔ ) ARABIC FULL STOP
NKo: Punctuation (1 item)
  U+07F9 ( ߹ ) NKO EXCLAMATION MARK
Tibetan: Sign (6 items)
  U+0F0D ( ། ) TIBETAN MARK SHAD
  U+0F0E ( ༎ ) TIBETAN MARK NYIS SHAD
  U+0F0F ( ༏ ) TIBETAN MARK TSHEG SHAD
  U+0F10 ( ༐ ) TIBETAN MARK NYIS TSHEG SHAD
  U+0F11 ( ༑ ) TIBETAN MARK RIN CHEN SPUNGS SHAD
  U+0F14 ( ༔ ) TIBETAN MARK GTER TSHEG
Mongolian: Punctuation (4 items)
  U+1802 ( ᠂ ) MONGOLIAN COMMA
  U+1803 ( ᠃ ) MONGOLIAN FULL STOP
  U+1808 ( ᠈ ) MONGOLIAN MANCHU COMMA
  U+1809 ( ᠉ ) MONGOLIAN MANCHU FULL STOP
Limbu: Various signs (2 items)
  U+1944 ( ᥄ ) LIMBU EXCLAMATION MARK
  U+1945 ( ᥅ ) LIMBU QUESTION MARK
Dingbats: Punctuation mark ornaments (2 items)
  U+2762 ( ❢ ) HEAVY EXCLAMATION MARK ORNAMENT
  U+2763 ( ❣ ) HEAVY HEART EXCLAMATION MARK ORNAMENT
Coptic: Old Nubian punctuation (1 item)
  U+2CF9 ( ⳹ ) COPTIC OLD NUBIAN FULL STOP
Coptic: Punctuation (1 item)
  U+2CFE ( ⳾ ) COPTIC FULL STOP
Supplemental Punctuation: Archaic punctuation (1 item)
  U+2E2E ( ⸮ ) REVERSED QUESTION MARK
Vai: Punctuation (1 item)
  U+A60E ( ꘎ ) VAI FULL STOP
Phags Pa: Punctuation for Tibetan (2 items)
  U+A876 ( ꡶ ) PHAGS-PA MARK SHAD
  U+A877 ( ꡷ ) PHAGS-PA MARK DOUBLE SHAD
Vertical Forms: Glyphs for vertical variants (2 items)
  U+FE15 ( ︕ ) PRESENTATION FORM FOR VERTICAL EXCLAMATION MARK
  U+FE16 ( ︖ ) PRESENTATION FORM FOR VERTICAL QUESTION MARK
Small Form Variants: Small form variants (2 items)
  U+FE56 ( ﹖ ) SMALL QUESTION MARK
  U+FE57 ( ﹗ ) SMALL EXCLAMATION MARK
Halfwidth And Fullwidth Forms: Fullwidth ASCII variants (2 items)
  U+FF01 ( ！ ) FULLWIDTH EXCLAMATION MARK
  U+FF1F ( ？ ) FULLWIDTH QUESTION MARK
Siddham: Punctuation (2 items)
  U+115C4 ( 𑗄 ) SIDDHAM SEPARATOR DOT
  U+115C5 ( 𑗅 ) SIDDHAM SEPARATOR BAR