L2/07-384
Date: Mon, 15 Oct 2007 13:57:59 -0700
From: Andy Heninger
Subject: UAX 14, shorten lists of characters
A late UTC agenda item. (If it's too late, postpone it until next time.)
UAX-14 includes complete lists of characters for many of the line breaking
classes. I propose that, where these lists contain more than a few
characters, they be replaced by a few representative characters from
the class, together with text referring to the data file for the complete
list.
The issue is that maintaining the data in parallel between UAX-14 and the
LineBreak.txt data file is a potentially error-prone process that does not
seem to add much value, and can potentially cause confusion regarding which
lists are normative. The data file is, and remains, normative.
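To illustrate how a hand-maintained list can drift from the data file, here is a minimal sketch of a cross-check. It assumes the usual LineBreak.txt line format ("code;class # comment", with ".." for ranges). The sample data, the function names parse_classes and check_list, and the drifted doc_list are all hypothetical and are not taken from the actual file or from UAX-14.

```python
import re

# Illustrative excerpt in the LineBreak.txt format ("code;class # comment").
# These lines are assumed sample data, not copied from the real file.
SAMPLE_DATA = """\
0021;EX # EXCLAMATION MARK
003F;EX # QUESTION MARK
2000..2006;BA # EN QUAD..SIX-PER-EM SPACE
2008..200A;BA # PUNCTUATION SPACE..HAIR SPACE
"""

# A code point or a "first..last" range, then ";" and the class name.
LINE_RE = re.compile(r"^([0-9A-F]{4,6})(?:\.\.([0-9A-F]{4,6}))?\s*;\s*(\w+)")


def parse_classes(text):
    """Map each line breaking class to the set of code points assigned to it."""
    classes = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip blank lines and comment-only lines
        first = int(match.group(1), 16)
        last = int(match.group(2) or match.group(1), 16)
        classes.setdefault(match.group(3), set()).update(range(first, last + 1))
    return classes


def check_list(listed, lb_class, classes):
    """Return (stale, missing) for a list of code points given in the text."""
    actual = classes.get(lb_class, set())
    listed = set(listed)
    return sorted(listed - actual), sorted(actual - listed)


classes = parse_classes(SAMPLE_DATA)
# Hypothetical drifted list: 2007 is listed but is not BA in the sample data,
# and 200A is BA in the sample data but was left out of the list.
doc_list = [0x2000, 0x2001, 0x2002, 0x2003, 0x2004,
            0x2005, 0x2006, 0x2007, 0x2008, 0x2009]
stale, missing = check_list(doc_list, "BA", classes)
print([f"{cp:04X}" for cp in stale], [f"{cp:04X}" for cp in missing])
```

A check of this kind would have to be run against every full list on every update, which is the maintenance burden that shorter, representative lists would avoid.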
Any individual characters that are specifically discussed in the text
should remain listed.
Here are lists copied out of TR14 that could reasonably be shortened.
Breaking Spaces
1680 OGHAM SPACE MARK
2000 EN QUAD
2001 EM QUAD
2002 EN SPACE
2003 EM SPACE
2004 THREE-PER-EM SPACE
2005 FOUR-PER-EM SPACE
2006 SIX-PER-EM SPACE
2008 PUNCTUATION SPACE
2009 THIN SPACE
200A HAIR SPACE
205F MEDIUM MATHEMATICAL SPACE
Historic Word Separators
16EB RUNIC SINGLE DOT PUNCTUATION
16EC RUNIC MULTIPLE DOT PUNCTUATION
16ED RUNIC CROSS PUNCTUATION
2056 THREE DOT PUNCTUATION
2058 FOUR DOT PUNCTUATION
2059 FIVE DOT PUNCTUATION
205A TWO DOT PUNCTUATION
205B FOUR DOT MARK
205D TRICOLON
205E VERTICAL FOUR DOTS
10100 AEGEAN WORD SEPARATOR LINE
10101 AEGEAN WORD SEPARATOR DOT
10102 AEGEAN CHECK MARK
1039F UGARITIC WORD DIVIDER
103D0 OLD PERSIAN WORD DIVIDER
1091F PHOENICIAN WORD DIVIDER
12470 CUNEIFORM PUNCTUATION SIGN OLD ASSYRIAN WORD DIVIDER
Dandas
0964 DEVANAGARI DANDA
0965 DEVANAGARI DOUBLE DANDA
0E5A THAI CHARACTER ANGKHANKHU
0E5B THAI CHARACTER KHOMUT
104A MYANMAR SIGN LITTLE SECTION
104B MYANMAR SIGN SECTION
1735 PHILIPPINE SINGLE PUNCTUATION
1736 PHILIPPINE DOUBLE PUNCTUATION
17D4 KHMER SIGN KHAN
17D5 KHMER SIGN BARIYOOSAN
1B5E BALINESE CARIK SIKI
1B5F BALINESE CARIK PAREREN
A8CE SAURASHTRA DANDA
A8CF SAURASHTRA DOUBLE DANDA
10A56 KHAROSHTHI PUNCTUATION DANDA
10A57 KHAROSHTHI PUNCTUATION DOUBLE DANDA
Tibetan
0F34 TIBETAN MARK BSDUS RTAGS
0F7F TIBETAN SIGN RNAM BCAD
0F85 TIBETAN MARK PALUTA
0FBE TIBETAN KU RU KHA
0FBF TIBETAN KU RU KHA BZHI MIG CAN
0FD2 TIBETAN MARK NYIS TSHEG
Other Terminating Punctuation
1804 MONGOLIAN COLON
1805 MONGOLIAN FOUR DOTS
1808 MONGOLIAN MANCHU COMMA
1809 MONGOLIAN MANCHU FULL STOP
1B5A BALINESE PANTI
1B5B BALINESE PAMADA
1B5C BALINESE WINDU
1B5D BALINESE CARIK PAMUNGKAH
1B60 BALINESE PAMENENG
1C3B LEPCHA PUNCTUATION TA-ROL
1C3C LEPCHA PUNCTUATION NYET THYOOM TA-ROL
1C3D LEPCHA PUNCTUATION CER-WA
1C3E LEPCHA PUNCTUATION TSHOOK CER-WA
1C3F LEPCHA PUNCTUATION TSHOOK
1C7E OL CHIKI PUNCTUATION MUCAAD
1C7F OL CHIKI PUNCTUATION DOUBLE MUCAAD
2CFA COPTIC OLD NUBIAN DIRECT QUESTION MARK
2CFB COPTIC OLD NUBIAN INDIRECT QUESTION MARK
2CFC COPTIC OLD NUBIAN VERSE DIVIDER
2CFF COPTIC MORPHOLOGICAL DIVIDER
2E0E..2E15 EDITORIAL CORONIS..UPWARDS ANCORA
2E17 OBLIQUE DOUBLE HYPHEN
A60D VAI COMMA
A60F VAI QUESTION MARK
A92E KAYAH LI SIGN CWI
A92F KAYAH LI SIGN SHYA
10A50 KHAROSHTHI PUNCTUATION DOT
10A51 KHAROSHTHI PUNCTUATION SMALL CIRCLE
10A52 KHAROSHTHI PUNCTUATION CIRCLE
10A53 KHAROSHTHI PUNCTUATION CRESCENT BAR
10A54 KHAROSHTHI PUNCTUATION MANGALAM
10A55 KHAROSHTHI PUNCTUATION LOTUS
Tibetan and Phags-Pa Head Letters
0F01 TIBETAN MARK GTER YIG MGO TRUNCATED A
0F02 TIBETAN MARK GTER YIG MGO -UM RNAM BCAD MA
0F03 TIBETAN MARK GTER YIG MGO -UM GTER TSHEG MA
0F04 TIBETAN MARK INITIAL YIG MGO MDUN MA
0F06 TIBETAN MARK CARET YIG MGO PHUR SHAD MA
0F07 TIBETAN MARK YIG MGO TSHEG SHAD MA
0F09 TIBETAN MARK BSKUR YIG MGO
0F0A TIBETAN MARK BKA- SHOG YIG MGO
0FD0 TIBETAN MARK BSKA- SHOG GI MGO RGYAN
0FD1 TIBETAN MARK MNYAM YIG GI MGO RGYAN
0FD3 TIBETAN MARK INITIAL BRDA RNYING YIG MGO MDUN MA
A874 PHAGS-PA SINGLE HEAD MARK
A875 PHAGS-PA DOUBLE HEAD MARK
CL: Closing Punctuation (XB)
3001..3002 IDEOGRAPHIC COMMA..IDEOGRAPHIC FULL STOP
FE11 PRESENTATION FORM FOR VERTICAL IDEOGRAPHIC COMMA
FE12 PRESENTATION FORM FOR VERTICAL IDEOGRAPHIC FULL STOP
FE50 SMALL COMMA
FE52 SMALL FULL STOP
FF0C FULLWIDTH COMMA
FF0E FULLWIDTH FULL STOP
FF61 HALFWIDTH IDEOGRAPHIC FULL STOP
FF64 HALFWIDTH IDEOGRAPHIC COMMA
EX: Exclamation/Interrogation (XB)
0021 EXCLAMATION MARK
003F QUESTION MARK
05C6 HEBREW PUNCTUATION NUN HAFUKHA
061B ARABIC SEMICOLON
061E ARABIC TRIPLE DOT PUNCTUATION MARK
061F ARABIC QUESTION MARK
06D4 ARABIC FULL STOP
07F9 NKO EXCLAMATION MARK
0F0D TIBETAN MARK SHAD
0F0E TIBETAN MARK NYIS SHAD
0F0F TIBETAN MARK TSHEG SHAD
0F10 TIBETAN MARK NYIS TSHEG SHAD
0F11 TIBETAN MARK RIN CHEN SPUNGS SHAD
0F14 TIBETAN MARK GTER TSHEG
1802 MONGOLIAN COMMA [was BA]
1803 MONGOLIAN FULL STOP [was BA]
1808 MONGOLIAN MANCHU COMMA [was BA]
1809 MONGOLIAN MANCHU FULL STOP [was BA]
1944 LIMBU EXCLAMATION MARK
1945 LIMBU QUESTION MARK
2762 HEAVY EXCLAMATION MARK ORNAMENT
2763 HEAVY HEART EXCLAMATION MARK ORNAMENT
2CF9 COPTIC OLD NUBIAN FULL STOP [was BA]
2CFE COPTIC FULL STOP [was BA]
A60C VAI SYLLABLE LENGTHENER
A60E VAI FULL STOP
A876 PHAGS-PA MARK SHAD
A877 PHAGS-PA MARK DOUBLE SHAD
FE15 PRESENTATION FORM FOR VERTICAL EXCLAMATION MARK
FE16 PRESENTATION FORM FOR VERTICAL QUESTION MARK
FE56..FE57 SMALL QUESTION MARK..SMALL EXCLAMATION MARK
FF01 FULLWIDTH EXCLAMATION MARK
FF1F FULLWIDTH QUESTION MARK
IS: Numeric Separator (Infix) (XB)
002C COMMA
002E FULL STOP
003A COLON
003B SEMICOLON
037E GREEK QUESTION MARK (canonically equivalent to 003B)
0589 ARMENIAN FULL STOP
060C ARABIC COMMA [moved from EX]
060D ARABIC DATE SEPARATOR
07F8 NKO COMMA
2044 FRACTION SLASH
FE10 PRESENTATION FORM FOR VERTICAL COMMA
FE13 PRESENTATION FORM FOR VERTICAL COLON
FE14 PRESENTATION FORM FOR VERTICAL SEMICOLON
NS: Nonstarters (XB)
17D6 KHMER SIGN CAMNUC PII KUUH
203C DOUBLE EXCLAMATION MARK
203D INTERROBANG
2047 DOUBLE QUESTION MARK
2048 QUESTION EXCLAMATION MARK
2049 EXCLAMATION QUESTION MARK
3005 IDEOGRAPHIC ITERATION MARK
301C WAVE DASH
303C MASU MARK
303B VERTICAL IDEOGRAPHIC ITERATION MARK
309B..309E KATAKANA-HIRAGANA VOICED SOUND MARK..HIRAGANA VOICED ITERATION MARK
30A0 KATAKANA-HIRAGANA DOUBLE HYPHEN
30FB..30FE KATAKANA MIDDLE DOT..KATAKANA VOICED ITERATION MARK
A015 YI SYLLABLE WU (misnomer for YI SYLLABLE ITERATION MARK)
FE54..FE55 SMALL SEMICOLON..SMALL COLON
FF1A..FF1B FULLWIDTH COLON..FULLWIDTH SEMICOLON
FF65 HALFWIDTH KATAKANA MIDDLE DOT
FF70 HALFWIDTH KATAKANA-HIRAGANA PROLONGED SOUND MARK
FF9E..FF9F HALFWIDTH KATAKANA VOICED SOUND MARK..HALFWIDTH KATAKANA SEMI-VOICED SOUND MARK
PO: Postfix (Numeric) (XB)
0025 PERCENT SIGN
00A2 CENT SIGN
00B0 DEGREE SIGN
060B AFGHANI SIGN
066A ARABIC PERCENT SIGN [moved from EX]
2030 PER MILLE SIGN
2031 PER TEN THOUSAND SIGN
2032..2037 PRIME..REVERSED TRIPLE PRIME
20A7 PESETA SIGN
2103 DEGREE CELSIUS
2109 DEGREE FAHRENHEIT
FDFC RIAL SIGN
FE6A SMALL PERCENT SIGN
FF05 FULLWIDTH PERCENT SIGN
FFE0 FULLWIDTH CENT SIGN
PR: Prefix (Numeric) (XA)
002B PLUS SIGN
005C REVERSE SOLIDUS
00B1 PLUS-MINUS SIGN
2116 NUMERO SIGN
2212 MINUS SIGN
2213 MINUS-OR-PLUS SIGN
QU: Ambiguous Quotation (XB/XA)
0022 QUOTATION MARK
0027 APOSTROPHE
275B HEAVY SINGLE TURNED COMMA QUOTATION MARK ORNAMENT
275C HEAVY SINGLE COMMA QUOTATION MARK ORNAMENT
275D HEAVY DOUBLE TURNED COMMA QUOTATION MARK ORNAMENT
275E HEAVY DOUBLE COMMA QUOTATION MARK ORNAMENT