L2/07-398
Date: October 18, 2007
Source: Mark Davis
Subject: Word/Sentence punctuation property recommendations
----------------------
Document 07-370 discusses certain property issues. This document presents a series of recommendations based on that document and on subsequent discussion in the meeting.
1. Remove FE13 ( ︓ ) PRESENTATION FORM FOR VERTICAL COLON from the property value Word_Break=MidNum. This is clearly an oversight: we explicitly removed COLON but missed its compatibility equivalent.
2. Allow certain characters to "bridge" both numeric and alphabetic words. That is, if these characters occur between digits or between alphabetic characters, they continue numeric or alphabetic words, respectively.
2A. Add a property value Word_Break=MidNumLet, with the following characters (these will be removed from other property values):
0027 ( ' ) APOSTROPHE
002E ( . ) FULL STOP
2018 ( ‘ ) LEFT SINGLE QUOTATION MARK
2019 ( ’ ) RIGHT SINGLE QUOTATION MARK
2024 ( ․ ) ONE DOT LEADER
FE52 ( ﹒ ) SMALL FULL STOP
FF07 ( ＇ ) FULLWIDTH APOSTROPHE
FF0E ( ． ) FULLWIDTH FULL STOP
In the text of the document, call out the last four characters as an open issue, requesting public feedback. The text would look something like:
Open Issue: The following characters have been tentatively added to MidNumLet for Unicode 5.1. As of Unicode 5.0, there were already compatibility equivalents of characters in MidNum and MidLetter, but the lists were not complete. These characters add compatibility equivalents to those characters that "bridge" numeric and alphabetic words. The inclusion of these characters only has an effect if they have numbers on both sides or alphabetic letters on both sides. In particular, this change has no effect if these characters are adjacent to ideographs.
2B. Make the following changes to the rules, to allow these characters to bridge both alphabetic and numeric words:
- Replace MidLetter by (MidLetter | MidNumLet)
- Replace MidNum by (MidNum | MidNumLet)
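The effect of 2A and 2B can be sketched in Python. This is a minimal illustration under several assumptions, not the UAX #29 algorithm: the names `MID_NUM_LET`, `MID_LETTER`, `MID_NUM`, `word_break_class`, `is_boundary` and `segment` are hypothetical; MidLetter and MidNum are represented only by small assumed subsets (COLON and MIDDLE DOT; COMMA and SEMICOLON); ALetter is approximated by `str.isalpha()` with CJK unified ideographs excluded; and all other Word_Break classes and rules (Extend, Format, Katakana, and so on) are ignored.

```python
import unicodedata

# Hypothetical constants: the MidNumLet set from item 2A, plus small assumed
# subsets of MidLetter (COLON, MIDDLE DOT) and MidNum (COMMA, SEMICOLON).
MID_NUM_LET = set("\u0027\u002E\u2018\u2019\u2024\uFE52\uFF07\uFF0E")
MID_LETTER = set("\u003A\u00B7")
MID_NUM = set("\u002C\u003B")

LETTER_MID = {"MidLetter", "MidNumLet"}  # 2B: (MidLetter | MidNumLet)
NUMBER_MID = {"MidNum", "MidNumLet"}     # 2B: (MidNum | MidNumLet)
WORD = {"ALetter", "Numeric"}


def word_break_class(ch: str) -> str:
    """Approximate Word_Break value; ideographs are deliberately not ALetter."""
    if ch in MID_NUM_LET:
        return "MidNumLet"
    if ch in MID_LETTER:
        return "MidLetter"
    if ch in MID_NUM:
        return "MidNum"
    if unicodedata.category(ch) == "Nd":
        return "Numeric"
    if ch.isalpha() and not unicodedata.name(ch, "").startswith("CJK UNIFIED IDEOGRAPH"):
        return "ALetter"
    return "Other"


def is_boundary(cls: list[str], i: int) -> bool:
    """True if a word boundary falls between positions i - 1 and i."""
    def at(k: int):
        return cls[k] if 0 <= k < len(cls) else None

    prev2, prev, cur, nxt = at(i - 2), at(i - 1), at(i), at(i + 1)
    if prev in WORD and cur in WORD:                                    # WB5, WB8-WB10
        return False
    if prev == "ALetter" and cur in LETTER_MID and nxt == "ALetter":    # WB6
        return False
    if prev2 == "ALetter" and prev in LETTER_MID and cur == "ALetter":  # WB7
        return False
    if prev2 == "Numeric" and prev in NUMBER_MID and cur == "Numeric":  # WB11
        return False
    if prev == "Numeric" and cur in NUMBER_MID and nxt == "Numeric":    # WB12
        return False
    return True  # no rule applies: break


def segment(text: str) -> list[str]:
    """Split text at every boundary reported by is_boundary."""
    cls = [word_break_class(ch) for ch in text]
    pieces, start = [], 0
    for i in range(1, len(text)):
        if is_boundary(cls, i):
            pieces.append(text[start:i])
            start = i
    if text:
        pieces.append(text[start:])
    return pieces
```

With these rules, `segment("3.14")` and `segment("can't")` each return a single piece, while `segment("a.1")` and `segment("中.文")` split around the full stop: a MidNumLet character only matters between two letters or between two digits, and ideographs never trigger it.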
3. Add the following characters to Word_Break=MidLetter:
0387 ( · ) GREEK ANO TELEIA
FE13 ( ︓ ) PRESENTATION FORM FOR VERTICAL COLON
FE55 ( ﹕ ) SMALL COLON
FF1A ( ： ) FULLWIDTH COLON
Add an Open Issue to the document for these characters, along the above lines. Include the information that while 0387 ( · ) GREEK ANO TELEIA or 00B7 ( · ) MIDDLE DOT (its compatibility equivalent) may be used as a semicolon in Greek, it is, like COLON, safe to allow either one within words, since in the use as a semicolon there would not be a letter immediately following.
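Both claims in item 3 can be checked with Python's standard `unicodedata` module: the new characters decompose to MIDDLE DOT or COLON, and a MidLetter character only keeps a word together when letters sit on both sides. The helper `joins_letters` and the constant `MID_LETTER_ADDITIONS` are hypothetical; the helper is a simplified stand-in for WB6/WB7 that approximates ALetter with `str.isalpha()`. Note that the UCD gives 0387 a canonical (singleton) decomposition to 00B7, which also makes the two compatibility equivalent.

```python
import unicodedata

# Item 3 additions to Word_Break=MidLetter (hypothetical constant name).
MID_LETTER_ADDITIONS = ["\u0387", "\uFE13", "\uFE55", "\uFF1A"]


def joins_letters(text: str, i: int) -> bool:
    """Simplified WB6/WB7 check: text[i] stays inside a word only if an
    alphabetic character is immediately before and after it."""
    return 0 < i < len(text) - 1 and text[i - 1].isalpha() and text[i + 1].isalpha()


# Show each character's decomposition mapping from the UCD.
for ch in MID_LETTER_ADDITIONS:
    print(f"U+{ord(ch):04X} -> {unicodedata.decomposition(ch)}")

# Greek semicolon use: ANO TELEIA is followed by a space, so no word is bridged.
greek = "\u03bd\u03b1\u03b9\u0387 \u03cc\u03c7\u03b9"
print(joins_letters(greek, 3))  # False
```

The loop prints `00B7` for U+0387 and `<vertical> 003A`, `<small> 003A` and `<wide> 003A` for the three colon forms.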
4. Add the following characters to Word_Break=MidNum:
066C ( ٬ ) ARABIC THOUSANDS SEPARATOR
FE50 ( ﹐ ) SMALL COMMA
FE54 ( ﹔ ) SMALL SEMICOLON
FF0C ( ， ) FULLWIDTH COMMA
FF1B ( ； ) FULLWIDTH SEMICOLON
Add an Open Issue to the document for the last four characters, along the lines of the above.
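For runs of digits, WB11 and WB12 reduce to "digits, optionally joined by single MidNum characters that are followed by more digits". The regular expression below is a minimal sketch of only that reduced case, assuming COMMA and SEMICOLON are already MidNum and adding the item 4 characters; `MID_NUM`, `NUMBER` and `numbers` are hypothetical names. Python's `\d` matches any decimal digit (General_Category Nd), including Arabic-Indic digits.

```python
import re

# COMMA and SEMICOLON (assumed already MidNum) plus the item 4 additions.
MID_NUM = "\u002C\u003B\u066C\uFE50\uFE54\uFF0C\uFF1B"

# Digits joined by single MidNum characters, each followed by more digits
# (the digit-only core of WB11/WB12).
NUMBER = re.compile(rf"\d+(?:[{MID_NUM}]\d+)*")


def numbers(text: str) -> list[str]:
    """Return the maximal number tokens found in text."""
    return NUMBER.findall(text)


print(numbers("1\uFF0C000\uFF0C000 and 2\uFF0C next"))  # ['1，000，000', '2']
```

The trailing FULLWIDTH COMMA after "2" is not absorbed because no digit follows it, mirroring the requirement that the bridging character be surrounded by numbers.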
5. Add the alternative forms of full stop to Sentence_Break=ATerm, ending up with the following:
002E ( . ) FULL STOP
2024 ( ․ ) ONE DOT LEADER
FE52 ( ﹒ ) SMALL FULL STOP
FF0E ( ． ) FULLWIDTH FULL STOP
Add an Open Issue for the alternative forms of full stop, as above.
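Membership in ATerm matters because the sentence-break rules decide what happens after an ATerm based on the characters that follow it. The sketch below is deliberately much simpler than UAX #29 and only illustrates that dependency: it breaks after an ATerm only when whitespace follows and the next non-space character is uppercase, so a FULLWIDTH FULL STOP inside "3．14" does not end a sentence. `ATERM` and `split_sentences` are hypothetical names.

```python
# Sentence_Break=ATerm after item 5 (hypothetical constant name).
ATERM = {"\u002E", "\u2024", "\uFE52", "\uFF0E"}


def split_sentences(text: str) -> list[str]:
    """Very simplified: break after an ATerm only when whitespace follows and
    the next non-space character is uppercase. Not the full UAX #29 rule set."""
    out, start, i = [], 0, 0
    while i < len(text):
        if text[i] in ATERM:
            j = i + 1
            while j < len(text) and text[j].isspace():
                j += 1  # skip the whitespace run after the ATerm
            if j > i + 1 and j < len(text) and text[j].isupper():
                out.append(text[start:j])  # trailing spaces stay with the sentence
                start = j
            i = j
        else:
            i += 1
    if start < len(text):
        out.append(text[start:])
    return out
```

Before item 5, a FULLWIDTH FULL STOP or ONE DOT LEADER would not be an ATerm at all, so a splitter like this one would never break after it.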
6. Add an informative note that it may be desirable to tailor some or all of the following characters into MidNum:
0020 ( ) SPACE
00A0 ( ) NO-BREAK SPACE
2007 ( ) FIGURE SPACE
2008 ( ) PUNCTUATION SPACE
2009 ( ) THIN SPACE
202F ( ) NARROW NO-BREAK SPACE
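A tailoring of this kind can be sketched by letting a chosen set of space characters act as MidNum between digits. This is a minimal regex illustration, not a UAX #29 implementation; `SPACE_SEPARATORS` and `number_tokens` are hypothetical names, and the optional `tailored_seps` parameter stands in for whichever subset an implementation picks.

```python
import re

# Item 6 candidates for a tailored Word_Break=MidNum (hypothetical constant name).
SPACE_SEPARATORS = "\u0020\u00A0\u2007\u2008\u2009\u202F"


def number_tokens(text: str, tailored_seps: str = "") -> list[str]:
    """Digit runs; if tailored_seps is given, single separator characters
    between digits are treated as MidNum and join the runs."""
    if not tailored_seps:
        return re.findall(r"\d+", text)
    joined = rf"\d+(?:[{re.escape(tailored_seps)}]\d+)*"
    return re.findall(joined, text)


grouped = "1\u202F000\u202F000"
print(number_tokens(grouped))                    # untailored: three tokens
print(number_tokens(grouped, SPACE_SEPARATORS))  # tailored: one token
```

Tailoring in 0020 SPACE also merges unrelated adjacent numbers (for example, "pages 3 4" yields the single token "3 4"), which is one reason an implementation might adopt only some of these characters.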