From: Ernest Cline (ernestcline@mindspring.com)
Date: Tue Apr 13 2004 - 21:57:52 EDT
I realize that Terminal_Punctuation is only an informative property,
but I have a question concerning it and characters that the Line
Breaking Algorithm identifies as being word dividers.
In UAX #14, the following information is given in the list of characters
in Line Break class BA:
Other forms of visible word dividers that provide break opportunities.
0F0B TIBETAN MARK INTERSYLLABIC TSHEG
1361 ETHIOPIC WORDSPACE
17D5 KHMER SIGN BARIYOOSAN
10100 AEGEAN WORD SEPARATOR LINE
10101 AEGEAN WORD SEPARATOR DOT
10102 AEGEAN CHECK MARK
1039F UGARITIC WORD DIVIDER
Of these seven characters, only two, U+1361 and U+17D5, have
the Terminal_Punctuation property. Of the remaining five, one,
U+10102, is a symbol and thus is not punctuation, but what
distinction causes the other four not to have the
Terminal_Punctuation property as well? Is it because
Terminal_Punctuation is informative that these four have
slipped through the cracks, or is there a reason I should be
noticing but am not?
This archive was generated by hypermail 2.1.5 : Tue Apr 13 2004 - 22:52:58 EDT