This page is a compilation of formal public feedback received so far. See Feedback for further information on this issue, how to discuss it, and how to provide feedback.
Date/Time: Mon Oct 24 04:54:54 CDT 2016
Name: SWARAN LATA
Report Type: Other Question, Problem, or Feedback
Opt Subject: Indic Line breaking rules for #335 - Proposed Update UAX #14, Unicode Line Breaking Algorithm
In Indian-language writing systems, it is preferred that lines break at word boundaries. If required, the following principle may be adhered to: a new line cannot begin with any of the following symbols or punctuation marks, and these should be retained with the associated text.

Symbol   Character name                                    Unicode code point
।        DEVANAGARI DANDA                                  U+0964
॥        DEVANAGARI DOUBLE DANDA                           U+0965
)        RIGHT PARENTHESIS                                 U+0029
+        PLUS SIGN                                         U+002B
*        ASTERISK                                          U+002A
-        HYPHENATION POINT - VISIBLE HYPHEN HYPHENATION    U+2027
-        SOFT HYPHEN                                       U+00AD
/        SOLIDUS                                           U+002F
,        COMMA                                             U+002C
.        FULL STOP                                         U+002E
:        COLON                                             U+003A
;        SEMICOLON                                         U+003B
=        EQUALS SIGN                                       U+003D
>        GREATER-THAN SIGN                                 U+003E
]        RIGHT SQUARE BRACKET                              U+005D
_        LOW LINE                                          U+005F
|        VERTICAL LINE                                     U+007C
}        RIGHT CURLY BRACKET                               U+007D
~        TILDE                                             U+007E
%        PERCENT SIGN                                      U+0025

Hyphenation at line boundary in Indian languages
• A hyphen should be used at the breaking point so that the word can be read intuitively.
• However, language-specific morpho-phonemic rules and industry practices (from media, publishing and grammar books) could be used for hyphenation. U+00AD (soft hyphen) is used in some languages, such as Tamil and Malayalam.
• Hyphenated words can be broken at the hyphenation point (U+2027), e.g. नर-नारी should be treated as नर- on the first line and नारी on the next line.
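The principle above can be sketched in Python as a filter over candidate break positions. This is only an illustration, not part of the submitted feedback and not an implementation of UAX #14: the names `NO_LINE_START`, `BREAK_AFTER` and `allowed_breaks` are hypothetical, and treating a space, U+002D and U+2027 as the only characters after which a break is considered is an assumption made for the example.

```python
# Characters that, per the list above, must not begin a new line.
NO_LINE_START = frozenset("\u0964\u0965)+*\u2027\u00AD/,.:;=>]_|}~%")

# Assumption for this sketch: a break is only considered after a space,
# a hyphen-minus, or a hyphenation point.
BREAK_AFTER = frozenset(" -\u2027")


def allowed_breaks(text):
    """Return the indices i where a line break may be placed before text[i]."""
    breaks = []
    for i in range(1, len(text)):
        follows_break_char = text[i - 1] in BREAK_AFTER
        starts_with_forbidden = text[i] in NO_LINE_START
        if follows_break_char and not starts_with_forbidden:
            breaks.append(i)
    return breaks
```

With the example from the feedback, `allowed_breaks("नर-नारी")` yields a single position directly after the hyphen, so नर- stays on the first line and नारी moves to the next.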
Feedback above this line was reviewed during UTC #149.
Date/Time: Thu Mar 9 10:27:25 CST 2017
Name: Elmar Braun
Report Type: Public Review Issue
Opt Subject: Public Review Issue #335 / UAX #14 U+FF70
UAX #14 revisions 37 and proposed 38 list U+FF70 twice, in the table of characters in class CJ, and also in the table of characters in class NS. According to the Unicode data 9.0, U+FF70 is in line breaking class CJ. Therefore I believe its listing under class NS to be in error.
Date/Time: Sun Apr 2 18:39:14 CDT 2017
Name: Rainer Perske
Report Type: Error Report
Opt Subject: Errors in LineBreakTest.txt
Dear Sir or Madam,

I have found three wrong entries in http://www.unicode.org/Public/9.0.0/ucd/auxiliary/LineBreakTest.txt

Wrong entries are (without comment):

× 200D × 0308 ÷ 231A ÷
× 200D × 0308 ÷ 261D ÷
× 200D × 0308 ÷ 1F3FB ÷

These entries violate the non-tailorable line breaking rules as indicated below.

These lines represent characters with line break property:

ZWJ CM ID
ZWJ CM EB
ZWJ CM EM

LB9 says: Do not break a combining character sequence; treat it as if it has the line breaking class of the base character in all of the following rules. Treat ZWJ as if it were CM. The rule is:

Treat X (CM | ZWJ)* as if it were X.

Hence these lines are to be treated as:

ZWJ ID
ZWJ EB
ZWJ EM

LB8a says: Do not break between a zero width joiner and an ideograph, emoji base or emoji modifier. The rule is:

ZWJ × (ID | EB | EM)

Hence the lines should read:

× 200D × 0308 × 231A ÷
× 200D × 0308 × 261D ÷
× 200D × 0308 × 1F3FB ÷

Kind regards
Rainer Perske
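The reasoning in this report can be traced with a small Python sketch. It follows only the two rules as quoted above, in the order the report applies them, and takes no position on whether that reading is the intended one. The function names `collapse_lb9` and `lb8a_no_break` are hypothetical, and the exclusion set `NOT_BASE` reflects the classes that UAX #14 excludes from X in LB9.

```python
# Classes that cannot act as the base X in "Treat X (CM | ZWJ)* as if it were X".
NOT_BASE = {"BK", "CR", "LF", "NL", "SP", "ZW"}


def collapse_lb9(classes):
    """Drop every CM or ZWJ that directly follows a valid base class."""
    out = []
    for cls in classes:
        if cls in ("CM", "ZWJ") and out and out[-1] not in NOT_BASE:
            continue  # absorbed into the preceding base
        out.append(cls)
    return out


def lb8a_no_break(before, after):
    """LB8a as quoted: ZWJ x (ID | EB | EM)."""
    return before == "ZWJ" and after in ("ID", "EB", "EM")


# The three sequences from the report.
for seq in (["ZWJ", "CM", "ID"], ["ZWJ", "CM", "EB"], ["ZWJ", "CM", "EM"]):
    reduced = collapse_lb9(seq)
    print(seq, "->", reduced, "no break:", lb8a_no_break(reduced[0], reduced[1]))
```

Under this reading each sequence reduces to ZWJ followed by ID, EB or EM, which is where the report concludes that no break is allowed before the final character.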
Date/Time: Sat Apr 29 22:48:11 CDT 2017
Name: David Corbett
Report Type: Public Review Issue
Opt Subject: PRI #335: More double diacritics
Just as U+035C..0362 have line break class GL, so should the other double diacritics U+1DCD and U+1DFC. Similarly, it may make sense for the left and conjoining half marks U+FE20, U+FE22, U+FE24, U+FE26..FE27, U+FE29, U+FE2B, and U+FE2D..FE2E to have line break class GL.
Date/Time: Sat Apr 29 23:16:48 CDT 2017
Name: David Corbett
Report Type: Public Review Issue
Opt Subject: PRI #335: Breaking between Hebrew and hyphens
The annex says that U+00AD SOFT HYPHEN “is an invisible format character with no width. It marks the place where an optional line break may occur inside a word. It can be used with all scripts.” However, because of LB21a, it does not work with the Hebrew script. U+05BE HEBREW PUNCTUATION MAQAF, the Hebrew hyphen, has line break class BA. However, LB21a prevents a line break between HL and BA, making the maqaf essentially a non-breaking hyphen. If this is intentional, it should have line break class GL, to make the intent clear; if not, LB21a needs refinement, or even deletion. It would be nice if the annex explained the rationale for LB21a. See also L2/13-083.
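The effect described here can be illustrated with a minimal Python sketch of LB21a, which UAX #14 states as HL (HY | BA) ×. The class lookup `toy_class` is a hypothetical stand-in that covers only the handful of characters needed for the example; a real implementation would read LineBreak.txt and would also apply LB9 to combining marks, which this sketch ignores.

```python
def toy_class(ch):
    """Hand-written line break classes for a few characters (assumption-laden)."""
    cp = ord(ch)
    if 0x05D0 <= cp <= 0x05EA:      # Hebrew letters alef..tav
        return "HL"
    if cp in (0x05BE, 0x00AD):      # maqaf and soft hyphen
        return "BA"
    if cp == 0x002D:                # hyphen-minus
        return "HY"
    return "AL"                     # everything else, for this sketch only


def lb21a_blocks_break(text, i):
    """True if LB21a, HL (HY | BA) x, forbids a break before text[i]."""
    if i < 2 or i >= len(text):
        return False
    return toy_class(text[i - 2]) == "HL" and toy_class(text[i - 1]) in ("HY", "BA")
```

For a Hebrew letter followed by a maqaf or a soft hyphen, `lb21a_blocks_break` returns `True` at the position after the hyphen, while the same hyphen after a Latin letter is unaffected.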
Date/Time: Thu May 4 18:29:06 CDT 2017
Name: David Corbett
Report Type: Public Review Issue
Opt Subject: PRI #335: Syllable-based line breaks
The Unicode Standard explains how to break lines in Batak, Cham, Javanese, and Vai based on orthographic syllables. I suggest formally encoding these four scripts’ rules in UAX #14. Here are some specific suggestions.

Vai letters should all be ID, except U+A60B..A60C, which should be CM.

Cham letters U+AA00..AA24 and U+AA26..AA28 should be ID. Cham final consonants U+AA40..AA42 and U+AA44..AA4B should be CM. U+AA25 is a tricky one.

Batak and Javanese letters should be ID. Batak killers should have a new class such that [ID × ID Batak_Killer] and [ID × Batak_Killer]. U+A9C0 JAVANESE PANGKON should be GL, or a new class that is like GL but only glues lb=ID characters. I don’t know how to break around U+A9CF JAVANESE PANGRANGKEP, but it might need a new class too.

There are some ambiguities in the rules (e.g. U+AA25 can be syllable-final or -initial) but for the most part, they work. They are better than the current situation, where most implementers don’t know about these scripts, so they follow the default rules and miss many line break opportunities.

If the UTC doesn’t like this idea, I suggest copying the rules from the core spec to a non-normative section of UAX #14. That would at least give these rules more visibility. Someone looking to implement a Unicode line-breaker is more likely to read the line-breaking spec than the entire core spec.
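The explicitly numbered ranges in this suggestion could be expressed as a lookup table, sketched below in Python. It encodes only the code point ranges spelled out in the feedback; the remaining Vai, Batak and Javanese letters are left out because the feedback does not give their ranges. The names `PROPOSED_RANGES` and `proposed_class` are hypothetical, and the values are the suggested classes, not the classes assigned in the Unicode Character Database.

```python
# (first, last, suggested class) for the ranges named in the feedback.
PROPOSED_RANGES = [
    (0xA60B, 0xA60C, "CM"),  # Vai exceptions to the suggested ID default
    (0xAA00, 0xAA24, "ID"),  # Cham letters
    (0xAA26, 0xAA28, "ID"),  # Cham letters
    (0xAA40, 0xAA42, "CM"),  # Cham final consonants
    (0xAA44, 0xAA4B, "CM"),  # Cham final consonants
]


def proposed_class(cp):
    """Return the suggested class for a code point, or None if the feedback gives no range for it."""
    for first, last, cls in PROPOSED_RANGES:
        if first <= cp <= last:
            return cls
    return None
```

U+AA25, which the feedback calls a tricky one, deliberately falls through to `None` here.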