L2/14-252

Comments on Public Review Issues
(July 29 - October 24, 2014)

The sections below contain links to permanent feedback documents for the open Public Review Issues as well as other public feedback as of October 24, 2014, since the previous cumulative document was issued prior to UTC #140 (August 2014). This document does not include feedback on moderated Public Review Issues from the forum that have been digested by the forum moderators; those are in separate documents for each of the PRIs. Grayed-out items in the Table of Contents do not have feedback here.

Contents:

The links below go directly to open PRIs and to feedback documents for them, as of October 24, 2014. Gray rows have no feedback to date.

Issue | Name | Feedback Link
285 | Proposed Update UTS #10, Unicode Collation Algorithm | (feedback)
284 | Proposed Update UAX #44, Unicode Character Database | (feedback)
283 | Proposed Update UAX #14, Unicode Line Breaking Algorithm | (feedback)
282 | Proposed Update UAX #31, Unicode Identifier and Pattern Syntax | (feedback)
281 | Proposed encoding model change for New Tai Lue | (feedback)
280 | Proposed Update UTR #23, The Unicode Character Property Model | (feedback)
279 | Proposed Update UAX #9, Unicode Bidirectional Algorithm | (feedback)
277 | Reconciling Script and Script_Extensions Character Properties | (feedback)

The links below go to locations in this document for feedback.

Feedback on Encoding Proposals
Feedback on UTRs / UAXes
Error Reports
Other Reports

 


Feedback on Encoding Proposals

Date/Time: Wed Aug 20 11:25:16 CDT 2014
Name: John Cowan
Report Type: Error Report
Opt Subject: L2/14-192: Preliminary Proposal to Encode the Turkestani Script

Given the repeated statements in the proposal that Tocharian and Khotanese
writing are mutually illegible, and the need for different treatment of
certain letters in the two writing systems, I believe that they should not be
unified.  The precedent is the severance of the various 22-character West
Semitic abjads, which are completely isomorphic (unlike this case), on the
grounds of mutual illegibility.

I therefore propose that the Khotanese characters be removed and the script
renamed to "Tocharian".

Date/Time: Wed Aug 20 11:18:45 CDT 2014
Name: John Cowan
Report Type: Feedback on an Encoding Proposal
Opt Subject: L2/14-200: Don't use U+E01EF VARIATION SELECTOR-256

I strongly support this; indeed, I think that U+E01EF should be formally
deprecated.  Nobody should be using it now, and nobody should use it in
future.

Date/Time: Wed Aug 20 11:17:09 CDT 2014
Name: John Cowan
Report Type: Feedback on an Encoding Proposal
Opt Subject: L2/14-202: Additional proposal to encode Latin characters for theta and delta

I have no problem with the proposed capital letters.

I am concerned that the introduction of a Latin small theta will disrupt the
standard encoding of IPA with the Greek small theta.  IPA is complicated
enough without introducing multiple spellings of a very common character.
Strong provisions would have to be made to ensure that IPA continued to be
encoded with Greek theta.

Date/Time: Sat Oct 25 03:06:55 CDT 2014
Name: Michael Bobeck
Report Type: Submission (FAQ, Tech Note, Case Study)
Opt Subject: 14255-yot.pdf

Proposal feedback:

http://www.unicode.org/L2/L2014/14255-yot.pdf

contains the following footnote:

[The letters "IHS" and "JHS" are representations in Latin of the Greek: Latin
I (or J) for iota, Latin H for eta, and Latin S for sigma (lunate or final
form).]

This footnote treats Latin H (the IPA /h/ consonant) and Greek H (eta, the IPA
/e:/ vowel) as one and the same sound. This alone badly conflates distinct
letters functioning differently in distinct alphabets. The correct Latin
representation of Greek IHS/JHS would be IES/JES, but that is not what is used.
Because of this, IHS [iota-eta-stigma]/JHS [yot-eta-stigma] (there is no
capital final sigma in Unicode) is Greek, not Latin, since there is no Latin
letter that looks like H and sounds like E.

Thus we have several mutually exclusive solutions:

IHS is Latin and means IPA /i/ /h/ /s/ [irrelevant]
IHS is Greek and means IPA /i/ /e:/ /s/ [partially supportive of dotless yot]
JHS is Latin and means IPA /j/ /h/ /s/ [irrelevant]
JHS is Greek and means IPA /j/ /e:/ /s/ [wholly supportive of dotless yot]

As you can see, /h/ cannot be /e:/ at the same time. These are bare facts,
regardless of what is written by anyone. The distinction between letters, or
its lack, is again irrelevant, since letters speak for themselves regardless of
how they are interpreted or conflated by their users, just as R and L are not
distinguished in Japanese, which does not change the fact that R and L are
still distinct letters.

Feedback on UTRs / UAXes


Error Reports

Date/Time: Fri Sep 5 12:53:46 CDT 2014
Name: Ken Lunde
Report Type: Error Report
Opt Subject: U+00A5 YEN SIGN annotation in Code Charts

The annotation for U+00A5 YEN SIGN in the Code Charts includes the following
bulleted item: "glyph may have one or two crossbars." I researched this to its
conclusion, and I recommend that this annotation simply be removed on the
grounds that the correct/legal form, as used in China and Japan, includes two
crossbars.

The single crossbar form seems to have been introduced in ISO 1073-1:1976
(Alphanumeric character sets for optical recognition -- Part 1: Character set
OCR-A -- Shapes and dimensions of the printed image), which served as the
reference glyph for GB 2312-80 0x2324 (U+FFE5) and GB 1988-89 0x24 (U+00A5),
and this form was propagated to all subsequent GB and GB/T standards that
include these characters, up to and including GB 18030-2005, which includes
both characters in single crossbar form.

CHEN Zhuang (CESI) provided to me the following details by email on
2014-09-04:

"I met one GB 2312 developer. He told me that the one bar RMB symbol was
introduced to GB 2312 with reference of ISO 1073-1:1976 (Alphanumeric
character sets for optical recognition--Part 1:Character set OCR-A--Shapes
and dimensions of the printed image). The developers thought the one bar
symbol easier to be displayed in small matrix. But the legal symbol is
still two bars one so far."

Other Reports

Date/Time: Sun Sep 14 15:08:34 CDT 2014
Name: Joseph
Report Type: Other Question, Problem, or Feedback
Opt Subject: RTL & LTR bidirectional problems

Hello,
Bidirectional text is badly handled by applications; I know of only one application that
handles it correctly. Punctuation and parentheses are a source of much trouble.
I suggest reserving one set of punctuation characters and parentheses for RTL languages.
A second solution is to make punctuation characters and parentheses bidirectional, so that
they follow the flow of the text.
Thanks
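For context, the Unicode Character Database already gives most punctuation a neutral or weak Bidi_Class, and marks paired brackets such as parentheses as Bidi_Mirrored, so a conforming implementation of the Unicode Bidirectional Algorithm resolves their direction from the surrounding text and mirrors bracket glyphs in right-to-left runs. The sketch below lists these properties with Python's standard unicodedata module; the helper name describe is illustrative only, and the values reflect the Unicode version bundled with the interpreter.

```python
import unicodedata

def describe(ch: str) -> tuple[str, str, int]:
    """Return (name, Bidi_Class, Bidi_Mirrored) for a single character."""
    return (unicodedata.name(ch), unicodedata.bidirectional(ch), unicodedata.mirrored(ch))

# Parentheses, common punctuation, and one strong letter each from Hebrew, Arabic, Latin.
samples = ["(", ")", ".", ",", "!", "\u060C", "\u05D0", "\u0627", "A"]

for ch in samples:
    name, bidi_class, mirrored = describe(ch)
    print(f"U+{ord(ch):04X} {name:<26} bidi={bidi_class:<3} mirrored={mirrored}")
```

Parentheses come out as ON (Other Neutral) and mirrored, full stop and comma (including ARABIC COMMA) as CS (Common Separator), while the letters are strong L, R, or AL; the neutral classes are what let these characters take their direction from adjacent text.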

Date/Time: Thu Sep 18 15:11:28 CDT 2014
Name: Roozbeh Pournader
Report Type: Error Report
Opt Subject: Behavior of music format character underspecified

The music format characters at U+1D173..1D17A are underspecified. For example, it
is not clear whether they can nest like the bidi formatting characters.

Making them nestable would make their implementation in plain text environments very
hard. I suggest that the standard clearly state that they are not nestable, or that
different kinds of format may nest, but not more than one of each, etc.
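One of the options suggested above (different kinds of format may nest, but not more than one of each) can be stated precisely as a small validator. The sketch below is hypothetical: neither the rule nor the function name check_music_format comes from the standard. Only the code points are real: U+1D173..U+1D17A are MUSICAL SYMBOL BEGIN/END BEAM, TIE, SLUR, and PHRASE. It also assumes that nesting must be proper (an inner kind closes before an outer one).

```python
# Real code points; the nesting rule enforced below is a hypothetical proposal.
BEGIN_TO_KIND = {0x1D173: "beam", 0x1D175: "tie", 0x1D177: "slur", 0x1D179: "phrase"}
END_TO_KIND = {0x1D174: "beam", 0x1D176: "tie", 0x1D178: "slur", 0x1D17A: "phrase"}

def check_music_format(text: str) -> bool:
    """Accept text only if kinds nest properly and no kind is opened while
    the same kind is still open."""
    stack: list[str] = []
    for ch in text:
        cp = ord(ch)
        if cp in BEGIN_TO_KIND:
            kind = BEGIN_TO_KIND[cp]
            if kind in stack:              # same kind nested inside itself
                return False
            stack.append(kind)
        elif cp in END_TO_KIND:
            if not stack or stack[-1] != END_TO_KIND[cp]:
                return False               # unmatched end, or ranges that cross
            stack.pop()
    return not stack                       # every begin must be closed

B_BEAM, E_BEAM = "\U0001D173", "\U0001D174"
B_SLUR, E_SLUR = "\U0001D177", "\U0001D178"

print(check_music_format(B_SLUR + B_BEAM + "x" + E_BEAM + E_SLUR))  # True: beam inside slur
print(check_music_format(B_BEAM + B_BEAM + E_BEAM + E_BEAM))        # False: beam inside beam
print(check_music_format(B_BEAM + B_SLUR + E_BEAM + E_SLUR))        # False: crossed ranges
```

A single stack is enough here because at most one instance of each kind can be open, so the depth never exceeds four; an implementation that allowed same-kind nesting would need unbounded state, which is the difficulty the report points to.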

Date/Time: Mon Oct 20 09:14:24 CDT 2014
Name: Thomas Wolff
Report Type: Error Report
Opt Subject: Width Properties of Hangul Jamo ranges

I was recently confused by the following observation:
Unicode 5.2 introduced
A960... Hangul Jamo Extended-A
D7B0... Hangul Jamo Extended-B
both marked wide ("W") in EastAsianWidth.txt.
However, Unicode 6.2 marked the second of these ranges "N" (while the first remains 
explicitly wide).

I discussed this with Rick McGowan and then Ken Lunde.
After Ken explained to me how the Hangul Jungseong and Jongseong characters are 
used, I noticed this as a related matter:

Markus Kuhn's character width implementation wcwidth.c (as used, e.g., by xterm)
needs to explicitly add the original Hangul Jungseong/Jongseong range
U+1160..U+11FF (and additionally U+00AD and U+200B) to the zero-width list, because
they are not marked as "combining" in the Unicode data.
While that works for terminal control and can be extended for U+D7B0...,
I think that adding such special cases and exceptions in further Unicode
versions makes it next to impossible for implementors of software running in
text-mode terminals to stay up to date. It would be better to rely
only on information that can be automatically extracted from Unicode
data (as my editor MinEd does to a large extent).
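The difference between a width table derived only from Unicode properties and one with a hand-added exception can be shown in a few lines. The sketch below uses Python's unicodedata; the function names ucd_only_width and patched_width are hypothetical, and patched_width adds only the Hangul range U+1160..U+11FF described above, not every special case in wcwidth.c. Results depend on the Unicode version bundled with the interpreter.

```python
import unicodedata

def ucd_only_width(ch: str) -> int:
    """Column width from General_Category and East_Asian_Width alone."""
    if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
        return 0                      # nonspacing/enclosing marks and format characters
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2                      # wide and fullwidth characters
    return 1                          # everything else, including Ambiguous ("A")

def patched_width(ch: str) -> int:
    """ucd_only_width plus a hand-coded zero-width exception for the
    Hangul Jungseong/Jongseong range U+1160..U+11FF."""
    if 0x1160 <= ord(ch) <= 0x11FF:
        return 0
    return ucd_only_width(ch)

# CHOSEONG KIYEOK + JUNGSEONG A + JONGSEONG KIYEOK: one syllable block on screen.
syllable = "\u1100\u1161\u11A8"
print(sum(ucd_only_width(c) for c in syllable))  # 4: medial and final jamo count 1 each
print(sum(patched_width(c) for c in syllable))   # 2: only the leading consonant has width
```

The medial and final jamo are General_Category Lo with East_Asian_Width N, so nothing in the properties themselves tells ucd_only_width to give them zero width; the exception has to be maintained by hand, which is the burden the report describes.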

___________________________________________________
In summary, the situation appears to me as follows:

JUNGSEONG and JONGSEONG characters are effectively, or technically, combining.
They should therefore be marked as combining characters "Mn", consistent 
with some other ranges that behave in a similar way, e.g.:
 U+0363;COMBINING LATIN SMALL LETTER A;Mn;230;NSM;;;;;N;;;;;
 U+0F90;TIBETAN SUBJOINED LETTER KA;Mn;0;NSM;;;;;N;;;;;
 U+18A9;MONGOLIAN LETTER ALI GALI DAGALGA;Mn;228;NSM;;;;;N;;;;;
 U+1932;LIMBU SMALL LETTER ANUSVARA;Mn;0;NSM;;;;;N;;;;;
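The lines quoted above use the semicolon-separated UnicodeData.txt layout, in which field 2 is General_Category, field 3 is Canonical_Combining_Class, and field 4 is Bidi_Class. A minimal parser sketch follows; it assumes the "U+" prefix used in this report is optional (real UnicodeData.txt lines start with the bare hexadecimal code point), and the names UcdEntry and parse_ucd_line are illustrative only. It cross-checks each parsed line against the data bundled with Python.

```python
import unicodedata
from typing import NamedTuple

class UcdEntry(NamedTuple):
    code_point: int
    name: str
    general_category: str
    combining_class: int
    bidi_class: str

def parse_ucd_line(line: str) -> UcdEntry:
    """Parse the first five fields of one UnicodeData.txt-style line."""
    fields = line.strip().split(";")
    cp_text = fields[0].removeprefix("U+")   # tolerate the "U+" used in the report
    return UcdEntry(int(cp_text, 16), fields[1], fields[2], int(fields[3]), fields[4])

lines = [
    "U+0363;COMBINING LATIN SMALL LETTER A;Mn;230;NSM;;;;;N;;;;;",
    "U+0F90;TIBETAN SUBJOINED LETTER KA;Mn;0;NSM;;;;;N;;;;;",
    "U+18A9;MONGOLIAN LETTER ALI GALI DAGALGA;Mn;228;NSM;;;;;N;;;;;",
    "U+1932;LIMBU SMALL LETTER ANUSVARA;Mn;0;NSM;;;;;N;;;;;",
]

for line in lines:
    e = parse_ucd_line(line)
    ch = chr(e.code_point)
    # Cross-check the quoted fields against Python's bundled UCD.
    assert unicodedata.category(ch) == e.general_category
    assert unicodedata.combining(ch) == e.combining_class
    print(f"U+{e.code_point:04X} gc={e.general_category} ccc={e.combining_class} {e.name}")
```

Note that two of the four examples are Mn with combining class 0, so a "combining" General_Category does not by itself require a nonzero combining class.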

Furthermore, Ken pointed out "my reading of Section 6.2 of UAX #11 
suggests that all combining jamo characters should have the East Asian 
Width property value of W". This confirms my point of view that 
combining marks should reflect the width property of the typical base 
characters they are used with, just like a few others:
 U+302A # IDEOGRAPHIC LEVEL TONE MARK
 U+302B # IDEOGRAPHIC RISING TONE MARK
 U+302C # IDEOGRAPHIC DEPARTING TONE MARK
 U+302D # IDEOGRAPHIC ENTERING TONE MARK
 U+3099 # COMBINING KATAKANA-HIRAGANA VOICED SOUND MARK
 U+309A # COMBINING KATAKANA-HIRAGANA SEMI-VOICED SOUND MARK
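The contrast drawn here can be checked against the data. The sketch below uses Python's unicodedata (values reflect the Unicode version bundled with the interpreter) to print General_Category and East_Asian_Width for the tone and sound marks listed above and for two jamo from the Hangul range under discussion; the helper name properties is illustrative only.

```python
import unicodedata

code_points = [
    0x302A, 0x302B, 0x302C, 0x302D,   # ideographic tone marks
    0x3099, 0x309A,                   # kana voiced / semi-voiced sound marks
    0x1161, 0x11A8,                   # HANGUL JUNGSEONG A, HANGUL JONGSEONG KIYEOK
]

def properties(cp: int) -> tuple[str, str]:
    """Return (General_Category, East_Asian_Width) for a code point."""
    ch = chr(cp)
    return unicodedata.category(ch), unicodedata.east_asian_width(ch)

for cp in code_points:
    gc, eaw = properties(cp)
    print(f"U+{cp:04X} gc={gc} ea={eaw} {unicodedata.name(chr(cp))}")
```

The listed marks come out as Mn with East_Asian_Width W, while the two jamo come out as Lo with N, which is the inconsistency the report describes.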


Considering all this, I don't think there is a convincing reason to treat
the JUNGSEONG and JONGSEONG as narrow, and especially not as non-combining.
I suggest fixing the situation by changing them to:
• Combining (important; I don't know which combining class)
• Wide (less important, but more consistent)

Let me repeat that the combining property would be really important 
for the proper implementation of character-cell text mode applications.