Comments on Public Review Issues

L2/13-013

(October 31, 2012 - January 25, 2013)

The sections below contain links to permanent feedback documents for the open Public Review Issues, as well as other public feedback received as of January 25, 2013, since the previous cumulative document was issued prior to UTC #133 (November 2012). This document does not include feedback on moderated Public Review Issues from the forum that has been digested by the forum moderators; that feedback is in a separate document for each PRI. Gray items in the Table of Contents do not have feedback here.

Issue Name (+ feedback links)

228 Changing some common characters from Punctuation to Symbol

232 Proposed Update UAX #9, Unicode Bidirectional Algorithm

235 Proposed Update UTS #10, Unicode Collation Algorithm

236 Proposed Update UAX #11, East Asian Width (no feedback)

237 Proposed Update UAX #14, Unicode Line Breaking Algorithm

238 Proposed Update UAX #15, Unicode Normalization Forms (no feedback)

239 Proposed Update UAX #24, Unicode Script Property (no feedback)

240 Proposed Update UAX #29, Unicode Text Segmentation

241 Proposed Update UAX #31, Unicode Identifier and Pattern Syntax

243 Proposed Update UAX #38, Unicode Han Database (Unihan) (no feedback)

244 Proposed Update UAX #41, Common References for Unicode Standard Annexes (no feedback)

246 Proposed Update UAX #44, Unicode Character Database

247 Proposed Update UAX #45, U-Source Ideographs (no feedback)

The links below go to locations in this document for feedback.

Feedback on Encoding Proposals
Closed Public Review Issues
Error Reports
Other Reports

Error Reports

Date/Time: Sat Nov 24 00:06:26 CST 2012
Contact: [email protected]
Name: Masatoshi Kimura
Report Type: Error Report
Opt Subject: StandardizedVariants.txt contains forbidden variation sequences


According to TUS v6.1 clause 16.4,
http://www.unicode.org/versions/Unicode6.2.0/ch16.pdf#page=15
> > The base character in a variation sequence is never a 
> > combining character or a decomposable character.
However, the following base characters appearing in
http://unicode.org/Public/6.2.0/ucd/StandardizedVariants.txt
have a decomposition mapping.
203C => <compat> 0021 0021
2049 => <compat> 0021 003F
2139 => <font> 0069
24C2 => <circle> 004D
3297 => <circle> 795D
3299 => <circle> 79D8
1F21A => <square> 7121
1F22F => <square> 6307
Either clause 16.4 or StandardizedVariants.txt needs to be updated to fix the inconsistency.

Ken Whistler responded on 2012/11/26:

We do have a textual problem here. This should be filed in the
Other Feedback section of the feedback document for the next UTC.

I think the fix is a one-word addition: "decomposable character" to
"canonical decomposable character" in the last paragraph on p. 556 in Section 16.4.

But we also should add text regarding the stability of variation
sequences across normalization forms, to make the implications clearer.
And in any case, this needs UTC review.

Maybe add my suggestions here as part of the feedback, so we don't have to
reconstruct the context from scratch at the meeting.

--Ken
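
The distinction drawn in the response above, between canonical and compatibility decompositions, can be checked with a short script. This is only a sketch: it assumes Python's standard unicodedata module, whose results depend on the Unicode version bundled with the interpreter, and the helper names decomposition_kind and stable_under_canonical_normalization are hypothetical ones chosen for illustration.

```python
import unicodedata

# Base characters listed in the report above.
REPORTED_BASES = [0x203C, 0x2049, 0x2139, 0x24C2, 0x3297, 0x3299, 0x1F21A, 0x1F22F]


def decomposition_kind(cp):
    """Return 'none', 'canonical', or 'compatibility' for a code point."""
    mapping = unicodedata.decomposition(chr(cp))
    if not mapping:
        return "none"
    # Compatibility mappings carry a formatting tag such as <compat> or <circle>.
    return "compatibility" if mapping.startswith("<") else "canonical"


def stable_under_canonical_normalization(cp):
    """True if the character is unchanged by both NFC and NFD."""
    ch = chr(cp)
    return (unicodedata.normalize("NFC", ch) == ch
            and unicodedata.normalize("NFD", ch) == ch)


for cp in REPORTED_BASES:
    print(f"U+{cp:04X}",
          decomposition_kind(cp),
          stable_under_canonical_normalization(cp))
```

If the reported characters all come back as "compatibility" and stable, that is consistent with the suggested wording: a variation sequence on such a base survives NFC and NFD, although it would not survive NFKC or NFKD.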

Date/Time: Tue Dec 18 17:42:16 CST 2012
Contact: [email protected]
Name: asmus
Report Type: Error Report
Opt Subject: Name aliases in NamesList.txt


The normative name alias for FEFF (i.e. Byte Order Mark) is not 
printed with the same special symbol (reference mark) as the normative 
name aliases for other "misnomers". I believe this is due to the fact 
that (as the only such alias) it was classified as "alternate" rather 
than "correction".

I suggest a simple fix: use % for alternate aliases when printing 
the names list.

This will affect precisely one character and would not affect any 
public data files except for NamesList.txt (which is not intended 
to be machine parseable).

Date/Time: Wed Jan 23 12:08:11 CST 2013
Contact: [email protected]
Name: Roger Costello
Report Type: Error Report
Opt Subject: Error in Unicode Technical Report #36


Unicode Technical Report #36, Unicode Security Considerations [1], says:

    PEP 383 takes this approach. It enables lossless 
    conversion to Unicode by converting all "unmappable" 
    sequences to a sequence of one or more isolated 
    high surrogate code points. That is, each unmappable 
    byte's value is a code point whose value is 0xDC00 
    plus byte value.

Notice "high surrogate" in that quote. I'm confused. I thought the low 
surrogate range started at 0xDC00, but this document is saying that 
0xDC00 + byte value = high surrogate. Is that a typo in the document?

[1] http://www.unicode.org/reports/tr36/#TOC-PEP-383-Approach
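
The question above can be checked against a PEP 383 implementation. The following is a minimal sketch, assuming CPython's built-in "surrogateescape" error handler; the sample byte string and the variable names are illustrative only.

```python
# Latin-1 encoded bytes; the trailing 0xE9 cannot be decoded as UTF-8.
raw = b"caf\xe9"

# PEP 383: each undecodable byte becomes one lone surrogate code point.
text = raw.decode("utf-8", errors="surrogateescape")
escaped = ord(text[-1])

# Surrogate ranges as defined by the Unicode Standard.
is_high_surrogate = 0xD800 <= escaped <= 0xDBFF
is_low_surrogate = 0xDC00 <= escaped <= 0xDFFF

# Encoding with the same handler restores the original bytes.
round_tripped = text.encode("utf-8", errors="surrogateescape")

print(f"U+{escaped:04X}", is_high_surrogate, is_low_surrogate)
```

Since 0xDC00 plus any byte value falls in the range DC00..DCFF, the resulting code point is always in the low surrogate range, which supports the reporter's reading.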

Other Reports

Date/Time: Wed Dec 19 17:53:07 CST 2012
Contact: [email protected]
Name: Markus Scherer
Report Type: Other Question, Problem, or Feedback
Opt Subject: noncharacters should not be treated like ill-formed text


I found that I cannot edit some of the CLDR files (CJK collation tailorings)
with the Gnome Linux default editor gedit because those files contain the
noncharacter U+FDD0 and gedit treats noncharacters like ill-formed byte
sequences.

Other editors don't seem to do this, but I am having trouble pointing to a
piece of the standard or the web site that clearly states that noncharacters
are "better than ill-formed sequences".

We have a number of statements saying noncharacters should not be used in open
interchange. In the standard, 16.7 Noncharacters even says "It is good
practice, however, to recognize it as a noncharacter and to take appropriate
action, such as replacing it with U+FFFD replacement character, to indicate
the problem in the text."

On the other hand, just a little further the standard says "In effect,
noncharacters can be thought of as application-internal private-use code
points." which is really how they are used in CLDR (for use in implementations
of alphabetic indexes).

Please add a statement to the effect that noncharacters are "better than ill-
formed sequences", and please remove the "good practice" to replace
noncharacters with U+FFFD.
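
The difference between a noncharacter and an ill-formed sequence can be illustrated as follows. This is a sketch only: is_noncharacter is a hypothetical helper written from the definition of the 66 noncharacter code points, and the decoding behavior shown is that of CPython's strict UTF-8 codec, not a statement about what any editor does.

```python
def is_noncharacter(cp):
    """True for U+FDD0..U+FDEF and the last two code points of each plane."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


# Count noncharacters over the whole code space, U+0000..U+10FFFF.
noncharacter_count = sum(is_noncharacter(cp) for cp in range(0x110000))

# U+FDD0 has a well-formed UTF-8 encoding and round-trips under strict decoding.
encoded = "\uFDD0".encode("utf-8")
decoded = encoded.decode("utf-8")

# By contrast, the byte 0xFF can never appear in well-formed UTF-8.
try:
    b"\xff".decode("utf-8")
    ill_formed_rejected = False
except UnicodeDecodeError:
    ill_formed_rejected = True

print(noncharacter_count, encoded, ill_formed_rejected)
```

A strict UTF-8 decoder thus accepts U+FDD0 as an ordinary scalar value while rejecting the ill-formed byte, which is the distinction the report asks the standard to state explicitly.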
