Public Review Issues

Accumulated Feedback on PRI #206

This page is a compilation of formal public feedback received so far. See Feedback for further information on this issue, how to discuss it, and how to provide feedback.

Date/Time: Sun Sep 25 13:04:10 CDT 2011
Contact: corporate@khwilliamson.com
Name: Karl Williamson
Report Type: Public Review Issue
Opt Subject: PropertyValueAliases.txt inconsistencies


PropertyValueAliases.txt is supposed to be constructed so that the
first field is the property name; the 2nd the value's abbreviated
name; the 3rd the long name; and any additional names are in trailing
fields.  Four entries in the file are anomalous, and I'm requesting
that these be changed to match the other entries in the file.  The
entries in question are:

dt ; Font      ; font
dt ; None      ; none
dt ; Sub       ; sub
dt ; Wide      ; wide

What each of these says is that the long name is the short name lower-
cased.  Thus the long name comes out as less formal than the short
name.  In no other cases are long names less formal than their short
names.  I believe that what was intended was something like other
entries for the dt property:

dt ; Can       ; Canonical                        ; can

I would like the 4 entries to be changed to:
dt ; Font      ; Font                             ; font
dt ; None      ; None                             ; none
dt ; Sub       ; Sub                              ; sub
dt ; Wide      ; Wide                             ; wide

to correspond with the other entries in dt.

Since Unicode property value matching rules call for case to be
ignored, I don't understand, anyway, why there is an extra alias for
these that is just the lower-cased short name.  Perhaps this extra
alias should be omitted.
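
The anomaly described in this report can be checked mechanically. What follows is a minimal sketch, assuming only the semicolon-separated field layout described above. The names `parse_alias_line`, `is_anomalous`, and `loose_key` are hypothetical, and `loose_key` only approximates loose matching by ignoring case, spaces, underscores, and hyphens.

```python
def parse_alias_line(line):
    """Split one alias line into property, short name, long name, extras."""
    data = line.split("#", 1)[0].strip()  # drop any trailing comment
    if not data:
        return None
    fields = [f.strip() for f in data.split(";")]
    if len(fields) < 3:
        return None
    return {
        "property": fields[0],
        "short": fields[1],
        "long": fields[2],
        "extra": fields[3:],
    }


def loose_key(name):
    """Approximate loose matching: ignore case, spaces, '_' and '-'."""
    return "".join(ch for ch in name.lower() if ch not in " _-")


def is_anomalous(entry):
    """True when the long name is only the short name in lower case."""
    return (entry["long"] != entry["short"]
            and entry["long"] == entry["short"].lower())


# Sample lines copied from the report above.
SAMPLE = """\
dt ; Can       ; Canonical                        ; can
dt ; Font      ; font
dt ; None      ; none
dt ; Sub       ; sub
dt ; Wide      ; wide
"""

entries = [e for e in map(parse_alias_line, SAMPLE.splitlines()) if e]
anomalous = [e["short"] for e in entries if is_anomalous(e)]
print(anomalous)  # ['Font', 'None', 'Sub', 'Wide']
```

Under this approximation `loose_key("font")` and `loose_key("Font")` compare equal, which is why a separate lower-cased alias adds nothing for a matcher that ignores case.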

Date/Time: Sun Sep 25 14:08:42 CDT 2011
Contact: corporate@khwilliamson.com
Name: Karl Williamson
Report Type: Public Review Issue
Opt Subject: Inconsistent @missings in UCD .txt files


Some files use the abbreviated property value name for their data, and
some use the long one.  It would be nice if this were consistent, but
I suppose that it's too late to change these.  The most annoying is
that ScriptExtensions.txt uses the short name, while Scripts.txt uses
the long, and ScriptExtensions is not stand-alone, so code that reads
them both has to convert.  Since this is a provisional property, it
may not be too late to change this.  I ask for this to be considered.

But my major request is that the @missings lines in a file use the
same style (abbreviated or long) as the rest of the lines in the file
for a given property.  Unicode hasn't formally specified the format of
the lines in the UCD .txt files that give the default value for code
points not explicitly mentioned (at least last time I looked it
hadn't), yet these are specified to be machine-readable.  So, I've
assumed that the format is stable, and programmed reading them based
on the existing paradigm, but I would think that it would be possible
to change to the other style in these lines.

For example, there is an annoying inconsistency in the
DerivedNormalizationProps.txt file. The values for code points that are
explicitly listed use the abbreviated property value aliases,
'N' and 'M', but the missing defaults are listed with the long property
value:

# @missing: 0000..10FFFF; NFD_QC; Yes
# @missing: 0000..10FFFF; NFC_QC; Yes
# @missing: 0000..10FFFF; NFKD_QC; Yes
# @missing: 0000..10FFFF; NFKC_QC; Yes

I've had to program around this inconsistency, as has Asmus Freytag.
The problem is that I'm writing code to expose the UCD db to Perl
programs for the next Perl version.  I would rather they not have to
program around this inconsistency, as well.  And it seems like the
best place to fix it is at the source.  I am requesting that Unicode
change these lines to be:

# @missing: 0000..10FFFF; NFD_QC; Y
# @missing: 0000..10FFFF; NFC_QC; Y
# @missing: 0000..10FFFF; NFKD_QC; Y
# @missing: 0000..10FFFF; NFKC_QC; Y

There are other files where this is true as well, namely:
HangulSyllableType-6.1.0d11.txt:# @missing: 0000..10FFFF; Not_Applicable
extracted/DerivedBidiClass-6.1.0d12.txt:# @missing: 0000..10FFFF; Left_To_Right
extracted/DerivedCombiningClass-6.1.0d12.txt:# @missing: 0000..10FFFF; Not_Reordered
extracted/DerivedEastAsianWidth-6.1.0d12.txt:# @missing: 0000..10FFFF; Neutral
extracted/DerivedJoiningType-6.1.0d12.txt:# @missing: 0000..10FFFF; Non_Joining
extracted/DerivedLineBreak-6.1.0d12.txt:# @missing: 0000..10FFFF; Unknown

It would be nice if these were made consistent with the rest of the
data in their respective files.
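
The kind of workaround mentioned in this report might look like the following. This is a minimal sketch, assuming the two `@missing` layouts quoted above (range; property; value, and range; value). The function names are hypothetical, and the `LONG_TO_SHORT` table is an illustrative stand-in for a mapping that a real reader would load from PropertyValueAliases.txt.

```python
import re

# Matches "# @missing: 0000..10FFFF; <rest>" as quoted in the report.
MISSING_RE = re.compile(
    r"^#\s*@missing:\s*([0-9A-Fa-f]+)\.\.([0-9A-Fa-f]+)\s*;\s*(.*)$"
)

# Illustrative alias table (assumption, not read from the UCD here).
LONG_TO_SHORT = {"Yes": "Y", "No": "N", "Maybe": "M"}


def parse_missing(line):
    """Return (start, end, property_or_None, value), or None if no match."""
    m = MISSING_RE.match(line.strip())
    if m is None:
        return None
    start, end = int(m.group(1), 16), int(m.group(2), 16)
    fields = [f.strip() for f in m.group(3).split(";")]
    if len(fields) == 1:
        # Single-property file: the only field is the default value.
        return start, end, None, fields[0]
    return start, end, fields[0], fields[1]


def normalize(value):
    """Map a long value name to its short alias when one is known."""
    return LONG_TO_SHORT.get(value, value)


start, end, prop, value = parse_missing("# @missing: 0000..10FFFF; NFD_QC; Yes")
print(prop, normalize(value))  # NFD_QC Y
```

If the `@missing` lines used the same style as the data lines, the `normalize` step would not be needed for files such as DerivedNormalizationProps.txt.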

Date/Time: Tue Oct 4 00:24:51 CDT 2011
Contact: jamadagni@gmail.com
Name: Shriramana Sharma
Report Type: Public Review Issue
Opt Subject: Feedback on PRI 206 Unicode 6.1 beta


In the Tifinagh beta chart from http://www.unicode.org/Public/6.1.0/charts/blocks/U2D30.pdf 
the character 2D7F Tifinagh Consonant Joiner still has the annotation: "shape shown is 
arbitrary and is not visibly rendered".

This is not entirely true. The recent document L2/11-112, based on which the glyph of 
this character 2D7F has been changed from a boxed TFNCJ to six dots in a dotted box, 
specifically says that it is desired to display the six dots when a proper biconsonant 
glyph is not available.

It is hence recommended that the above annotation be changed to read:

"* shape shown is arbitrary;
 * the six dots are recommended for fallback use if a biconsonant glyph is not available"

or something like that.

Date/Time: Sat Oct 8 00:44:35 CDT 2011
Contact: petercon@microsoft.com
Name: Peter Constable
Report Type: Public Review Issue
Opt Subject: Beta review: Bidi category of 1F48C


In TUS6.0 and the beta, the bidi category of 1F48C is set to L. It should be ON, 
like all the other symbols in that block. Ken Whistler indicated that the cause was a 
script used in drafting data files detecting "letter" in the character name, LOVE LETTER.

Date/Time: Sat Oct 8 00:46:10 CDT 2011
Contact: petercon@microsoft.com
Name: Peter Constable
Report Type: Public Review Issue
Opt Subject: Beta review: Bidi category of 1F48C


Addendum: provided by Ken Whistler:

BTW, when you report that one, there is another with the exact
same problem:

U+1F524 INPUT SYMBOL FOR LATIN *LETTER*S

which is also bc=L, instead of the expected bc=ON.

Cf.

U+1F520 INPUT SYMBOL FOR LATIN CAPITAL LETTERS

which *did* get corrected, and is the expected bc=ON.
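
A simple audit can surface other cases of this kind. The sketch below is illustrative only: it does not reproduce the drafting script mentioned above, the function name `suspicious` is hypothetical, and the records are limited to the three characters and Bidi_Class values cited in these two reports.

```python
# (code point, name, Bidi_Class) as reported for the beta in the text above.
SAMPLE = [
    (0x1F48C, "LOVE LETTER", "L"),
    (0x1F520, "INPUT SYMBOL FOR LATIN CAPITAL LETTERS", "ON"),
    (0x1F524, "INPUT SYMBOL FOR LATIN LETTERS", "L"),
]


def suspicious(records, expected="ON"):
    """List symbols whose name contains LETTER but whose class differs
    from the class expected for the rest of the block."""
    return [
        f"U+{cp:04X}"
        for cp, name, bc in records
        if "LETTER" in name and bc != expected
    ]


print(suspicious(SAMPLE))  # ['U+1F48C', 'U+1F524']
```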

Date/Time: Tue Oct 25 17:56:09 CDT 2011
Contact: markus.icu@gmail.com
Name: Markus Scherer
Report Type: Public Review Issue
Opt Subject: Unicode 6.1 DerivedBidiClass.txt bug


# DerivedBidiClass-6.1.0.txt
# Date: 2011-09-16, 21:06:13 GMT [MD]

has moved U+1EE00 - U+1EEFF from default-R to default-AL.
The problem is that in the comments U+1EEFF is listed as both AL and R. Please 
change the comments so that U+1EF00 is the first remaining default-R code point.

Change
#     [\u0590-\u05FF \u07C0-\u089F \uFB1D-\uFB4F \U00010800-\U00010FFF \U0001E800-\u0001EDFF \U0001EEFF-\U0001EFFF]
to
#     [\u0590-\u05FF \u07C0-\u089F \uFB1D-\uFB4F \U00010800-\U00010FFF \U0001E800-\u0001EDFF \U0001EF00-\U0001EFFF]

and change
#                        U+1EEFF - U+1EFFF
to
#                        U+1EF00 - U+1EFFF

The actual data looks correct.
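
The overlap in the comments can be confirmed with a small range check. This is a minimal sketch using only the ranges quoted in this report; the helper name `overlap` is hypothetical.

```python
def overlap(a, b):
    """Return the inclusive overlap of two (start, end) ranges, or None."""
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else None


def fmt(rng):
    """Format a range as U+XXXX..U+XXXX, or 'none' when there is no overlap."""
    return "none" if rng is None else f"U+{rng[0]:04X}..U+{rng[1]:04X}"


default_al = (0x1EE00, 0x1EEFF)   # moved to default-AL
listed_r = (0x1EEFF, 0x1EFFF)     # default-R range as written in the comments
proposed_r = (0x1EF00, 0x1EFFF)   # default-R range after the requested fix

print(fmt(overlap(default_al, listed_r)))    # U+1EEFF..U+1EEFF
print(fmt(overlap(default_al, proposed_r)))  # none
```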