Comments on Public Review Issues (May 13, 2006

L2/06-378

Comments on Public Review Issues
(August 3, 2006 - November 7, 2006)

The sections below contain comments received on the open Public Review Issues as of November 7, 2006, since the previous cumulative document was issued prior to UTC #108 (August 2006).

Contents:

75 Proposed Update UTR #25, Unicode Support for Mathematics
95 Stable Normalization Process

Other Reports

75 Proposed Update UTR #25, Unicode Support for Mathematics

No feedback was received via the reporting form this period.

95 Stable Normalization Process

Date/Time: Thu Aug 24 09:47:33 CDT 2006
Contact: <bqw10602@nifty.com>
Name: SADAHIRO Tomoyuki
Report Type: Public Review Issue
Opt Subject: Stable Normalization Process: Hangul L,V,C,T

I think the following case is not included in Table 11, Problem Sequences, but perhaps it should be.

----
First: 1100..1112 HANGUL CHOSEONG KIYEOK..HIEUH
Second: 1161..1175 HANGUL JUNGSEONG A..I
Intervening Character(s)
Last: 11A8..11C2 HANGUL JONGSEONG KIYEOK..HIEUH
----

An example: <1100,1161,0300,11A8>

The first composition from <1100,1161,0300,11A8> produces <AC00,0300,11A8>, which is one of the problem sequences.

Thank you.
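
The composition described in this report can be reproduced with a short sketch. It uses Python's standard unicodedata module and assumes that the Unicode data bundled with the interpreter normalizes these sequences the same way as the version under review; the helper name code_points is made up for this illustration.

```python
import unicodedata

def code_points(text):
    """Return the code points of text as uppercase hex strings."""
    return [f"{ord(ch):04X}" for ch in text]

# The reporter's example: L, V, an intervening combining mark, then T.
with_mark = "\u1100\u1161\u0300\u11A8"
# The same jamo sequence without the intervening U+0300.
without_mark = "\u1100\u1161\u11A8"

print(code_points(unicodedata.normalize("NFC", with_mark)))
print(code_points(unicodedata.normalize("NFC", without_mark)))
```

With the intervening mark, L and V compose to U+AC00 and the trailing consonant is left uncomposed; without it, the three jamo compose to the single syllable U+AC01.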

Other Reports

Date/Time: Sat Aug 19 16:11:26 CDT 2006
Contact: typhlosion@gmail.com
Name: Benjamin Scarborough
Opt Subject: U+00B7 MIDDLE DOT in identifiers

The character U+00B7 MIDDLE DOT has the XID_Continue property, but not the ID_Continue property. It is the only character of this sort.

It should be given the Other_ID_Continue property (and thus the ID_Continue property through derivation) in order to make all XID_Continue characters a subset of ID_Continue characters and ensure that all valid identifiers defined with XID_Start and XID_Continue are also valid identifiers under ID_Start and ID_Continue.
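
As a rough illustration of the XID side of this report, Python defines its identifiers in terms of XID_Start and XID_Continue, so str.isidentifier() can show how U+00B7 is treated there. This is only a sketch: Python does not expose ID_Continue, and the result reflects whatever Unicode data the interpreter bundles, not necessarily the version discussed here.

```python
# Python identifiers follow XID_Start followed by XID_Continue characters.
middle_dot = "\u00B7"  # MIDDLE DOT

# MIDDLE DOT in a continue position.
print(("a" + middle_dot + "b").isidentifier())
# MIDDLE DOT in a start position.
print((middle_dot + "a").isidentifier())
```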

Date/Time: Sat Aug 19 18:00:06 CDT 2006
Contact: dbvic@mac.com
Name: Didier BARBAS
Opt Subject: Georgian Case Folding

Both UnicodeData.txt and CaseFolding.txt have wrong case folding data:

UnicodeData.txt:
10A0;GEORGIAN CAPITAL LETTER AN;Lu;0;L;;;;;N;;Khutsuri;;2D00;
10A1;GEORGIAN CAPITAL LETTER BAN;Lu;0;L;;;;;N;;Khutsuri;;2D01;
etc...

CaseFolding.txt
10A0; C; 2D00; # GEORGIAN CAPITAL LETTER AN
10A1; C; 2D01; # GEORGIAN CAPITAL LETTER BAN
etc...

The lowercase letters do not start at 2D00 but at 10D0 (the capital's code point + 0x30).

Regards,
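
The two candidate mappings in this report can be put side by side with a minimal sketch. It assumes the interpreter's bundled case folding data still matches the CaseFolding.txt lines quoted above; it shows the arithmetic the reporter proposes and does not settle which mapping is correct.

```python
capital_an = "\u10A0"  # GEORGIAN CAPITAL LETTER AN

# Case folding according to the data bundled with the interpreter.
published = capital_an.casefold()
# Mapping proposed in the report: the capital's code point + 0x30.
proposed = chr(ord(capital_an) + 0x30)

print(f"published: U+{ord(published):04X}")
print(f"proposed:  U+{ord(proposed):04X}")
```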

Date/Time: Thu Aug 24 08:58:30 CDT 2006
Contact: <bqw10602@nifty.com>
Name: SADAHIRO Tomoyuki
Opt Subject: Corrigendum 5 Sequences

I found U+0DD9 is a character which is composable with another starter (whose combining class is 0), as well as composable with a combining character (whose combining class is non-zero).

<0DDA,0DCF> is NFC of <0DD9,0DCA,0DCF>.
<0DDA,0DDF> is NFC of <0DD9,0DCA,0DDF>.
Where
   U+0DCA is SINHALA SIGN AL-LAKUNA,
   U+0DCF is SINHALA VOWEL SIGN AELA-PILLA,
   U+0DD9 is SINHALA VOWEL SIGN KOMBUVA,
   U+0DDA is SINHALA VOWEL SIGN DIGA KOMBUVA,
   U+0DDF is SINHALA VOWEL SIGN GAYANUKITTA,
   CC of U+0DD9 is 0,
   CC of U+0DCA is 9,
   CC of U+0DCF is 0,
   CC of U+0DDF is 0.
Then U+0DD9 is composable with U+0DCA to U+0DDA, and there is no possibility of the composition with U+0DCF or U+0DDF, respectively.
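
The two Sinhala sequences above can be checked with a small sketch using Python's unicodedata module. It assumes the interpreter's bundled Unicode data agrees with the combining classes and compositions listed in the report; the helper name hexes is made up for this illustration.

```python
import unicodedata

def hexes(text):
    """Return the code points of text as uppercase hex strings."""
    return [f"{ord(ch):04X}" for ch in text]

for last in ("\u0DCF", "\u0DDF"):
    source = "\u0DD9\u0DCA" + last
    # Canonical combining class of each character in the source sequence.
    classes = [unicodedata.combining(ch) for ch in source]
    composed = unicodedata.normalize("NFC", source)
    print(hexes(source), classes, "->", hexes(composed))
```

In both cases U+0DD9 and U+0DCA compose to U+0DDA, and the final starter is left as it is.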

In these cases, should U+0DCA be considered as the intervening character?

Thank you.

Date/Time: Fri Sep 1 08:09:06 CDT 2006
Contact: kent.karlsson14@comhem.se
Name: Kent Karlsson
Opt Subject: Numeric value for F9B2

F9B2 is canonically equivalent to 96F6.

96F6 has a data line in DerivedNumericValues.txt:
96F6 ; 0.0 # Lo CJK UNIFIED IDEOGRAPH-96F6

However, F9B2 does not have a corresponding line in that file, nor is F9B2 given a numeric value of 0 in UnicodeData.txt:
F9B2;CJK COMPATIBILITY IDEOGRAPH-F9B2;Lo;0;L;96F6;;;;N;;;;;
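
A quick way to compare the two code points is sketched below with Python's unicodedata module. The numeric values printed depend on the Unicode data bundled with the interpreter, which may differ from the files quoted here, so no particular output is claimed for them.

```python
import unicodedata

compat = "\uF9B2"   # CJK COMPATIBILITY IDEOGRAPH-F9B2
unified = "\u96F6"  # CJK UNIFIED IDEOGRAPH-96F6

# Canonical decomposition of the compatibility ideograph.
print(unicodedata.decomposition(compat))
# Numeric values; None means the bundled data assigns no numeric value.
print(unicodedata.numeric(unified, None))
print(unicodedata.numeric(compat, None))
```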

Date/Time: Fri Sep 1 08:30:12 CDT 2006
Contact: kent.karlsson14@comhem.se
Name: Kent Karlsson
Opt Subject: Numeric values for some Hangul syllables

I have received information that certain Hangul syllables have numeric values and are used in numeric expressions, for instance in dates.

But DerivedNumericValues.txt does not list these Hangul syllables (or their canonical decompositions) as having numeric values.

The numeric values are as follows:
C601 영 : 0,
C77C 일 : 1,
C774 이 : 2,
C0BC 삼 : 3,
C0AC 사 : 4,
C624 오 : 5,
C721 육 : 6,
CE60 칠 : 7,
D314 팔 : 8,
AD6C 구 : 9,
C2ED 십 : 10,
BC31 백 : 100,
CC9C 천 : 1000.
(Maybe there are more, my source stops at 1000.)
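
The list above can be compared against an implementation's data with a sketch like the following. The table REPORTED_VALUES is simply the reporter's list retyped, not taken from any Unicode data file, and the lookup shows only what the interpreter's bundled data says.

```python
import unicodedata

# Values as listed in the report (code point -> reported numeric value).
REPORTED_VALUES = {
    0xC601: 0, 0xC77C: 1, 0xC774: 2, 0xC0BC: 3, 0xC0AC: 4,
    0xC624: 5, 0xC721: 6, 0xCE60: 7, 0xD314: 8, 0xAD6C: 9,
    0xC2ED: 10, 0xBC31: 100, 0xCC9C: 1000,
}

for cp, reported in REPORTED_VALUES.items():
    # None means the bundled data assigns no numeric value.
    bundled = unicodedata.numeric(chr(cp), None)
    print(f"U+{cp:04X} {chr(cp)} reported={reported} bundled={bundled}")
```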

I guess this comes very close to number spellout, but so do the Chinese ideographs that are given numeric values.

Date/Time: Fri Sep 1 08:45:36 CDT 2006
Contact: kent.karlsson14@comhem.se
Name: Kent Karlsson
Opt Subject: Hangul numerals

continued:

http://en.wikipedia.org/wiki/Korean_numerals

lists more monosyllabic numeral components (value, hanja, Hangul):
10^4 萬 만
10^8 億 억
10^12 兆 조
10^16 京 경
10^20 垓 해
(rare)
10^24 秭 자
10^28 穰 양
10^32 溝 구
10^36 澗 간
10^40 正 정
10^44 載 재
10^48 極 극
(and even more multisyllabic ones)

Date/Time: Sun Sep 10 07:29:54 CDT 2006
Contact: steffen@earthlingsoft.net
Name: Steffen Kamp
Report Type: Error Report
Opt Subject: Index.txt (Unicode 5.0) not UTF-8

The Unicode 5.0 Index.txt file is not valid UTF-8. I noticed the following errors:

character 0x92 in line 74 "abz[?]glich 2052"
character 0xe1 in line 854 "CNS[?]11643-1992, Duplicate Characters from 2F800"
character 0xe1 in line 1549 "Duplicate Characters from CNS[?]11643-1992 2F800"

I also was not able to determine a different encoding where these character codes would be valid and map to sensible characters.
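
Errors of this kind can be located with a sketch like the one below, which reports the line number, byte offset, and value of each byte that stops UTF-8 decoding. The function name find_invalid_utf8 and the sample bytes are made up for illustration; the sample only imitates the lines quoted above and is not the actual file content.

```python
def find_invalid_utf8(data: bytes):
    """Return (line number, byte offset in line, byte value) for each bad byte."""
    problems = []
    for lineno, line in enumerate(data.splitlines(), start=1):
        pos = 0
        while pos < len(line):
            try:
                line[pos:].decode("utf-8")
                break  # the rest of the line decodes cleanly
            except UnicodeDecodeError as err:
                bad = pos + err.start
                problems.append((lineno, bad, line[bad]))
                pos = bad + 1  # resume scanning after the offending byte
    return problems

# Hypothetical sample imitating the reported lines.
sample = b"ok\nabz\x92glich 2052\nCNS\xe111643-1992\n"
for lineno, offset, value in find_invalid_utf8(sample):
    print(f"line {lineno}, offset {offset}: byte 0x{value:02x}")
```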

Date/Time: Mon Sep 11 00:16:35 CDT 2006
Contact: weesan@cs.ucr.edu
Name: WeeSan Lee
Report Type: Error Report
Opt Subject: An error in Unihan.txt

Hi,

There is an unrecognized Pinyin character in Unihan.txt, as listed below:

U+347C kMandarin LÃ<9C>E4

Thanks, -WeeSan

PS: Great job you guys are doing there!

Date/Time: Sun Sep 24 07:43:24 CST 2006
Contact: Theo.Veenker@let.uu.nl

UCD.html (under "Changes in specific files"):
- Bullets below WordBreakProperty.txt misplaced (wrong indent).

Unihan.html:
- Category of kCheungBauerIndex is defined as "Dictionary-like
Data" while under "Unihan Properties by Category" it is listed
under "Dictionary Indices".
- Category of kHangul is defined as "Dictionary Indices" while
under "Unihan Properties by Category" it is listed under
"Dictionary-like Data".

StandardizedVariants.txt mixes \r\n and \n line endings.
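
A mix of line endings such as the one reported for StandardizedVariants.txt can be confirmed by counting each kind in the raw bytes. The sketch below uses a made-up function name, count_line_endings, and a made-up sample rather than the actual file.

```python
def count_line_endings(data: bytes) -> dict:
    """Count CRLF, bare LF, and bare CR line endings in raw bytes."""
    crlf = data.count(b"\r\n")
    lf = data.count(b"\n") - crlf   # LF not preceded by CR
    cr = data.count(b"\r") - crlf   # CR not followed by LF
    return {"CRLF": crlf, "LF": lf, "CR": cr}

# Hypothetical sample with mixed endings.
sample = b"first line\r\nsecond line\nthird line\r\n"
print(count_line_endings(sample))
```

More than one non-zero count indicates mixed endings.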

Date/Time: Sun Sep 24 10:30:03 CST 2006
Contact: charles@agenoria.fsnet.co.uk
Name: Charles Cox
Opt Subject: Ideographic Property

The JIS X 0213 compatibility additions to the CJK Compatibility Ideographs block, code points FA30 to FA6A inclusive, have been omitted from the ranges of code points listed as having the Ideographic property in PropList-5.0.0.txt.

Date/Time: Sun Sep 24 11:02:40 CST 2006
Contact: charles@agenoria.fsnet.co.uk
Name: Charles Cox
Opt Subject: Default_Ignorable_Code_Point Property

There appears to be an inconsistency in the Standard regarding the Default_Ignorable_Code_Point property. In http://www.unicode.org/Public/5.0.0/ucd/UCD.html it is stated that this property is generated from "Other_Default_Ignorable_Code_Point + Cf + Cc + Cs + Noncharacters - White_Space - annotation characters". However, this does not generate all the code points specified in http://www.unicode.org/Public/5.0.0/ucd/DerivedCoreProperties.txt as having the Default_Ignorable_Code_Point property. The codepoints not generated have General_Category Mn and are 180B to 180D, FE00 to FE0F and E0100 to E01EF, which comprise the complete set of code points having the property Variation_Selector. Amending the statement in UCD.html to "Other_Default_Ignorable_Code_Point + Cf + Cc + Cs + Noncharacters + Variation_Selector - White_Space - annotation characters" would be one way of restoring consistency.
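
The gap described in this report can be illustrated with a short sketch that looks up the General_Category of every code point in the three ranges named above. It assumes the interpreter's bundled Unicode data assigns these code points the same categories as Unicode 5.0; the names VS_RANGES and DERIVATION_CATEGORIES are made up for this illustration.

```python
import unicodedata

# Ranges named in the report (inclusive), all with Variation_Selector.
VS_RANGES = [(0x180B, 0x180D), (0xFE00, 0xFE0F), (0xE0100, 0xE01EF)]
# General categories that the quoted derivation picks up directly.
DERIVATION_CATEGORIES = {"Cf", "Cc", "Cs"}

categories = {
    unicodedata.category(chr(cp))
    for first, last in VS_RANGES
    for cp in range(first, last + 1)
}
print(categories)
# An empty set means the category terms in the derivation cover none of them.
print(categories & DERIVATION_CATEGORIES)
```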

Date/Time: Mon Sep 25 17:31:21 CST 2006
Contact: dz@bitxtender.com
Name: David Zülke

Unicode Technical Standard #35: LDML, section 5.10.2, before/after currency explanation: the sample <afterCurrency> block should contain an empty insertBetween element to comply with the explanation text, which states that a pattern with a leading currency symbol would not have a non-breaking space inserted.

Cheers,

David
