CLDR Ticket #10168 (accepted data)

Opened 3 months ago

Last modified 2 months ago

Replace 닉닌닐닖님 with 닉닌닐닒님 in common/collation/ko.xml

Reported by: jaemin_chung@… Owned by: markus
Component: collation Data Locale: ko
Phase: rc Review:
Weeks: Data Xpath: http://unicode.org/repos/cldr/trunk/common/collation/ko.xml
Xref:

Description

http://unicode.org/repos/cldr/trunk/common/collation/ko.xml

The intent of the hangul regex right after "[reorder Hang Hani] [optimize" seems to be to cover the 2350 modern hangul syllables in KS X 1001. However, 닖 is not in KS X 1001; instead, 닒, which is in KS X 1001, is missing. 닖 should be replaced with 닒.

Change History

comment:1 Changed 3 months ago by jungshik

  • Cc jungshik, markus, pedberg added

comment:2 Changed 3 months ago by jungshik

I wonder what kind of optimization can be done by listing 2350 precomposed syllables separately.

comment:3 Changed 3 months ago by jaemin_chung@…

Well, I am just reporting that there is an error in the regex. That is the purpose of this report. (I was not necessarily thinking about the purpose of that regex.)

comment:4 Changed 3 months ago by lunde@…

If that is the only difference between what is specified in the regex and the 2,350 modern hangul syllables specified in KS X 1001, then what Jaemin is pointing out is an error, pure and simple. Whoever created the regex evidently misidentified the lower-right component of U+B2D2 닒 and inadvertently entered U+B2D6 닖 instead. As for the purpose of these characters, any barely functional Korean font would include at least these glyphs, so the set is a good measure for detecting whether a font supports Korean to a sufficient degree, or whether a text is Korean.
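
For anyone who wants to verify the two code points without squinting at glyphs, here is a minimal sketch that splits a precomposed syllable into its jamo using the standard Hangul syllable arithmetic (syllable index = (L × 21 + V) × 28 + T, counted from U+AC00). The function name `decompose` is only illustrative and is not taken from CLDR or ICU.

```python
import unicodedata

S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28


def decompose(syllable):
    """Split a precomposed hangul syllable into (leading, vowel, trailing) jamo."""
    index = ord(syllable) - S_BASE
    if not 0 <= index < L_COUNT * V_COUNT * T_COUNT:
        raise ValueError("not a precomposed hangul syllable")
    l, rest = divmod(index, V_COUNT * T_COUNT)
    v, t = divmod(rest, T_COUNT)
    # t == 0 means the syllable has no trailing consonant.
    trailing = chr(T_BASE + t) if t else ""
    return chr(L_BASE + l), chr(V_BASE + v), trailing


for ch in "\uB2D2\uB2D6":
    lead, vowel, trail = decompose(ch)
    final_name = unicodedata.name(trail) if trail else "(none)"
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: final = {final_name}")
```

The two syllables share the leading consonant and the vowel and differ only in the final consonant cluster (RIEUL-MIEUM versus RIEUL-PHIEUPH), which fits the misreading described above.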

comment:5 Changed 3 months ago by jaemin_chung@…

The one I pointed out above is the only difference.
(I found the error while building my own regex for the 2350 KS X 1001 hangul syllables. I noticed that mine and the one in CLDR were not exactly the same, and comparing the two turned up the difference right away.)
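
A comparison of this kind can be sketched as a set difference over the characters of the two lists. The snippet below is only an illustration: it uses the five-syllable excerpts from the ticket title, not the full 2350-syllable lists, and `diff_chars` is a made-up helper name.

```python
def diff_chars(a, b):
    """Return (only_in_a, only_in_b) as sorted lists of characters."""
    set_a, set_b = set(a), set(b)
    return sorted(set_a - set_b), sorted(set_b - set_a)


# Short excerpts from the ticket title, standing in for the full lists.
cldr_excerpt = "닉닌닐닖님"  # as it appears in ko.xml
ksx_excerpt = "닉닌닐닒님"   # as expected from KS X 1001

only_cldr, only_ksx = diff_chars(cldr_excerpt, ksx_excerpt)
for ch in only_cldr:
    print(f"only in the CLDR list:    U+{ord(ch):04X} {ch}")
for ch in only_ksx:
    print(f"only in the KS X 1001 list: U+{ord(ch):04X} {ch}")
```

With the full lists, the same two calls would apply after expanding any ranges in the regex into individual characters.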

comment:6 Changed 3 months ago by emmons

  • Status changed from new to accepted
  • Priority changed from assess to medium
  • Phase changed from dsub to rc
  • Milestone changed from UNSCH to 32
  • Owner changed from anybody to markus
  • Type changed from charts to data

comment:7 Changed 3 months ago by jaemin_chung@…

I discovered something pretty interesting.
These Korean code pages incorrectly map 0xB4D3 to U+B2D6 닖:
10003 (MAC - Korean)
20949
51949 (EUC-Korean)
I guess this error is due to using one of the code pages with the incorrect mapping.

comment:8 Changed 3 months ago by jaemin_chung@…

This is also interesting.

http://unicode.org/Public/MAPPINGS/OBSOLETE/EASTASIA/KSC/KSC5601.TXT

0x8897	0xB2D3	# HANGUL SYLLABLE NIEUN-I-RIEULPIEUP--<3/22/95>
0x8898	0xB2D4	# HANGUL SYLLABLE NIEUN-I-RIEULSIOS---<3/22/95>
0x8899	0xB2D5	# HANGUL SYLLABLE NIEUN-I-RIEULTHIEUTH<3/22/95>
0x889A	0xB2D6	# HANGUL SYLLABLE NIEUN-I-RIEULPHIEUPH<3/22/95>

0xB4D3	0xB2D2	# HANGUL SYLLABLE NIEUN-I-RIEULMIEUM-<3/22/95>

It seems that there was indeed a mistake in the very first version of the mapping table. This mistake was corrected on March 22, 1995, but I guess some code pages (such as the ones I listed above) were never fixed.

A mapping error surviving more than two decades. Very interesting.
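
For reference, the lines quoted above can be read mechanically. This is a rough sketch, assuming the usual layout of the Unicode mapping files (legacy code, Unicode code point, then a `#` comment); the `<3/22/95>` annotations are left out of the embedded excerpt, and `parse_mapping` is a hypothetical helper, not part of any library.

```python
EXCERPT = """\
0x8897 0xB2D3 # HANGUL SYLLABLE NIEUN-I-RIEULPIEUP
0x8898 0xB2D4 # HANGUL SYLLABLE NIEUN-I-RIEULSIOS
0x8899 0xB2D5 # HANGUL SYLLABLE NIEUN-I-RIEULTHIEUTH
0x889A 0xB2D6 # HANGUL SYLLABLE NIEUN-I-RIEULPHIEUPH
0xB4D3 0xB2D2 # HANGUL SYLLABLE NIEUN-I-RIEULMIEUM
"""


def parse_mapping(text):
    """Map legacy code -> Unicode code point, ignoring comments and blank lines."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        legacy, ucs = line.split()[:2]
        table[int(legacy, 16)] = int(ucs, 16)
    return table


table = parse_mapping(EXCERPT)
for legacy in (0xB4D3, 0x889A):
    ucs = table[legacy]
    print(f"0x{legacy:04X} -> U+{ucs:04X} {chr(ucs)}")
```

To see what a particular converter does, one could compare `table[0xB4D3]` with the result of decoding the bytes 0xB4 0xD3 in that converter. The outcome depends on the converter's own tables, so no expected result is given here.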

comment:9 Changed 2 months ago by jungshik

Ken, this is a collation table. Without digging more, I have no clue why 2350 syllables are listed separately.
