Re: [cldr-dev] Re: Questions on Chinese collation, stroke

From: Mark Davis ☕ <mark_at_macchiato.com>
Date: Fri, 8 Jun 2012 15:44:04 -0700

kTotalStrokes can supply the stroke counts for both Simplified and
Traditional Chinese, if they differ. That's done with two fields.

However, in this case there is only one value; if that's incorrect for this
character someone should file feedback.
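
To make the one-value versus two-value case concrete, here is a minimal sketch of how such lines could be read. It assumes the usual Unihan layout of code point, property name, and value separated by whitespace, and it assumes that when two values are present the first is for Simplified and the second for Traditional Chinese. The function name parse_total_strokes and the two-value sample line are hypothetical, made up for illustration; the U+59DA line is taken from Matt's statement that the character is counted as 9 strokes, not from the data file.

```python
from typing import Dict, Iterable, Tuple


def parse_total_strokes(lines: Iterable[str]) -> Dict[str, Tuple[int, int]]:
    """Map each character to (simplified, traditional) stroke counts.

    Assumption: a single value applies to both; with two values the first
    is taken as Simplified and the second as Traditional.
    """
    counts: Dict[str, Tuple[int, int]] = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 2)
        if len(parts) != 3 or parts[1] != "kTotalStrokes":
            continue  # not a kTotalStrokes record
        code_point, _, value = parts
        char = chr(int(code_point[2:], 16))  # "U+8303" -> chr(0x8303)
        values = [int(v) for v in value.split()]
        simplified = values[0]
        traditional = values[1] if len(values) > 1 else values[0]
        counts[char] = (simplified, traditional)
    return counts


# The first line is quoted in this thread; the second is assumed from the thread.
SAMPLE = [
    "U+8303\tkTotalStrokes\t8",
    "U+59DA\tkTotalStrokes\t9",
]
# Hypothetical two-value record, only to show the shape of that case.
HYPOTHETICAL = ["U+8303\tkTotalStrokes\t8 9"]

strokes = parse_total_strokes(SAMPLE)
# Order by the Simplified stroke count, breaking ties by code point.
ordered = sorted(strokes, key=lambda ch: (strokes[ch][0], ch))
two_valued = parse_total_strokes(HYPOTHETICAL)
```

With the single value 8, U+8303 sorts ahead of U+59DA in this sketch; a real stroke collation involves more than this one key, so treat it only as an illustration of the data format.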

------------------------------
Mark <https://plus.google.com/114199149796022210033>
*— Il meglio è l’inimico del bene —*

On Fri, Jun 8, 2012 at 2:41 PM, Claire Ho (賀靜蘭) <claireho_at_google.com> wrote:

> I checked tr38 <http://www.unicode.org/reports/tr38/tr38-12.html>. According
> to the description of kTotalStrokes, it provides stroke count data for both
> Simplified Chinese and Traditional Chinese.
> So I have no concerns.
>
> Thanks!
> Claire.
>
>
> On Fri, Jun 8, 2012 at 2:33 PM, Claire Ho (賀靜蘭) <claireho_at_google.com> wrote:
>
>> Hi Mark
>>
>> > There you find the line:
>>
>> > U+8303 kTotalStrokes 8
>>
>> In Traditional Chinese, U+8303 has 9 strokes as Matt mentioned in the
>> email.
>>
>> The radical "++" is counted as 4 strokes. I think several radicals have
>> the same issue: their stroke counts differ between Simplified Chinese and
>> Traditional Chinese.
>>
>> Claire.
>>
>> On Thu, Jun 7, 2012 at 5:54 PM, Mark Davis ☕ <mark_at_macchiato.com> wrote:
>>
>>> On Thu, Jun 7, 2012 at 4:28 PM, Matt Ma <matt.ma.umail_at_gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I have two questions regarding the collation sequence defined in
>>>> zh.xml, CLDR 21.0
>>>>
>>>> 1. Why is U+8303 (范) counted as 9 strokes instead of 8 for <collation
>>>> type="stroke">? As a reference, U+59DA (姚) is counted as 9 strokes but
>>>> sorted before U+8303 (范).
>>>>
>>>
>>> CLDR now gets the stroke collation data from the kTotalStrokes property.
>>> The values for that are in the file Unihan/Unihan_DictionaryLikeData.txt in
>>> the Unicode Character Database.
>>>
>>> There you find the line:
>>>
>>> U+8303 kTotalStrokes 8
>>>
>>> If that is in error, or if there is any other error in the kTotalStrokes
>>> data, then please report the correct value according to
>>> http://www.unicode.org/review/pri230/ so that it can be fixed.
>>>
>>> As a related matter, CLDR now gets the pinyin collation data from
>>> the kMandarin property. The values for that are in the
>>> file Unihan/Unihan_Readings.txt in the Unicode Character Database. So if
>>> any of those are in error, they should also be reported as per
>>> http://www.unicode.org/review/pri230/ .
>>>
>>> The beta data is in ftp://www.unicode.org/Public/6.2.0/ucd/. It is currently
>>> in ftp://www.unicode.org/Public/6.2.0/ucd/Unihan-6.2.0d1.zip
>>> but as the beta proceeds, the d1 might change to d2, d3...
>>>
>>>
>>>>
>>>> 2. Does the collation type, stroke, apply to both Simplified and
>>>> Traditional Chinese, as I do not see anything defined in zh_Hant.xml
>>>> under "stroke"?
>>>>
>>>
>>> Let me look at that.
>>>
>>>
>>>>
>>>> Thanks,
>>>> Matt
>>>>
>>>>
>>>>
>>>
>>
>
Received on Fri Jun 08 2012 - 17:48:01 CDT
