Test Data Gone?
Martin J. Dürst
duerst at it.aoyama.ac.jp
Mon Nov 16 04:00:48 CST 2015
On 2015/11/16 15:30, Mark Davis ☕️ wrote:
> Probably the most thorough test you could use would be one that tests
> semi-random strings to see if you get the same results as ICU.
Good idea. For tailorings, one thing to do is to extract the characters
used in the tailoring and to bias the semi-random strings heavily
towards using these characters.
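A minimal sketch of that idea in Python, for illustration only. The function names, the crude rule parsing, and the 0.8 bias are my own assumptions, not part of CLDR or ICU; a real harness would plug in its own sort-key function and ICU's as `key_a` and `key_b`.

```python
import random
import re


def extract_tailoring_chars(rules):
    """Collect the distinct characters used in a tailoring rule string.

    Assumption: a crude filter that drops bracketed settings and a few
    syntax characters; it is not a real CLDR rule parser.
    """
    body = re.sub(r"\[[^\]]*\]", "", rules)  # e.g. "[caseFirst upper]"
    syntax = set("&<=;,/|*'\"")
    chars = [c for c in body if c not in syntax and not c.isspace()]
    return list(dict.fromkeys(chars))  # de-duplicate, keep first-seen order


def biased_random_strings(tailored, background, count,
                          max_len=6, bias=0.8, seed=0):
    """Generate short strings that pick a tailored character with
    probability `bias`, and a background character otherwise."""
    rng = random.Random(seed)  # fixed seed so failures can be reproduced
    strings = []
    for _ in range(count):
        length = rng.randint(1, max_len)
        strings.append("".join(
            rng.choice(tailored) if rng.random() < bias
            else rng.choice(background)
            for _ in range(length)))
    return strings


def _sign(a, b):
    return (a > b) - (a < b)


def find_disagreements(strings, key_a, key_b):
    """Return adjacent pairs that the two sort-key functions order differently."""
    return [(x, y) for x, y in zip(strings, strings[1:])
            if _sign(key_a(x), key_a(y)) != _sign(key_b(x), key_b(y))]
```

Any pair reported by `find_disagreements` is a small, reproducible test case that can be added to a shared data file.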
Based on my experience with testing data for normalization (NFC and
friends), I can say that having a good set of test data is extremely
useful for implementers. I strongly encourage the Unicode Consortium to
curate such data, and implementers at all levels to contribute to it.
> On Nov 16, 2015 06:32, "Markus Scherer" <markus.icu at gmail.com> wrote:
>> On Sun, Nov 15, 2015 at 10:56 AM, Cameron Dutro <cameron at lumoslabs.com> wrote:
>>> Great, thanks Markus. Having these files is wonderful, and we're using
>>> them to test our implementation already. It is my understanding however
>>> that they do not test individual locale tailorings, is that correct?
>> The UCA test file is only for the DUCET, corresponding to what we call the
>> "root locale". Actually, since CLDR tailors the default sort order, and ICU
>> implements that, CLDR has modified versions of those test files:
>> The ICU test file has a number of test cases for various locales, as
>> indicated in the test data. They assume CLDR collation data. More often, I
>> tried to make minimal assumptions about the collation data, and copied
>> relevant parts of rules into the test data -- so some of the test cases
>> require a from-rules builder. As a result, this file might be too specific
>> for other implementations.
>> CLDR-Users mailing list
>> CLDR-Users at unicode.org