Test Data Gone?

Mark Davis ☕️ mark at macchiato.com
Mon Nov 16 06:44:47 CST 2015

At the time we retracted it, there did not appear to be much usage, and
you really get a much more thorough test by comparing against ICU's results.
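
A rough sketch of that kind of comparison test, using the semi-random,
tailoring-biased strings suggested in the quoted messages below. Everything
named here is hypothetical: reference_key and candidate_key stand for any two
sort-key functions (the reference would typically wrap ICU through some
binding, the candidate would be the implementation under test), and the bias,
length and trial counts are arbitrary choices.

```python
import random


def biased_random_string(tailored_chars, other_chars, rng, bias=0.8, max_len=6):
    """Build a random string that draws mostly from the tailored characters."""
    length = rng.randint(1, max_len)
    return "".join(
        rng.choice(tailored_chars) if rng.random() < bias else rng.choice(other_chars)
        for _ in range(length)
    )


def order(key, a, b):
    """Return -1, 0 or 1 for how a sorts relative to b under a sort-key function."""
    ka, kb = key(a), key(b)
    return (ka > kb) - (ka < kb)


def find_disagreements(reference_key, candidate_key, tailored_chars, other_chars,
                       trials=1000, seed=0):
    """Collect string pairs that the two sort-key functions order differently."""
    rng = random.Random(seed)  # fixed seed so any failure can be reproduced
    mismatches = []
    for _ in range(trials):
        a = biased_random_string(tailored_chars, other_chars, rng)
        b = biased_random_string(tailored_chars, other_chars, rng)
        if order(reference_key, a, b) != order(candidate_key, a, b):
            mismatches.append((a, b))
    return mismatches
```

An empty result only means that no disagreement was found in the sampled
pairs; it does not prove that the two implementations agree everywhere.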

The test data we previously had was mechanically generated from the collation
data, not curated. It was created by generating concatenations of some chosen
primary/secondary/tertiary characters together with the tailored+exemplar
characters for each language.
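
For illustration only, mechanical generation of that kind might look like the
sketch below. It is not the tool that produced the retracted files; the
function name, the sample characters and the maximum length are all
assumptions.

```python
from itertools import product


def generate_test_strings(base_chars, tailored_chars, max_len=2):
    """Concatenate base and tailored characters into candidate test strings."""
    # Merge both lists, dropping duplicates while keeping the original order.
    alphabet = list(dict.fromkeys(list(base_chars) + list(tailored_chars)))
    strings = []
    for length in range(1, max_len + 1):
        for combo in product(alphabet, repeat=length):
            strings.append("".join(combo))
    return strings


# Hypothetical inputs: a few characters differing at primary, secondary or
# tertiary level, plus characters a Swedish-like tailoring might reorder.
base = ["a", "A", "á"]
tailored = ["å", "ä", "ö"]
tests = generate_test_strings(base, tailored)
```

The resulting strings would then be sorted with a trusted collator to produce
the expected order, which is why such a file is only as good as the
implementation that generated it.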


On Mon, Nov 16, 2015 at 11:00 AM, Martin J. Dürst <duerst at it.aoyama.ac.jp> wrote:

> On 2015/11/16 15:30, Mark Davis ☕️ wrote:
>> Probably the most thorough test you could use would be one that tests
>> semi-random strings to see if you get the same results as ICU.
> Good idea. For tailorings, one thing to do is to extract the characters
> used in the tailoring and to bias the semi-random strings heavily towards
> using these characters.
> Based on my experience with testing data for normalization (NFC and
> friends), I can say that having a good set of test data is extremely useful
> for implementers. I strongly encourage the Unicode Consortium to curate
> such data, and implementers at all levels to contribute to it.
> Regards,   Martin.
> On Nov 16, 2015 06:32, "Markus Scherer" <markus.icu at gmail.com> wrote:
>>> On Sun, Nov 15, 2015 at 10:56 AM, Cameron Dutro <cameron at lumoslabs.com> wrote:
>>>> Great, thanks Markus. Having these files is wonderful, and we're using
>>>> them to test our implementation already. It is my understanding however
>>>> that they do not test individual locale tailorings, is that correct?
>>> The UCA test file is only for the DUCET, corresponding to what we call the
>>> "root locale". Actually, since CLDR tailors the default sort order, and ICU
>>> implements that, CLDR has modified versions of those test files:
>>> http://unicode.org/cldr/trac/browser/trunk/common/uca/
>>> The ICU test file has a number of test cases for various locales, as
>>> indicated in the test data. They assume CLDR collation data. More often, I
>>> tried to make minimal assumptions about the collation data, and copied
>>> relevant parts of rules into the test data -- so some of the test cases
>>> require a from-rules builder. As a result, this file might be too specific
>>> for other implementations.
>>> markus
>>> _______________________________________________
>>> CLDR-Users mailing list
>>> CLDR-Users at unicode.org
>>> http://unicode.org/mailman/listinfo/cldr-users