Test Data Gone?

Cameron Dutro cameron at lumoslabs.com
Mon Nov 16 20:15:07 CST 2015

Mark and Martin, interleaving the tailoring characters with control
characters seems like a totally valid approach; I'll give it a shot. There
was some mention that such combinations were mechanically generated for
CLDR before v22. Does that code still exist somewhere? If I succeed in
generating combinations, I can then sort them with ICU and compare the order
against our implementation.
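
A rough sketch of that plan in Python, under stated assumptions: the
function names are hypothetical, the character sets are placeholders to be
filled from each locale's tailoring rules, and the two sort-key functions
are passed in by the caller. The reference key would come from ICU (for
example through a binding such as PyICU, if one is available) and the
candidate key from the implementation under test. This is not the
generator that CLDR used before v22.

```python
import random
from itertools import product


def concatenations(context_chars, tailored_chars, length=2):
    """Yield every string of `length` characters drawn from both sets
    that contains at least one tailored character."""
    # dict.fromkeys removes duplicates while preserving order.
    alphabet = list(dict.fromkeys(list(context_chars) + list(tailored_chars)))
    tailored = set(tailored_chars)
    for combo in product(alphabet, repeat=length):
        if tailored.intersection(combo):
            yield "".join(combo)


def biased_random_strings(tailored_chars, other_chars, count,
                          max_len=4, bias=0.8, seed=0):
    """Return `count` semi-random strings in which each character is taken
    from `tailored_chars` with probability `bias` (an assumed default)."""
    rng = random.Random(seed)  # fixed seed keeps failures reproducible
    tailored, others = list(tailored_chars), list(other_chars)
    strings = []
    for _ in range(count):
        n = rng.randint(1, max_len)
        strings.append("".join(
            rng.choice(tailored if rng.random() < bias else others)
            for _ in range(n)))
    return strings


def order_mismatches(strings, reference_key, candidate_key):
    """Sort the same strings with both key functions and return
    (position, expected, actual) for every position that differs."""
    unique = sorted(set(strings))  # identical starting order for both sorts
    expected = sorted(unique, key=reference_key)
    actual = sorted(unique, key=candidate_key)
    return [(i, e, a)
            for i, (e, a) in enumerate(zip(expected, actual)) if e != a]
```

An empty list from order_mismatches means the two implementations agree on
that input; any entries point at the first strings worth inspecting by hand.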

Steven, I like the idea of a maintained file that records conformance data,
perhaps generated by ICU, although it seems a bit odd to keep it alongside
ICU instead of CLDR. Does ICU contain a lot of customizations that cause it
to sort in a different order, and if so, why are those not reflected in
CLDR data?


On Mon, Nov 16, 2015 at 8:32 AM, Steven R. Loomis <srl at icu-project.org> wrote:

> On Nov 16, 2015, at 4:44 AM, Mark Davis ☕️ <mark at macchiato.com>
> wrote:
> At the time we retracted it, it didn't appear that there was a lot of
> usage, and you really get a much more thorough test by comparing to ICU's
> implementation.
> Right. An idea at IUC was, rather than trying to scope test data as CLDR
> conformance test data, to have a new effort that simply and explicitly
> records ICU's results for a certain ICU/CLDR version somewhere, for certain
> input values and certain formatting routines. People are doing this
> already; the point is just to combine efforts.
> Maybe the results would be an ICU-maintained file instead of a CLDR one,
> like a sample app.
> The data we previously had was mechanically generated from the data, not
> curated. It was created by generating concatenations of some chosen
> primary/secondary/tertiary characters together with the tailored+exemplar
> characters for each language.
> Mark
> On Mon, Nov 16, 2015 at 11:00 AM, Martin J. Dürst <duerst at it.aoyama.ac.jp>
> wrote:
>> On 2015/11/16 15:30, Mark Davis ☕️ wrote:
>>> Probably the most thorough test you could use would be one that tests
>>> semi-random strings to see if you get the same results as ICU.
>> Good idea. For tailorings, one thing to do is to extract the characters
>> used in the tailoring and to bias the semi-random strings heavily towards
>> using these characters.
>> Based on my experience with testing data for normalization (NFC and
>> friends), I can say that having a good set of test data is extremely useful
>> for implementers. I strongly encourage the Unicode Consortium to curate
>> such data, and implementers at all levels to contribute to it.
>> Regards,   Martin.
>> On Nov 16, 2015 06:32, "Markus Scherer" <markus.icu at gmail.com> wrote:
>>> On Sun, Nov 15, 2015 at 10:56 AM, Cameron Dutro <cameron at lumoslabs.com>
>>>> wrote:
>>>>> Great, thanks Markus. Having these files is wonderful, and we're using
>>>>> them to test our implementation already. It is my understanding, however,
>>>>> that they do not test individual locale tailorings. Is that correct?
>>>> The UCA test file is only for the DUCET, corresponding to what we call the
>>>> "root locale". Actually, since CLDR tailors the default sort order, and ICU
>>>> implements that, CLDR has modified versions of those test files:
>>>> http://unicode.org/cldr/trac/browser/trunk/common/uca/
>>>> The ICU test file has a number of test cases for various locales, as
>>>> indicated in the test data. They assume CLDR collation data. More often, I
>>>> tried to make minimal assumptions about the collation data, and copied
>>>> relevant parts of rules into the test data -- so some of the test cases
>>>> require a from-rules builder. As a result, this file might be too specific
>>>> for other implementations.
>>>> markus
>>>> _______________________________________________
>>>> CLDR-Users mailing list
>>>> CLDR-Users at unicode.org
>>>> http://unicode.org/mailman/listinfo/cldr-users