Test Data Gone?
cameron at lumoslabs.com
Mon Nov 16 20:15:07 CST 2015
Mark and Martin, interleaving the tailoring characters with control
characters seems like a totally valid approach; I'll give it a shot. There
was some mention that such combinations were mechanically generated for
CLDR before v22. Does that code still exist somewhere? If I succeed in
generating combinations, I can then sort them with ICU and compare the order
against our implementation.
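
A minimal sketch of that workflow, in Python, for illustration only. The
function names, the choice of "control" characters, and the two key
functions are all hypothetical; in practice the reference key would come
from an ICU binding and the candidate key from the implementation under
test. This is not the generator CLDR used before v22.

```python
from itertools import product
from typing import Callable, Iterable, List, Tuple


def interleave_combinations(tailored: Iterable[str],
                            controls: Iterable[str],
                            length: int = 2) -> List[str]:
    """Build test strings by alternating tailored and control characters.

    Both character sets are assumptions supplied by the caller, e.g. the
    characters that appear in a locale's tailoring rules and a few plain
    letters to act as controls.
    """
    tailored = list(tailored)
    controls = list(controls)
    strings = set()
    for t_chars in product(tailored, repeat=length):
        for c_chars in product(controls, repeat=length):
            # t0 c0 t1 c1 ... : one tailored character, then one control.
            strings.add("".join(t + c for t, c in zip(t_chars, c_chars)))
    # Code point order here only makes the output deterministic.
    return sorted(strings)


def order_mismatches(strings: List[str],
                     reference_key: Callable[[str], object],
                     candidate_key: Callable[[str], object]
                     ) -> List[Tuple[int, str, str]]:
    """Return (index, expected, actual) wherever the sorted orders differ."""
    expected = sorted(strings, key=reference_key)
    actual = sorted(strings, key=candidate_key)
    return [(i, e, a)
            for i, (e, a) in enumerate(zip(expected, actual))
            if e != a]
```

An empty list from order_mismatches means the two key functions agree on
this particular set of strings; it says nothing about strings that were
not generated.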
Steven, I like the idea of a maintained file that records conformance data,
perhaps generated by ICU, although it seems a bit odd to keep it alongside
ICU instead of CLDR. Does ICU contain a lot of customizations that cause it
to sort in a different order, and if so, why are those not reflected in
On Mon, Nov 16, 2015 at 8:32 AM, Steven R. Loomis <srl at icu-project.org> wrote:
> Sent from our iPhone.
> On Nov 16, 2015, at 4:44 AM, Mark Davis ☕️ <mark at macchiato.com> wrote:
> At the time we retracted it, it didn't appear that there was a lot of
> usage, and you really get a much more thorough test by comparing to ICU's
> Right. An idea at IUC was, rather than trying to scope test data as CLDR
> conformance test data, to have a new effort that simply and explicitly
> records ICU's results for a certain ICU/CLDR version somewhere, for certain
> input values and certain formatting routines. People are doing this
> already; we could just combine efforts.
> Maybe the results would be an ICU-maintained file instead of a CLDR one,
> like a sample app.
> The data we previously had was mechanically generated from the data, not
> curated. It was created by generating concatenations of some chosen
> primary/secondary/tertiary characters together with the tailored+exemplar
> characters for each language.
> On Mon, Nov 16, 2015 at 11:00 AM, Martin J. Dürst <duerst at it.aoyama.ac.jp> wrote:
>> On 2015/11/16 15:30, Mark Davis ☕️ wrote:
>>> Probably the most thorough test you could use would be one that tests
>>> semi-random strings to see if you get the same results as ICU.
>> Good idea. For tailorings, one thing to do is to extract the characters
>> used in the tailoring and to bias the semi-random strings heavily towards
>> using these characters.
>> Based on my experience with testing data for normalization (NFC and
>> friends), I can say that having a good set of test data is extremely useful
>> for implementers. I strongly encourage the Unicode Consortium to curate
>> such data, and implementers at all levels to contribute to it.
>> Regards, Martin.
>> On Nov 16, 2015 06:32, "Markus Scherer" <markus.icu at gmail.com> wrote:
>>> On Sun, Nov 15, 2015 at 10:56 AM, Cameron Dutro <cameron at lumoslabs.com> wrote:
>>>>> Great, thanks Markus. Having these files is wonderful, and we're using
>>>>> them to test our implementation already. It is my understanding, however,
>>>>> that they do not test individual locale tailorings; is that correct?
>>>> The UCA test file is only for the DUCET, corresponding to what we call
>>>> "root locale". Actually, since CLDR tailors the default sort order, and
>>>> implements that, CLDR has modified versions of those test files:
>>>> The ICU test file has a number of test cases for various locales, as
>>>> indicated in the test data. They assume CLDR collation data. More often, I
>>>> tried to make minimal assumptions about the collation data, and copied
>>>> relevant parts of rules into the test data -- so some of the test cases
>>>> require a from-rules builder. As a result, this file might be too
>>>> for other implementations.
>>>> CLDR-Users mailing list
>>>> CLDR-Users at unicode.org