Test Data Gone?
Steven R. Loomis
srl at icu-project.org
Mon Nov 16 08:32:49 CST 2015
> On Nov 16, 2015, at 4:44 AM, Mark Davis ☕️ <mark at macchiato.com> wrote:
> At the time we retracted it, it didn't appear that there was a lot of usage, and you really get a much more thorough test by comparing to ICU's implementation.
Right. One idea at IUC was, rather than trying to scope test data as CLDR conformance test data, to start a new effort that simply and explicitly records ICU's results for a given ICU/CLDR version, for certain input values and certain formatting routines. People are doing this already, so the efforts could simply be combined.
Maybe the results would be an ICU-maintained file rather than a CLDR one, like a sample app.
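A minimal sketch of what such a recorded-results file could look like, assuming a JSON layout and field names that are purely illustrative (nothing here is an existing ICU or CLDR format). The `sort_key` argument stands in for whatever collator is being recorded; in practice it would be backed by ICU, which this sketch does not call.

```python
import json


def record_sort_results(sort_key, inputs, locale_id, icu_version, cldr_version):
    """Build a JSON-serializable snapshot of the order a collator produces.

    All field names are hypothetical; sort_key is any callable that maps a
    string to a sortable key (for real use, one backed by ICU).
    """
    inputs = list(inputs)
    return {
        "icu_version": icu_version,    # supplied by the caller, not detected
        "cldr_version": cldr_version,  # supplied by the caller, not detected
        "locale": locale_id,
        "input": inputs,
        "expected_order": sorted(inputs, key=sort_key),
    }


def check_against_snapshot(sort_key, snapshot):
    """Return True if another implementation reproduces the recorded order."""
    return sorted(snapshot["input"], key=sort_key) == snapshot["expected_order"]


def dump_snapshot(snapshot):
    """Serialize the snapshot; ensure_ascii=False keeps non-ASCII readable."""
    return json.dumps(snapshot, ensure_ascii=False, indent=2)
```

An implementer would load the file, run their own collator over `input`, and compare against `expected_order`. Recording the versions alongside the results keeps it explicit that the file captures one release's behavior rather than a conformance requirement.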
> The data we previously had was mechanically generated from the data, not curated. It was created by generating concatenations of some chosen primary/secondary/tertiary characters together with the tailored+exemplar characters for each language.
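As a rough illustration of that kind of mechanical generation, here is one possible reading of it: take a small set of base characters plus a locale's tailored and exemplar elements, and emit every concatenation up to a fixed length. The function name, the length limit, and the exact combination scheme are assumptions for this sketch, not a description of the tool that produced the retracted data.

```python
from itertools import product


def concatenation_cases(base_chars, tailored_chars, max_len=2):
    """Yield all concatenations of 1..max_len elements.

    Elements may be multi-character strings (for example a contraction),
    so they are treated as opaque units rather than single code points.
    """
    # dict.fromkeys removes duplicates while preserving first-seen order.
    alphabet = list(dict.fromkeys(list(base_chars) + list(tailored_chars)))
    for length in range(1, max_len + 1):
        for combo in product(alphabet, repeat=length):
            yield "".join(combo)
```

The output grows exponentially with `max_len`, which is one reason data produced this way ends up large and uncurated.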
>> On Mon, Nov 16, 2015 at 11:00 AM, Martin J. Dürst <duerst at it.aoyama.ac.jp> wrote:
>>> On 2015/11/16 15:30, Mark Davis ☕️ wrote:
>>> Probably the most thorough test you could use would be one that tests
>>> semi-random strings to see if you get the same results as ICU.
>> Good idea. For tailorings, one thing to do is to extract the characters used in the tailoring and to bias the semi-random strings heavily towards using these characters.
>> Based on my experience with testing data for normalization (NFC and friends), I can say that having a good set of test data is extremely useful for implementers. I strongly encourage the Unicode Consortium to curate such data, and implementers at all levels to contribute to it.
>> Regards, Martin.
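The semi-random approach suggested in this exchange could be sketched roughly as follows. All function names and parameters (`bias`, `max_len`, `seed`) are illustrative assumptions. The two comparators are passed in as plain callables: one would wrap ICU and the other the implementation under test, and neither is provided here.

```python
import random


def biased_random_strings(tailoring_chars, other_chars, count,
                          max_len=6, bias=0.8, seed=0):
    """Generate semi-random strings that favor the tailoring's characters.

    With probability `bias` each position is drawn from tailoring_chars,
    otherwise from other_chars. A fixed seed keeps any failure reproducible.
    """
    rng = random.Random(seed)
    strings = []
    for _ in range(count):
        length = rng.randint(1, max_len)
        chars = [
            rng.choice(tailoring_chars) if rng.random() < bias
            else rng.choice(other_chars)
            for _ in range(length)
        ]
        strings.append("".join(chars))
    return strings


def _sign(n):
    """Reduce a comparator result to -1, 0, or 1."""
    return (n > 0) - (n < 0)


def find_disagreements(compare_a, compare_b, strings):
    """Return adjacent string pairs on which the two comparators disagree."""
    mismatches = []
    for left, right in zip(strings, strings[1:]):
        if _sign(compare_a(left, right)) != _sign(compare_b(left, right)):
            mismatches.append((left, right))
    return mismatches
```

Only the sign of each comparison is checked, since two implementations may legitimately return different nonzero magnitudes. Comparing adjacent pairs keeps the run linear in the number of strings; an all-pairs comparison would be more thorough at quadratic cost.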
>>> On Nov 16, 2015 06:32, "Markus Scherer" <markus.icu at gmail.com> wrote:
>>>> On Sun, Nov 15, 2015 at 10:56 AM, Cameron Dutro <cameron at lumoslabs.com> wrote:
>>>>> Great, thanks Markus. Having these files is wonderful, and we're using
>>>>> them to test our implementation already. It is my understanding however
>>>>> that they do not test individual locale tailorings, is that correct?
>>>> The UCA test file is only for the DUCET, corresponding to what we call the
>>>> "root locale". Actually, since CLDR tailors the default sort order, and ICU
>>>> implements that, CLDR has modified versions of those test files:
>>>> The ICU test file has a number of test cases for various locales, as
>>>> indicated in the test data. They assume CLDR collation data. More often, I
>>>> tried to make minimal assumptions about the collation data, and copied
>>>> relevant parts of rules into the test data -- so some of the test cases
>>>> require a from-rules builder. As a result, this file might be too specific
>>>> for other implementations.
>>>> CLDR-Users mailing list
>>>> CLDR-Users at unicode.org