Unicode Common Locale Data Repository: Bug Tracking

CLDR Ticket #8858 (accepted data)

Opened 3 years ago

Last modified 13 months ago

CheckWidths performance

Reported by: emmons
Owned by: emmons
Component: perf
Data Locale:
Phase: rc
Review:
Weeks:
Data Xpath:


I did some preliminary analysis and found that the CheckWidths test takes more time than any of CLDR's other data tests, mostly because it performs a large number of regex lookups. I recommend the following:

1. Use the STAR_PATTERN_LOOKUP algorithm for lookups in this test, similar to what we do for coverage.

2. Use a single Limit for each regex instead of an array. No regex currently uses an array with more than one element.

3. Replace the "aliased and comprehensive" check with a plain check for narrow units, since the current check requires a coverage lookup, which is also slow. The comment in the test reads:

        // This was put in specifically to deal with the fact that we added a bunch of new units in CLDR 26
        // and didn't put the narrow forms of them into modern coverage.  If/when the narrow forms of all units
        // are modern coverage, then we can safely remove the aliasedAndComprehensive check.  Right now if an
        // item is aliased and coverage is comprehensive, then it can't generate anything worse than a warning.
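
The three recommendations can be sketched roughly as follows. This is only an illustration of the general idea, not the code in CheckWidths or in CLDR's regex lookup classes: the class `StarLookupSketch`, the `Rule` type, the template syntax with `"*"` wildcards, and the `maxEm` field are all hypothetical names made up for this example. Paths are bucketed by their "star pattern" (every attribute value replaced by `*`), so a lookup only runs the few regexes in the matching bucket; each rule carries one limit rather than an array; and the narrow-unit check is a plain prefix test that needs no coverage lookup.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; none of these names come from the CLDR code base.
final class StarLookupSketch {
    // Matches any double-quoted attribute value in an XPath-like string.
    private static final Pattern QUOTED = Pattern.compile("\"[^\"]*\"");

    // Assumed prefix of narrow unit paths, used for the cheap check in item 3.
    private static final String NARROW_PREFIX = "//ldml/units/unitLength[@type=\"narrow\"]";

    /** One compiled path regex with a single width limit (item 2), not an array. */
    static final class Rule {
        final Pattern pattern;
        final double maxEm;
        Rule(Pattern pattern, double maxEm) { this.pattern = pattern; this.maxEm = maxEm; }
    }

    // Bucket key (star pattern) -> rules whose templates share that key.
    private final Map<String, List<Rule>> buckets = new HashMap<>();

    /** Replaces every quoted attribute value with "*" to get the bucket key. */
    static String starPattern(String path) {
        return QUOTED.matcher(path).replaceAll("\"*\"");
    }

    /** Item 3: a string test that avoids any coverage lookup. */
    static boolean isNarrowUnit(String path) {
        return path.startsWith(NARROW_PREFIX);
    }

    /** Registers a template whose attribute values are literals or the wildcard "*". */
    void add(String template, double maxEm) {
        StringBuilder regex = new StringBuilder();
        Matcher m = QUOTED.matcher(template);
        int last = 0;
        while (m.find()) {
            // Literal text between attribute values is matched exactly.
            regex.append(Pattern.quote(template.substring(last, m.start())));
            String value = m.group();
            // A "*" value matches any attribute value; anything else is literal.
            regex.append(value.equals("\"*\"") ? "\"[^\"]*\"" : Pattern.quote(value));
            last = m.end();
        }
        regex.append(Pattern.quote(template.substring(last)));
        buckets.computeIfAbsent(starPattern(template), k -> new ArrayList<>())
               .add(new Rule(Pattern.compile(regex.toString()), maxEm));
    }

    /** Item 1: returns the limit of the first matching rule in the path's bucket, or null. */
    Double lookup(String path) {
        List<Rule> candidates = buckets.get(starPattern(path));
        if (candidates == null) {
            return null; // no rule shares this star pattern, so no regex is run at all
        }
        for (Rule rule : candidates) {
            if (rule.pattern.matcher(path).matches()) {
                return rule.maxEm;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        StarLookupSketch index = new StarLookupSketch();
        index.add("//ldml/units/unitLength[@type=\"narrow\"]/unit[@type=\"*\"]/unitPattern[@count=\"*\"]", 3.0);
        String path = "//ldml/units/unitLength[@type=\"narrow\"]/unit[@type=\"length-meter\"]/unitPattern[@count=\"one\"]";
        System.out.println(index.lookup(path) + " " + isNarrowUnit(path));
    }
}
```

The saving comes from the hash lookup on the star pattern: a path whose shape matches no registered template is rejected without evaluating a single regex, and a path that does match a bucket is tested only against the rules in that bucket.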


Change History

comment:1 Changed 3 years ago by emmons

Just doing item 1 above cuts the total ConsoleCheckCLDR time from 4:48 down to 4:30 on my machine, a saving of about 6%. That's pretty significant given that this is only one of many tests.

comment:2 Changed 3 years ago by emmons

  • Status changed from new to accepted
  • Component changed from unknown to perf
  • Priority changed from assess to major
  • Phase changed from dsub to rc
  • Milestone changed from UNSCH to 28
  • Owner changed from anybody to emmons
  • Type changed from unknown to data

comment:3 Changed 3 years ago by emmons

  • Milestone changed from 28 to 29

This is more complicated than I thought - moving to 29.

comment:4 Changed 3 years ago by emmons

  • Milestone changed from 29 to upcoming

Auto move of all 29 -> upcoming

comment:5 Changed 13 months ago by emmons

  • Milestone changed from upcoming to UNSCH
