CLDR Ticket #9738 (accepted tools)

Opened 14 months ago

Last modified 5 weeks ago

Speed up RegexLookup

Reported by: mark
Owned by: emmons
Component: other
Phase: rc
Data Locale:
Review:
Weeks:
Data Xpath:
Xref:

Description

Looking at the RegexTree code, I think it could be much faster.

Currently, it runs a regex match against a series of prefixes as it descends.

Instead, preprocess each item in the input list for the lookup.

For each item, get the constant prefix: the longest initial substring that contains no (non-literal) regex syntax and no %. For example:

ldml/numbers/currencyFormats[@numberSystem="%A"]/unitPattern[@count="%A"]

has the constant prefix

ldml/numbers/currencyFormats[@numberSystem="
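A minimal sketch of the prefix extraction described above. The class name ConstantPrefix and its method are hypothetical and not part of the existing code. It assumes that "[@" is treated as literal text (the example above only works under that reading), that a backslash-escaped metacharacter counts as a literal, and that any pattern containing "|" gets an empty prefix to stay on the safe side.

```java
final class ConstantPrefix {
    // Characters treated as regex syntax, plus '%' which introduces a variable such as %A.
    private static final String META = "\\[](){}.*+?^$|%";

    /** Returns the longest leading run of literal text in the pattern (hypothetical helper). */
    static String of(String pattern) {
        // Conservative assumption: alternation could make any prefix wrong, so give up.
        if (pattern.indexOf('|') >= 0) {
            return "";
        }
        StringBuilder prefix = new StringBuilder();
        for (int i = 0; i < pattern.length(); i++) {
            char c = pattern.charAt(i);
            boolean hasNext = i + 1 < pattern.length();
            if (c == '[' && hasNext && pattern.charAt(i + 1) == '@') {
                // Assumption: "[@" starts an XPath attribute test and is literal.
                prefix.append(c);
            } else if (c == '\\' && hasNext && META.indexOf(pattern.charAt(i + 1)) >= 0) {
                // An escaped metacharacter stands for itself.
                prefix.append(pattern.charAt(++i));
            } else if (META.indexOf(c) >= 0) {
                // A quantifier that allows zero repetitions makes the previous
                // character optional, so it cannot be part of the prefix.
                if ((c == '*' || c == '?' || c == '{') && prefix.length() > 0) {
                    prefix.setLength(prefix.length() - 1);
                }
                break;
            } else {
                prefix.append(c);
            }
        }
        return prefix.toString();
    }
}
```

Applied to the pattern above, this sketch stops at the first % and returns the prefix shown. An empty prefix is always a valid answer, so the conservative cases only cost speed, not correctness.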

You can then build a data structure that uses these prefixes to pre-filter the lookup, avoiding a bunch of regex matches.
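One possible shape for such a structure, as a sketch only: bucket the entries by constant prefix in a hash map, and at lookup time run regexes only for buckets whose key is a prefix of the input. The class PrefixFilteredLookup and its methods are hypothetical, not the existing RegexLookup API. The sketch assumes the caller supplies the constant prefix and a ready-to-compile regex (with any %A variables already expanded), that a hit means a full match, and that the earliest-added matching entry wins.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;
import java.util.regex.Pattern;

final class PrefixFilteredLookup<V> {
    private static final class Entry<V> {
        final int order;        // insertion order, used to keep first-match semantics
        final Pattern pattern;
        final V value;

        Entry(int order, Pattern pattern, V value) {
            this.order = order;
            this.pattern = pattern;
            this.value = value;
        }
    }

    private final Map<String, List<Entry<V>>> byPrefix = new HashMap<>();
    private final TreeSet<Integer> prefixLengths = new TreeSet<>();
    private int nextOrder = 0;

    /** Registers a regex under its constant prefix (both supplied by the caller). */
    void add(String constantPrefix, String regex, V value) {
        byPrefix.computeIfAbsent(constantPrefix, k -> new ArrayList<>())
                .add(new Entry<>(nextOrder++, Pattern.compile(regex), value));
        prefixLengths.add(constantPrefix.length());
    }

    /** Returns the value of the earliest-added entry whose regex matches, or null. */
    V get(String input) {
        Entry<V> best = null;
        // Only prefix lengths that fit inside the input can possibly apply.
        for (int len : prefixLengths.headSet(input.length(), true)) {
            List<Entry<V>> bucket = byPrefix.get(input.substring(0, len));
            if (bucket == null) {
                continue; // no entry has this prefix, so no regex is run
            }
            for (Entry<V> e : bucket) {
                if (best != null && e.order > best.order) {
                    break; // buckets are in insertion order; later entries cannot win
                }
                if (e.pattern.matcher(input).matches()) {
                    best = e;
                    break;
                }
            }
        }
        return best == null ? null : best.value;
    }
}
```

Each lookup costs one hash probe per distinct prefix length plus the regex matches inside the buckets that hit. A trie keyed on the prefixes would avoid the repeated substring calls, at the price of more code.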

Change History

comment:1 Changed 10 months ago by pedberg

  • Status changed from new to accepted
  • Component changed from unknown to other
  • Priority changed from assess to medium
  • Milestone changed from UNSCH to 31
  • Owner changed from anybody to emmons
  • Type changed from unknown to tools

Discuss design with TC before committing

comment:2 Changed 10 months ago by pedberg

  • Cc mark added

comment:3 Changed 9 months ago by emmons

  • Phase changed from dsub to rc

comment:4 Changed 9 months ago by emmons

  • Owner emmons deleted
  • Status changed from accepted to new

Not going to get to this anytime soon.

comment:5 Changed 9 months ago by emmons

  • Owner set to emmons
  • Status changed from new to accepted
  • Milestone changed from 31 to 32

comment:6 Changed 5 weeks ago by emmons

  • Milestone changed from 32 to UNSCH