CLDR Ticket #11049 (design tools)

Opened 6 months ago

Last modified 7 weeks ago

public API for some CLDR classes

Reported by: markus
Owned by: markus
Component: other
Data Locale:
Phase: rc
Review:
Weeks: 2
Data Xpath:
Xref: ticket:10664, ticket:11050, ticket:11026, ticket:6325

Description

Please create public API for offline tools extracting data from CLDR. (I assume we still don't want to recommend CLDR code for runtime use.)

A number of projects try to read CLDR files directly (e.g., for emoji annotations) and have a hard time getting it right. It should be easy to read the files, validate them, do fallbacks to the proper parent locales, and get fully-resolved versions of the data.
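As a rough illustration of what "fallbacks to the proper parent locales" and "fully-resolved" involve, here is a minimal Java sketch. It is not CLDR code: the class `LocaleFallbackSketch`, its methods, and the in-memory maps are hypothetical stand-ins for data that a real tool would parse from the XML files. It assumes a parent is found either in an explicit override table (CLDR's supplemental data carries parent locale overrides) or by truncating the last subtag, and it leaves out other inheritance mechanisms such as aliases.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Illustrative only; every name in this class is hypothetical, not CLDR API. */
final class LocaleFallbackSketch {
    /** Explicit parent overrides, e.g. loaded from supplemental data (assumption). */
    private final Map<String, String> explicitParents;
    /** localeId -> (path -> value): unresolved data as read from each locale file. */
    private final Map<String, Map<String, String>> unresolved;

    LocaleFallbackSketch(Map<String, String> explicitParents,
                         Map<String, Map<String, String>> unresolved) {
        this.explicitParents = explicitParents;
        this.unresolved = unresolved;
    }

    /** Returns the locale followed by its ancestors, ending with "root". */
    public List<String> parentChain(String localeId) {
        List<String> chain = new ArrayList<>();
        String current = localeId;
        // The contains() check guards against a cycle in the override table.
        while (current != null && !chain.contains(current)) {
            chain.add(current);
            current = parentOf(current);
        }
        return chain;
    }

    /** Explicit override first, then truncation of the last subtag, then root. */
    private String parentOf(String localeId) {
        if (localeId.equals("root")) {
            return null;
        }
        String explicit = explicitParents.get(localeId);
        if (explicit != null) {
            return explicit;
        }
        int cut = localeId.lastIndexOf('_');
        return cut < 0 ? "root" : localeId.substring(0, cut);
    }

    /** Looks up a value, walking up the parent chain until one locale defines it. */
    public Optional<String> resolve(String localeId, String path) {
        for (String id : parentChain(localeId)) {
            Map<String, String> data = unresolved.get(id);
            if (data != null && data.containsKey(path)) {
                return Optional.of(data.get(path));
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Made-up sample data, only to exercise the lookup.
        LocaleFallbackSketch sketch = new LocaleFallbackSketch(
                Map.of("zh_Hant", "root"),
                Map.of("root", Map.of("greeting", "from root"),
                       "zh", Map.of("greeting", "from zh")));
        System.out.println(sketch.parentChain("zh_Hant_TW"));
        System.out.println(sketch.resolve("zh_Hant_TW", "greeting"));
    }
}
```

The override table is the part that naive readers of the files tend to miss: with plain truncation alone, zh_Hant_TW would fall back through zh instead of going from zh_Hant straight to root.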

Another use case is the LDML-to-ICU converter: Given a reasonable CLDR data access API, we could move the converter logic to the ICU project where it really belongs. That would let ICU then make changes (converting new CLDR structure, or changing the ICU structure) without needing to change CLDR code.

The API should be mostly stable.

Change History

comment:1 follow-up: ↓ 2 Changed 6 months ago by srl

Use the Java classes.
Or use JSON where there are libraries such as cldr.js

comment:2 in reply to: ↑ 1 ; follow-up: ↓ 3 Changed 6 months ago by markus

Replying to srl:

> Use the Java classes.

Sure, but that is only viable if some of them are marked and advertised as API and kept stable. Otherwise code breaks with every release.

> Or use JSON where there are libraries such as cldr.js

I don't think we are interested in a JavaScript API for offline tooling, and JSON itself would only help in a subset of use cases if we want to carry fully-resolved data in that format, but that's very large.

comment:3 in reply to: ↑ 2 ; follow-up: ↓ 5 Changed 6 months ago by srl

  • Xref set to 10664 11050 11026

Replying to markus:

> Replying to srl:

> > Use the Java classes.

> Sure, but that is only viable if some of them are marked and advertised as API and kept stable.

See ticket:10664 for javadoc for important classes. That's exactly what these classes (CLDRFile etc) are for. It makes no sense to me in this project to create yet another API when there is already one.

> Otherwise code breaks with every release.

Has this happened in practice? The SurveyTool and ICU converters (for example) consume CLDR data via these APIs, and they don't break with every release.

Actually, using the CLDR API can bring more stability; take ticket:11026, for example. We removed distinguishingItems (?!), which broke other implementations trying to parse CLDR data (it was wrong anyway). But the CLDR API was updated to parse the DTD comments instead.

Yes, the CLDR API should be held to a stability policy. It also has a serious problem with static getters. I did a lot of work to make CLDRConfig/CLDRPaths to not always make assumptions about its environment. These are convenient for one off single-thread command line tooling, but fell apart in a multithreaded and multi-data-version environment such as ST.
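To make the static-getter concern concrete, here is a small Java sketch contrasting the two styles. None of it is taken from CLDRConfig or CLDRPaths: the class names, the `cldr-data` directory, and the relative file path are all hypothetical and chosen only for illustration.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Illustrative only; these classes are hypothetical, not CLDR code. */
final class ConfigStyleSketch {

    /** Style 1: a process-wide static getter. Every caller shares one data directory. */
    static final class GlobalConfig {
        private static final Path DATA_DIR = Paths.get("cldr-data"); // assumed location

        static Path getDataDir() {
            return DATA_DIR;
        }
    }

    /** Style 2: an explicit instance. Each caller states which data version it reads. */
    static final class DataEnvironment {
        private final Path dataDir;

        DataEnvironment(Path dataDir) {
            this.dataDir = dataDir;
        }

        /** Resolves a file path relative to this environment's data directory. */
        Path resolve(String relativePath) {
            return dataDir.resolve(relativePath);
        }
    }

    public static void main(String[] args) {
        // Two data versions can coexist in one process only with the instance style.
        DataEnvironment older = new DataEnvironment(Paths.get("data-old"));
        DataEnvironment newer = new DataEnvironment(Paths.get("data-new"));
        System.out.println(older.resolve("main/fr.xml"));
        System.out.println(newer.resolve("main/fr.xml"));
        System.out.println(GlobalConfig.getDataDir());
    }
}
```

The static style is the convenient one for a single-threaded command-line tool; the instance style is what a long-running, multi-version environment needs, because nothing about the data location is fixed per process.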

So I agree there's work to do. But we should hold CLDR to a stable API and work on improving that. Or create something new and make all CLDR tooling route through it. I don't think creating something unrelated to org.unicode.cldr.util is a good idea for this project now unless it somehow replaces it.

> > Or use JSON where there are libraries such as cldr.js

> I don't think we are interested in a JavaScript API for offline tooling, and JSON itself would only help in a subset of use cases if we want to carry fully-resolved data in that format, but that's very large.

But your use case is offline extraction. You could use the JSON resolved data to avoid the "hard time getting it right" problem. And then, deduplicate/compress the data according to your specific environment.

ticket:11050 requests Annotation data in JSON format.

comment:4 Changed 6 months ago by srl

  • Xref changed from 10664 11050 11026 to 10664 11050 11026 6325

comment:5 in reply to: ↑ 3 Changed 6 months ago by markus

Replying to srl:

> Replying to markus:

> > Replying to srl:

> > > Use the Java classes.

> > Sure, but that is only viable if some of them are marked and advertised as API and kept stable.

> See ticket:10664 for javadoc for important classes. That's exactly what these classes (CLDRFile etc) are for. It makes no sense to me in this project to create yet another API when there is already one.

I am not asking to "create API". I am asking to make it obvious (via documentation, API tags, or maybe moving some classes to an "api" or "public" package) which classes are ok for others to use -- and then only change that documented API when absolutely necessary. Not all of the Java-public methods on those classes need to be API either.

> > Otherwise code breaks with every release.

> Has this happened in practice?

Yes, where people depended on some of the code and it then changed.

We have told people not to call CLDR code, but that then leads to incorrect use of the data.

> The SurveyTool and ICU converters (for example) consume CLDR data via these APIs, and they don't break with every release.

Of course they don't break -- they are in the same repository and get refactored together with the other code.

> Actually, using the CLDR API can bring more stability

You make it sound like there is already a "CLDR API". Is it documented, can people find it, can developers see which classes and methods are considered API and thus reasonably stable?

> Yes, the CLDR API should be held to a stability policy. It also has a serious problem with static getters. I did a lot of work to make CLDRConfig/CLDRPaths to not always make assumptions about its environment. These are convenient for one off single-thread command line tooling, but fell apart in a multithreaded and multi-data-version environment such as ST.

This sounds good, and for what I am after to help our developers, single-threaded offline-tool use is sufficient. I would be happy with that as a first step.

> So I agree there's work to do. But we should hold CLDR to a stable API and work on improving that. Or create something new and make all CLDR tooling route through it. I don't think creating something unrelated to org.unicode.cldr.util is a good idea for this project now unless it somehow replaces it.

Right.

> > > Or use JSON where there are libraries such as cldr.js

> > I don't think we are interested in a JavaScript API for offline tooling, and JSON itself would only help in a subset of use cases if we want to carry fully-resolved data in that format, but that's very large.

> But your use case is offline extraction. You could use the JSON resolved data to avoid the "hard time getting it right" problem. And then, deduplicate/compress the data according to your specific environment.

Our use case is offline extraction, but from a checked-in version of CLDR as usual. If we had to work from a secondary data format such as JSON, we would have to check those redundant files in as well. That makes no sense to me compared with reading the XML files we already have copied into our repository using the Java code we already have copied into our repository.

comment:6 follow-up: ↓ 8 Changed 6 months ago by markus

Let me clarify:

I am asking to "create API" in the sense that outside developers need to (a) be able to find classes and methods that are documented (and ideally packaged or tagged) for them to use, and (b) have some confidence that those classes and methods won't change willy-nilly.

I envision a small subset of existing CLDR classes. I imagine that it may be useful to Eclipse-move such classes to a new package, but while that would make "API" more obvious and easier to maintain, that's not absolutely required.

I am not asking to "create API" in the sense of designing and writing new code that would only be used by CLDR outsiders.

I am not asking to harden CLDR code for runtime production use.

comment:7 Changed 6 months ago by mark

This would be possible to do. And the code would certainly be more robust if we cleaned up some of the dependencies, moving to a new package. But what we'd need would be to know which APIs are currently being used, and we could focus on those. Otherwise, it would be a very large project.

comment:8 in reply to: ↑ 6 Changed 6 months ago by srl

Replying to markus:

> Is it documented

Inconsistently. https://unicode.org/cldr/trac/ticket/10664

> Yes, where people depended on some of the code and it then changed.

CLDR should have a stability policy (which we are both agreeing on) and treat this as a bug.

> We have told people not to call CLDR code, but that then leads to incorrect use of the data.

That's the difficulty.

> Let me clarify:

> I am asking to "create API" in the sense that outside developers need to (a) be able to find classes and methods that are documented (and ideally packaged or tagged) for them to use, and (b) have some confidence that those classes and methods won't change willy-nilly.

Agreed.

> I envision a small subset of existing CLDR classes. I imagine that it may be useful to Eclipse-move such classes to a new package, but while that would make "API" more obvious and easier to maintain, that's not absolutely required.

I recommend CLDRFile, SupplementalDataInfo, and perhaps a couple of supporting items. These have been fairly stable for some time.

> I am not asking to "create API" in the sense of designing and writing new code that would only be used by CLDR outsiders.

Documenting and identifying what we already have would be best.

> I am not asking to harden CLDR code for runtime production use.

That should be clarified also.

comment:9 Changed 6 months ago by mark

  • Owner changed from anybody to markus
  • Priority changed from assess to major
  • Status changed from new to design
  • Milestone changed from UNSCH to 34

Committee agreed: we need an exact list of the method names that need to be stable (as small as possible); we can then review and mark those, and add process.
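One possible way to "mark those" is an annotation plus a reflective listing, sketched below in Java. This is an assumption about how marking could work, not something the committee specified: the `@StableApi` annotation, the `ExampleDataFile` class, and its methods are hypothetical and do not exist in CLDR.

```java
import java.lang.annotation.Documented;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Illustrative only; the annotation and classes here are hypothetical. */
final class StableApiSketch {

    /** Marks a type or method as part of the documented, stable surface. */
    @Documented
    @Retention(RetentionPolicy.RUNTIME)
    @Target({ElementType.TYPE, ElementType.METHOD})
    @interface StableApi {
        /** Free-form label for the release in which the element became stable. */
        String since();
    }

    /** Hypothetical stand-in for a data-access class; not the real CLDRFile. */
    static final class ExampleDataFile {
        @StableApi(since = "example")
        public String getValue(String path) {
            return null; // placeholder body; a real class would look up data
        }

        /** Java-public, but deliberately not marked as API. */
        public void internalHelper() {
        }
    }

    /** Lists the names of methods on a class that carry the marker, sorted. */
    static List<String> stableMethodNames(Class<?> type) {
        List<String> names = new ArrayList<>();
        for (Method method : type.getDeclaredMethods()) {
            if (method.isAnnotationPresent(StableApi.class)) {
                names.add(method.getName());
            }
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) {
        System.out.println(stableMethodNames(ExampleDataFile.class));
    }
}
```

A listing like this could feed the review step: the generated list of marked methods becomes the thing that is compared between releases, while unmarked public methods stay free to change.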

comment:10 Changed 7 weeks ago by markus

  • Milestone changed from 34 to 35