[Unicode] Common Locale Data Repository: Bug Tracking

CLDR Ticket #8257 (accepted charts)

Opened 3 years ago

Last modified 3 years ago

Improved "findability" of CLDR

Reported by: mark
Owned by: mark
Component: unknown
Data Locale:
Phase: final
Review:
Weeks:
Data Xpath:


From a thread started by Richard:

Last time I did this, Mark said it was useful. So I'm adding insult to injury by taking some extra time out of my weekend to document how I managed to waste a fair bit of time trying to find information in the CLDR maze.

I'd like to see a list of case conversion tailorings, just to check whether there are one or two I don't yet know about.

The Unicode Standard tells me to look in CLDR for case conversion tailoring information, so I head to cldr.unicode.org.

Sure enough, under "Language & Script Information" I see 'capitalization' and so click on the "Language & Script Information" link.

I'm taken to http://cldr.unicode.org/cldr-features#TOC-Language-and-script-information: This seems to be the same information, just in list form - ah, but there are links to subitems...

Oh, no link for capitalization :( Stumped.

I notice a link to CLDR Charts in the nav column and decide to try to wrestle with the charts. I click on the link.

I arrive at http://cldr.unicode.org/index/charts. The 'By-type' heading looks promising, but there are no links there :(

I click on the link to http://www.unicode.org/cldr/charts/26/

Here again is By-Type. I click on it, and am taken to http://www.unicode.org/cldr/charts/26/by_type/index.html

There's a long list of links. I look through it. Right at the bottom I see Transforms. I haven't seen anything else that looks like it would lead to case information, so I click on it.

There's no 'On this page' set of links, so I start scrolling... I get to the end, but no joy.

I try the link at the top to "Linguistic Elements". I scroll through that page. No joy.

One last try? I click on Alphabetic Information. This is a long page, but after scrolling to the bottom, while looking out for headings, I still draw a blank.

I'll go back to the home page and try all over again. Hmm, I get sent back to the Charts page from the link at the top of the page, and it takes some abortive clicks to realise that I have to click the top right text on the page again to get to the home page. ... cldr.unicode.org *is* the home page and the best place to start from, right?...

Now I'm really stumped. (I actually tried following the path again, but got no better results.) I gave up. (Muttering and growling to myself about what a waste of time it is trying to find things in CLDR, and suspecting that I've been led on a wild goose chase from the start by the standard, and that there really isn't anything about case conversion in CLDR after all. At least, if there is, I don't really have time to look for it any more, especially if there's a risk of hitting more blank walls.)

I think you might be right. Whenever I'm stumped on where to look in CLDR, I go to TR35 and search for what the markup is. That can also be an adventure... and I didn't find case folding there this morning. Ditto my next strategy, which is to search likely keywords in the charts. Googling the topic turned up nothing useful. I suspect what you want is still SpecialCasing.txt?? For sure if it is in CLDR, I can't find it.

My response:

Richard, the data is in transforms, as you surmised. But we don't show all the transforms in the charts. The actual data shows them:


Such as:


Part of the problem is that we don't have a set of charts for all of the data in CLDR. But even if we did, there is the discovery issue.

We have bugs filed for creating more charts, but I'm wondering in the meantime if we can make some incremental improvements. What I was thinking about is

  1. fleshing out http://www.unicode.org/cldr/charts/latest/ to provide more information on where to find what.
  2. for each topic area (casing, etc.), providing a pointer not only to its charts but also (especially if there is no chart) to the XML data.
  3. adding a section to the cldr.unicode.org page that highlights that charts page as the place to go to find what's in CLDR.
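
To make the three points above concrete, here is a minimal sketch of what such a "where to find what" index could look like: each topic area carries an optional chart pointer and an optional XML data pointer. This is only an illustration, not an existing CLDR tool. The names `TopicEntry`, `INDEX` and `where_to_find` are hypothetical, and the relative chart and XML paths in the entries are assumptions that would need checking against the actual charts and data tree. Only the charts base URL comes from this ticket.

```python
from dataclasses import dataclass
from typing import List, Optional

# Base URL taken from the ticket text.
CHARTS_BASE = "http://www.unicode.org/cldr/charts/latest/"


@dataclass(frozen=True)
class TopicEntry:
    """One row of a hypothetical topic index."""
    topic: str
    chart: Optional[str]     # path relative to CHARTS_BASE, or None if there is no chart
    xml_data: Optional[str]  # path within the CLDR data tree, or None


# Illustrative entries only: every path below is an assumption, not verified data.
INDEX = [
    TopicEntry("casing", None, "common/transforms/"),
    TopicEntry("transforms", "transforms/index.html", "common/transforms/"),
    TopicEntry("exemplar characters", "by_type/index.html", "common/main/"),
]


def where_to_find(keyword: str) -> List[str]:
    """Return one line per topic whose name contains the keyword (case-insensitive)."""
    keyword = keyword.lower()
    lines = []
    for entry in INDEX:
        if keyword in entry.topic:
            chart = CHARTS_BASE + entry.chart if entry.chart else "(no chart yet)"
            xml = entry.xml_data or "(no XML pointer)"
            lines.append(f"{entry.topic}: chart={chart}; xml={xml}")
    return lines


if __name__ == "__main__":
    for line in where_to_find("casing"):
        print(line)
```

The point of keeping `chart` optional is item 2: a topic with no chart still gets an entry, so a reader looking for casing lands on the XML data instead of a dead end.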

What do you think?


Change History

comment:1 Changed 3 years ago by mark

From later email comments:

At the last edcom meeting, didn't we also take a look at the ICU Locale Explorer...? At least, Steven mentioned it in the meeting as I recall, and it's worth pointing people to if they're looking for basic data in an easy-to-read form.


One of the most frequent questions I see from end-users is about whether Unicode includes all the characters needed for some particular language -- the exemplars. They are never easy to find via cldr.unicode.org, but trivial to find in the locale explorer.

I think the CLDR-TC should go back to the drawing board to seriously think
through their data *presentation* problem and then create charts and
expectations accordingly. (A re-browse through the Edward Tufte volumes to
adjust goals might be helpful.)

In the meantime, I strongly recommend making greater use of pointing
people at the ICU Locale Explorer, which as Rick points out, is *already*
much better to use and much more closely follows the expectations of
what kinds of data sets are available and can reasonably be found by an
average end user.

comment:2 Changed 3 years ago by emmons

  • Status changed from new to assigned
  • Component changed from unknown to charts
  • Priority changed from assess to medium
  • Phase changed from dsub to final
  • Milestone changed from UNSCH to 28
  • Owner changed from anybody to mark

comment:3 Changed 3 years ago by markus

  • Type set to charts
  • Component changed from charts to unknown

comment:4 Changed 3 years ago by srl

  • Status changed from assigned to accepted

comment:5 Changed 3 years ago by mark

  • Milestone changed from 28 to 29

comment:6 Changed 3 years ago by emmons

  • Milestone changed from 29 to upcoming
