Unicode Common Locale Data Repository: Bug Tracking

CLDR Ticket #7998 (accepted charts)

Opened 4 years ago

Last modified 3 years ago

Usability issues with By-Type Chart: Main Exemplars

Reported by: ishida@…
Owned by: mark
Component: charts
Data Locale:
Phase: final
Review:
Weeks:
Data Xpath:


On Tue, Oct 21, 2014 at 2:23 PM, Richard Ishida <ishida@…> wrote:

On 20/10/2014 16:58, Mark Davis ☕️ wrote:

BTW, you might find

more useful.

That's certainly a very interesting page. There are, however, some
significant issues with it, which is a shame, because I think it could be
very useful.

  1. it would be a whole lot more useful if you could link directly to a
particular table (ie. add an id on each table).

That's a good idea, but adding a link on each data item would make the
files huge... Unless perhaps we could condense it by having a JS
function call instead...

I'm suggesting one id on each table, not data item. Eg. <table id=tibetan> for the Tibetan table.

Then I could point people to the right place from pages such as http://rishida.net/scripts/samples/tibetan.
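
A minimal sketch of the one-id-per-table suggestion, assuming the chart generator knows each table's script name. The helper names `table_id` and `open_table_tag` are hypothetical and are not part of the CLDR chart tooling.

```python
import re
from html import escape


def table_id(script_name: str) -> str:
    """Derive a lowercase, hyphen-separated id from a script name (hypothetical helper)."""
    # Collapse every run of characters outside a-z and 0-9 into one hyphen.
    return re.sub(r"[^a-z0-9]+", "-", script_name.lower()).strip("-")


def open_table_tag(script_name: str) -> str:
    """Build the opening tag for one script's table, carrying a single id."""
    return f'<table id="{escape(table_id(script_name), quote=True)}">'


print(open_table_tag("Tibetan"))
```

With an id like this on the Tibetan table, an external page could link straight to it by appending `#tibetan` to the chart URL, at a cost of one attribute per table rather than one link per data item.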

  2. the table headers "Locale \ Chars" really shouldn't talk about "Chars"
since some of the entries are for groups of characters. Probably better to
say "Locale \ Graphemes".

Well, even the graphemes are made of characters. Everything is made of characters. But the point is about understanding what it is you're looking at. Many multi-character graphemes look like, or in some cases are identical to, single characters, so without prior knowledge (and without the {..} braces) it's not immediately obvious what you're dealing with here.

2a. Actually, what I was interested in really was 'what characters are
needed?', so I had to eliminate all these multiple character graphemes from
the list before finalising my data. (I wonder whether we could have a switch
to display either graphemes or characters?)

Our tooling (UnicodeSet) lets us "flatten" into a list of code points.
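
To illustrate what "flattening" means here, the following is a rough sketch in plain Python. It does not use UnicodeSet or any ICU API. It assumes the exemplar entries are already available as a list of strings, where a multi-character entry such as "ch" stands for one grapheme; the sample data is invented for illustration.

```python
def flatten(exemplars):
    """Return the sorted list of individual code points used by any entry."""
    # Iterating over a Python 3 str yields one code point at a time,
    # so multi-character entries are split into their parts.
    return sorted({ch for entry in exemplars for ch in entry})


# Hypothetical exemplar entries: two single characters and two sequences.
sample = ["a", "b", "ch", "dz"]
print(flatten(sample))  # ['a', 'b', 'c', 'd', 'h', 'z']
```

A switch between "graphemes" and "characters" views, as asked for above, would amount to displaying either the original list or the flattened one.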

  3. The groups of characters are no longer distinguishable from the
individual characters, ie. what is represented as {...} elsewhere.

Right, we're using cell boundaries for that.

Yes, but you can't tell which are which without manipulating the content of the cell.
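
One way to make the distinction visible in the cell itself is to keep the brace notation for multi-code-point entries. The sketch below assumes the entries are plain Python strings; `mark_sequences` is a hypothetical helper, not something the chart generator is known to provide.

```python
def mark_sequences(exemplars):
    """Wrap entries longer than one code point in braces; leave the rest as-is."""
    # len() on a Python 3 str counts code points, so "ch" has length 2.
    return [f"{{{entry}}}" if len(entry) > 1 else entry for entry in exemplars]


print(mark_sequences(["a", "ch", "z"]))  # ['a', '{ch}', 'z']
```

Note that a base letter followed by a combining mark also counts as more than one code point under this rule, which matches the "groups of characters" reading discussed above.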

  4. Are the 'auxiliary characters' missing from this list, or are they
included without distinction? There's no clear link to them at the top.

missing; can you add that to the bug?

  5. If you follow the link to Punctuation Exemplars at the top, it throws up
questions about the organization of the data that make me think my initial
conclusions were wrong. It seems that a locale appears in a table if
that script is typically used for writing text in that locale, and that
characters not from that script are nonetheless included in the table (eg. !,
?, [, ], etc.).

Yes, a locale (language) is grouped under a script if its main
exemplars appear in the script. Note that sr (= sr-Cyrl) and sr-Latn
are different locales, and will appear in different tables.

All the data in these charts is drawn from the exemplar data in each
of the locale .xml files.

It's not the source of the data that raises questions, it's the interpretation of what you're seeing, its understandability, and your confidence that you're making correct assumptions.

When looking at Main Exemplars, however, does the grouping of locales into
tables reflect the preferred script for that locale? Are only the characters
from that script shown in a given table?

No, only the characters from that locale. Eg, look for Serbian under
Latin script.

If characters from another script
block are used, eg. danda, should one expect that to appear in the table
where say the bn locale appears?

Is there no table for Mongolian using
Mongolian script because there's no data in CLDR,

We only have data for Mongolian Cyrillic.

or because there is only one table
per locale?

No (see above).

Basically are the two sets of tables following the same rules? And what are
those rules?

  6. Some locales are coloured red. What does that mean?

The data is not as well confirmed.

The page should say that.

  7. The Han table doesn't include any Han characters - only kana for the ja
locale. Very odd, and a little disconcerting. So perhaps the other tables
don't relate to Unicode blocks after all...

There's no particular relation to Unicode blocks per se. The Han
characters are omitted from the charts, simply for space.

So this table should be called Kana or some such, not Han, since there are no Han characters in it.

With so many questions, the reliability of the data shown appears to be very
low, and so I'm unlikely to use it, unfortunately.

The data on all of the charts is really meant to illustrate what is in
the xml files, no more. While those files have been vetted multiple
times by native speakers, we know there will always be some level of
human error...

I should have said that the doubts about what is/isn't included, the organization, and the interpretation of what's being displayed create a *perception* that the reliability of the information to be got from the page is in doubt. It's more a question of whether I understand what I'm seeing well enough to feel confident in using it. (There are indeed issues with the data itself, but that's another story.)

Basically, issues are mainly lack of information about what the data
represents, how it's organized, what are the caveats, and how to read it.
Much of that information needs to be specific to this page and its
presentation, rather than pointers to the LDML spec.

Again, the charts are best viewed as an adjunct to the LDML spec and
files... If we didn't have limitations on resources, it'd be a
different matter. So we have to pick and choose the most important
things to work on.

I guess I'm arguing that exposing the data in a useful way is as important as collecting it. Maybe getting it right will slow progress on collecting, but it seems an integral part of the overall exercise.

On the other hand, you've done a great job with the UI on many of your
pages, like the converters, so if you're interested ;-)

Well, I'm sorely tempted, but I'm way over-committed already at the moment :(

Btw, it would also be nice to have links for each locale that allowed you to
link directly to an input form if you see something that doesn't seem right.
It's not hard to find things that may need to be addressed, assuming you
understand the organization correctly, but unless the journey to provide
input is reduced to a simple click or two (and why shouldn't it be), I, and
I think many others, will not make that extra effort.

An interesting idea. We could have a link to the field in the Survey
Tool, and a link to the XML file.


Change History

comment:1 Changed 3 years ago by emmons

  • Owner changed from anybody to mark
  • Phase changed from dsub to final
  • Priority changed from assess to minor
  • Status changed from new to assigned
  • Milestone changed from UNSCH to 27

comment:2 Changed 3 years ago by mark

  • Phase changed from final to rc
  • Milestone changed from 27 to 28

This will take much more time than we have right now, so pushing to next version. There have been some fixes this release in navigation, etc.

comment:3 Changed 3 years ago by markus

  • Type set to charts
  • Component changed from charts to unknown

comment:4 Changed 3 years ago by srl

  • Status changed from assigned to accepted

comment:5 Changed 3 years ago by emmons

  • Type changed from charts to tools
  • Component changed from unknown to charts

comment:6 Changed 3 years ago by mark

  • Phase changed from rc to final

comment:7 Changed 3 years ago by mark

  • Milestone changed from 28 to 29

comment:8 Changed 3 years ago by mark

  • Type changed from tools to charts
  • Milestone changed from 29 to 28

comment:9 Changed 3 years ago by mark

  • Milestone changed from 28 to 29

comment:10 Changed 3 years ago by emmons

  • Milestone changed from 29 to upcoming

Auto move of all 29 -> upcoming

