Unicode Common Locale Data Repository: Bug Tracking

CLDR Ticket #11176 (reviewing tools)

Opened 2 months ago

Last modified 4 weeks ago

Can't change the vote on disputed items – Slovak

Reported by: miro.pollak@…
Owned by: tbishop
Component: emoji
Data Locale:
Phase: dsub
Review: fredrik
Weeks:
Data Xpath:


When I try to change my vote to support a losing translation, nothing happens when I click on that translation.
For example, in the attached screenshot I am trying to vote for the translation "balík | dar | darček | oslava | zabalený | zabalený darček".


SL_bug.jpg (72.7 KB) - added by miro.pollak@… 2 months ago.

Change History

Changed 2 months ago by miro.pollak@…

comment:1 Changed 2 months ago by fredrik

For reference, this is data point http://st.unicode.org/cldr-apps/v#/sk/Activities/5081701f9b269c35

I think the crux here is that the only difference between the alternate translation sets is that one includes the name of the emoji, which we would include automatically anyway, so the voting system doesn't see a difference between the two.

I think we should just include the name in the visible translation set to reduce confusion.

comment:2 Changed 2 months ago by tbishop

  • Owner changed from anybody to tbishop
  • Status changed from new to accepted

comment:3 Changed 4 weeks ago by tbishop

The currently winning set is "balík | dar | darček | oslava | zabalený".

The user was trying to vote for "balík | dar | darček | oslava | zabalený | zabalený darček".

There is a function whose exact purpose is to reduce sets like the latter to sets like the former:

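    // Excerpt; assumes imports of java.util.HashSet, java.util.List, java.util.TreeSet and Guava's Splitter,
    // with SPLIT_SPACE declared elsewhere in the class (assumed definition: Splitter.on(' ').omitEmptyStrings()).
    /**
     * Filter from the given set some keywords that include spaces, if they duplicate,
     * or are "covered by", other keywords in the set.
     * For example, if the set is {"bear", "panda", "panda bear"} (annotation was "bear | panda | panda bear"),
     * then remove "panda bear", treating it as "covered" since the set already includes "panda" and "bear".
     * @param sorted the set from which items may be removed
     */
    public static void filterCoveredKeywords(TreeSet<String> sorted) {
        // for now, just do single items
        HashSet<String> toRemove = new HashSet<>();

        for (String item : sorted) {
            List<String> list = SPLIT_SPACE.splitToList(item);
            if (list.size() < 2) {
                continue; // a single word cannot be "covered" by other items
            }
            if (sorted.containsAll(list)) {
                toRemove.add(item); // every word of the phrase is already an item of its own
            }
        }
        // remove after the loop: modifying the set while iterating over it could throw ConcurrentModificationException
        sorted.removeAll(toRemove);
    }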

In this particular case, "zabalený darček" also happens to be the name of the emoji, but that's not why it gets filtered out: it is removed because "zabalený" and "darček" are both already separate items in the set.

Fredrik has proposed to "include the name in the visible translation set to reduce confusion" (for all emoji names and annotation sets). One way to interpret "visible translation set" here is that the name would not be included in the actual saved value of the set, but would be shown when the set is displayed in the ST interface. Is that Fredrik's intended interpretation? And, if so, should a decision be made on whether to adopt that policy?

If that decision is made, then it shouldn't be very hard to implement, though it is a bit complicated since the display of one item (the annotation set) would become dependent on the value of another item (the name). It helps that the two items are always adjacent in the interface.
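As a rough illustration of the display-only interpretation, the sketch below leaves the stored (voted-on) keyword value untouched and merges the name in only when building the string to show. The class KeywordDisplay and its methods parseSet and displayedKeywords are hypothetical, not Survey Tool code, and splitting on "|" is an assumption about how annotation sets are written.

```java
import java.util.TreeSet;

// Hypothetical sketch, not Survey Tool code: the stored (voted-on) value stays as-is,
// and the emoji name is merged in only for display.
class KeywordDisplay {
    // Split an annotation such as "a | b | c" into a sorted set of trimmed items.
    static TreeSet<String> parseSet(String annotation) {
        TreeSet<String> set = new TreeSet<>();
        for (String part : annotation.split("\\|")) {
            String item = part.trim();
            if (!item.isEmpty()) {
                set.add(item);
            }
        }
        return set;
    }

    // The value to show next to the name; adding the name has no effect if it is already an item.
    static String displayedKeywords(String name, String storedKeywords) {
        TreeSet<String> shown = parseSet(storedKeywords);
        shown.add(name.trim());
        return String.join(" | ", shown);
    }

    public static void main(String[] args) {
        String stored = "balík | dar | darček | oslava | zabalený";
        // prints: balík | dar | darček | oslava | zabalený | zabalený darček
        System.out.println(displayedKeywords("zabalený darček", stored));
    }
}
```

Because only the displayed string changes, vote resolution and the saved data would be unaffected; the price is exactly the dependency of one item's display on another item's value.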

Would it really reduce confusion? That's not completely clear. Problems that may need solving:

  • the user may not be aware that there's any rule about whether or not the name should be included in the set
  • the rule currently enforced by filterCoveredKeywords doesn't actually depend on the name, and the user may not understand this either
  • the interface may not treat the set including the name as equivalent to the set not including the name

The screenshot proves that the visible interface fails to treat the two sets as equivalent (or at least did 6 weeks ago; it may be different now, see comment:5), even though it may be true (I don't know) that "the voting system doesn't see a difference between the two." It seems to me that as long as the interface treats them as two distinct sets in any way, there's likely to be confusion.

It seems that filterCoveredKeywords isn't being invoked everywhere it should be: maybe it's invoked for vote resolution, but not for normalizing the two sets so that the interface treats them as equivalent, which seems necessary for a satisfactory solution. If that gets fixed, whether or not the normalized set should include the name becomes a secondary question. Ideally, the interface would provide some explicit feedback when normalization causes the name to be removed; maybe a note could be shown in the Information Panel (right side-bar), or even a pop-up alert could be displayed. Alternatively, to reduce implementation time, a brief note about excluding the name from the set could simply always be included in the Information Panel.
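One way to make the interface treat the two sets as equivalent is to compare them only after normalization, and to emit a note when normalization drops the name. This is a sketch under stated assumptions: KeywordNormalization, normalize, equivalent and nameRemovedNote are hypothetical names, String.split stands in for the SPLIT_SPACE splitter, and the filtering step reproduces the rule of the filterCoveredKeywords excerpt quoted above rather than calling it.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: compare keyword sets only after normalization, and report
// when normalization drops the emoji name (e.g. for the Information Panel).
class KeywordNormalization {
    static TreeSet<String> parseSet(String annotation) {
        TreeSet<String> set = new TreeSet<>();
        for (String part : annotation.split("\\|")) {
            String item = part.trim();
            if (!item.isEmpty()) {
                set.add(item);
            }
        }
        return set;
    }

    // Same rule as filterCoveredKeywords: drop a multi-word item whose words are all items.
    static TreeSet<String> normalize(String annotation) {
        TreeSet<String> sorted = parseSet(annotation);
        Set<String> toRemove = new HashSet<>();
        for (String item : sorted) {
            List<String> words = Arrays.asList(item.split(" +"));
            if (words.size() >= 2 && sorted.containsAll(words)) {
                toRemove.add(item);
            }
        }
        sorted.removeAll(toRemove);
        return sorted;
    }

    // Two submitted values should be shown (and voted on) as one if they normalize identically.
    static boolean equivalent(String a, String b) {
        return normalize(a).equals(normalize(b));
    }

    // A note to display when the name was submitted but normalization removed it.
    static Optional<String> nameRemovedNote(String name, String submitted) {
        if (parseSet(submitted).contains(name) && !normalize(submitted).contains(name)) {
            return Optional.of("\"" + name + "\" was removed: it is covered by other keywords,"
                    + " and emoji names are added automatically.");
        }
        return Optional.empty();
    }
}
```

With helpers like these, the value from the screenshot and the winning value would collapse into a single candidate, so there would be no separate entry that appears clickable but does nothing.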

Last edited 4 weeks ago by tbishop

comment:4 Changed 4 weeks ago by tbishop

It appears that Survey Tool currently does not perform any automatic modification of the set that depends on the name. The name and the set of keywords are two unrelated items, as far as the software is concerned.

The actual English names and sets normally are related in that the set includes the name. For example,

🔇 -name	muted speaker
🔇 -keywords	mute | muted speaker | quiet | silent | speaker

🎙 -name	studio microphone
🎙 -keywords	mic | microphone | music | studio

The function filterCoveredKeywords only operates on the set of keywords, without any dependency on the name. If the first set had "muted" instead of "mute", then "muted speaker" would be removed from the set by filterCoveredKeywords since it would be "covered" by "muted" and "speaker" as separate items of the set; but the name makes no difference to the set. In the second set, "studio microphone" is excluded, not because it's the name, but because it's covered by "studio" and "microphone".
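To make the rule concrete, the following sketch applies the same covering logic to the English sets above and to the variant with "muted" instead of "mute". CoveredKeywordsDemo and its helpers are illustrative names, not CLDR APIs, and String.split stands in for SPLIT_SPACE. Note that the name is never passed in: only the keywords themselves decide what is removed.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Illustrative sketch: parseSet/formatSet are hypothetical helpers, and String.split
// stands in for the SPLIT_SPACE splitter used by the real filterCoveredKeywords.
class CoveredKeywordsDemo {
    static TreeSet<String> parseSet(String annotation) {
        TreeSet<String> set = new TreeSet<>();
        for (String part : annotation.split("\\|")) {
            String item = part.trim();
            if (!item.isEmpty()) {
                set.add(item);
            }
        }
        return set;
    }

    // Same logic as filterCoveredKeywords: drop a multi-word item if every word is already an item.
    static void filterCoveredKeywords(TreeSet<String> sorted) {
        Set<String> toRemove = new HashSet<>();
        for (String item : sorted) {
            List<String> words = Arrays.asList(item.split(" +"));
            if (words.size() >= 2 && sorted.containsAll(words)) {
                toRemove.add(item);
            }
        }
        sorted.removeAll(toRemove);
    }

    static String normalize(String annotation) {
        TreeSet<String> set = parseSet(annotation);
        filterCoveredKeywords(set);
        return String.join(" | ", set);
    }

    public static void main(String[] args) {
        // prints: mute | muted speaker | quiet | silent | speaker   ("muted" is not an item)
        System.out.println(normalize("mute | muted speaker | quiet | silent | speaker"));
        // prints: muted | quiet | silent | speaker
        System.out.println(normalize("muted | muted speaker | quiet | silent | speaker"));
        // prints: mic | microphone | music | studio
        System.out.println(normalize("mic | microphone | music | studio | studio microphone"));
    }
}
```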

Whatever changes we make should be consistent with the instructions at http://cldr.unicode.org/translation/short-names-and-keywords which say, "Don’t add emoji names (these will be added automatically)".

Last edited 4 weeks ago by tbishop

comment:5 Changed 4 weeks ago by tbishop

I'm encountering a further complication: I'm unable to reproduce the bug as shown in the screenshot. It appears that now the normalization DOES take place in the interface, so that if the currently winning (and only) value is "balík | dar | darček | oslava | zabalený", then if I enter "balík | dar | darček | oslava | zabalený | zabalený darček" as a new value, it's automatically normalized to "balík | dar | darček | oslava | zabalený".

Has the bug disappeared in the last 6 weeks?

comment:6 Changed 4 weeks ago by tbishop

It seems unlikely that the bug disappeared in the last 6 weeks. I suspect that it disappeared earlier than that, but in such a way that some older data stayed around for a while. So the value "balík | dar | darček | oslava | zabalený | zabalený darček" was still present 6 weeks ago as a losing value, but the user couldn't vote for it since it got normalized (correctly). Now that value is gone from the current data.
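The explanation above can be checked mechanically: a leftover losing value that normalizes to the same set as the winner offers nothing distinct to vote for. The sketch below lists such values, for example to confirm that no similar stale candidates remain. StaleCandidates and collapsingInto are hypothetical names; the covering rule mirrors filterCoveredKeywords, with String.split in place of SPLIT_SPACE.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: find candidate values that collapse into the winning value
// once normalized, so a vote for them could not be recorded as a distinct choice.
class StaleCandidates {
    static TreeSet<String> normalize(String annotation) {
        TreeSet<String> sorted = new TreeSet<>();
        for (String part : annotation.split("\\|")) {
            String item = part.trim();
            if (!item.isEmpty()) {
                sorted.add(item);
            }
        }
        // Same rule as filterCoveredKeywords: drop multi-word items whose words are all items.
        Set<String> toRemove = new HashSet<>();
        for (String item : sorted) {
            List<String> words = Arrays.asList(item.split(" +"));
            if (words.size() >= 2 && sorted.containsAll(words)) {
                toRemove.add(item);
            }
        }
        sorted.removeAll(toRemove);
        return sorted;
    }

    static List<String> collapsingInto(String winning, List<String> candidates) {
        TreeSet<String> target = normalize(winning);
        List<String> stale = new ArrayList<>();
        for (String candidate : candidates) {
            if (!candidate.equals(winning) && normalize(candidate).equals(target)) {
                stale.add(candidate);
            }
        }
        return stale;
    }

    public static void main(String[] args) {
        String winning = "balík | dar | darček | oslava | zabalený";
        List<String> losing = Arrays.asList(
                "balík | dar | darček | oslava | zabalený | zabalený darček",
                "balík | dar | darček");
        // prints: [balík | dar | darček | oslava | zabalený | zabalený darček]
        System.out.println(collapsingInto(winning, losing));
    }
}
```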

Can we close this ticket as already fixed?

comment:7 Changed 4 weeks ago by tbishop

  • Status changed from accepted to reviewing
  • Review set to fredrik
