[Unicode] Common Locale Data Repository : Bug Tracking
 

CLDR Ticket #10973 (reviewing tools)

Opened 5 months ago

Last modified 5 days ago

Emoji keywords disputes due to additional keywords

Reported by: kristi
Owned by: tbishop
Component: annotations
Data Locale:
Phase: dsub
Review: mark
Weeks:
Data Xpath:
Xref: ticket:10980, ticket:11165

Description

We have a large number of disputes for emoji keywords.
Here's a scenario:

  1. Most of the existing data is in the Approved state.
  2. Vetter A is the first to vote, adding one new keyword. This becomes a suggestion.
  3. Vetter B likes A's suggestion but also wants to add a new keyword. This becomes yet another suggestion.
  4. Until Vetter A goes back and votes for Vetter B's suggestion, the Approved data does not change.

Problem:
This causes one of the vetters to redo their voting work all over again.
This causes an increased number of disputes for emoji keywords.

Attachments

T10973 Screen Shot 2018-06-08.png (56.2 KB) - added by tbishop 6 weeks ago.
a|b=4, a|b|c|d=8, a|b|e=8 should make a|b|e, a|b|c|d, a|b
T10973 SmokeTest 4 votes.png (380.0 KB) - added by tbishop 5 days ago.
SmokeTest: new vote with 4 votes not winning.

Change History

comment:1 Changed 5 months ago by fredrik

  • Xref set to 10980

Long-term "blue sky" idea filed as bug:10980

comment:2 Changed 4 months ago by mark

While talking to Kirill and Chiara, I had a thought.
We could change the VoteResolver — without modifying the UI — to just modify the way it treats "sets" like annotations.
Treat a vote of 4 for {A, B, C} as a vote for each of its components. So if we had

{A, B, C, F} 8 votes
{A, B, E} 6 votes
{A, E} 4 votes

In the VR we would treat these as votes:

A 18
B 14
C 8
E 10
F 8

We accept anything with > 1/2 the top vote count as the top, so we would end up with:

{A, B, E}

(We can play with the proportions/rule. This is just blue-skying.)

Note that what we end up with might not be an input option. Ideally, we'd just introduce a new one. But in the meantime maybe we can take the largest subset, and failing that the smallest superset, or something like that.
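A minimal Java sketch of this component-splitting idea, to make the arithmetic concrete. The class and method names are hypothetical (this is not VoteResolver code), and "> 1/2 the top vote count" is tested as 2 × total > top so that everything stays in integers.

```java
import java.util.*;

// Hypothetical sketch of comment:2: split set votes into component votes,
// then keep components whose total is more than half of the top total.
class ComponentThresholdSketch {

    // Add each set's vote to every component of that set.
    static Map<String, Integer> componentTotals(Map<String, Integer> setVotes) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, Integer> e : setVotes.entrySet()) {
            for (String component : e.getKey().split("\\|")) {
                totals.merge(component.trim(), e.getValue(), Integer::sum);
            }
        }
        return totals;
    }

    // Keep components with total > top / 2, compared as 2 * total > top.
    // Assumes at least one set has been voted on.
    static SortedSet<String> winningComponents(Map<String, Integer> setVotes) {
        Map<String, Integer> totals = componentTotals(setVotes);
        int top = Collections.max(totals.values());
        SortedSet<String> kept = new TreeSet<>();
        for (Map.Entry<String, Integer> e : totals.entrySet()) {
            if (2 * e.getValue() > top) {
                kept.add(e.getKey());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, Integer> votes = new LinkedHashMap<>();
        votes.put("A|B|C|F", 8);
        votes.put("A|B|E", 6);
        votes.put("A|E", 4);
        System.out.println(componentTotals(votes));   // {A=18, B=14, C=8, E=10, F=8}
        System.out.println(winningComponents(votes)); // [A, B, E]
    }
}
```

Here the kept components happen to match the input option {A, B, E}; as noted above, in general they need not match any option.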

Last edited 4 months ago by mark

comment:3 Changed 4 months ago by mark

  • Status changed from new to accepted
  • Component changed from unknown to annotations
  • Priority changed from assess to critical
  • Milestone changed from UNSCH to 34
  • Owner changed from anybody to mark
  • Type changed from unknown to tools

comment:4 Changed 3 months ago by emmons

  • Milestone changed from 34 to 33.1

comment:5 Changed 3 months ago by kristi

  • Owner changed from mark to tbishop

comment:6 Changed 3 months ago by mark

  • Owner changed from tbishop to backend

comment:7 Changed 3 months ago by mark

  • Owner changed from backend to tbishop

Whoops, caught in a bulk change of others. Restoring.

comment:8 Changed 3 months ago by tbishop

If we end up with {A, B, E}, or "A | B | E" using vertical-bar notation, should that result be calculated and displayed immediately as the currently winning value (in the "Winning" column), whenever someone votes?

Could this be implemented in resolveVotes in VoteResolver.java?

Should the new method of calculation apply to all values that can use vertical bar as a separator, or only when the code ends with "–keywords"? Or when the path starts with "ldml/annotations/annotation"?

Last edited 3 months ago by tbishop

comment:9 Changed 2 months ago by tbishop

Branch tbishop/t10973 has an implementation of an algorithm tentatively outlined by Mark by phone: "A, B, C in an item, D, E have votes, overall might be A+B+C-(D+E) votes". The key new function is calculateNewCountsBasedOnAnnotationComponents in VoteResolver.java.

Unit tests are in new file TestAnnotationVotes.java.

TODO: decide on the algorithm! Two different ideas have been proposed:

(1) Given input {a|b|c|f=8, a|b|e=6, a|e=4}, we get compMap {a=18, b=14, c=8, e=10, f=8}.
Accept anything with > 1/2 the top vote count (18/2 = 9) as the top, so we end up with a|b|e since a, b, and e all have > 9 in compMap. Note that what we end up with might not be an input option (for example, a|b is not one of the input options, although a|b|e is, coincidentally). Ideally, we'd just introduce a new one. But in the meantime maybe we can take the largest subset, and failing that the smallest superset, or something like that.

(2) A, B, C in an item, D, E have votes, overall might be A+B+C-(D+E) votes.
That is, again given input {a|b|c|f=8, a|b|e=6, a|e=4}, we again get compMap {a=18, b=14, c=8, e=10, f=8}.
For a|b|c|f we get 18 + 14 + 8 - 10 + 8 = 38
For a|b|e we get 18 + 14 - 8 + 10 - 8 = 26
For a|e we get 18 - 14 - 8 + 10 - 8 = -2
The winner for (2) is a|b|c|f, not a|b|e as in (1).
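A small Java sketch of idea (2), assuming a set's score is the sum of its own components' totals minus the totals of every other component, as in the worked numbers above. The names are hypothetical; the actual calculateNewCountsBasedOnAnnotationComponents code may differ.

```java
import java.util.*;

// Hypothetical sketch of idea (2) in comment:9: score(set) = sum of the totals of
// the set's own components minus the sum of the totals of all other components.
class ComponentDifferenceSketch {

    static Set<String> components(String set) {
        Set<String> result = new HashSet<>();
        for (String c : set.split("\\|")) {
            result.add(c.trim());
        }
        return result;
    }

    // Total votes per component across all voted sets.
    static Map<String, Integer> componentTotals(Map<String, Integer> setVotes) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, Integer> e : setVotes.entrySet()) {
            for (String c : components(e.getKey())) {
                totals.merge(c, e.getValue(), Integer::sum);
            }
        }
        return totals;
    }

    static int score(String set, Map<String, Integer> totals) {
        Set<String> own = components(set);
        int total = 0;
        for (Map.Entry<String, Integer> e : totals.entrySet()) {
            // Components in the set count for it; all others count against it.
            total += own.contains(e.getKey()) ? e.getValue() : -e.getValue();
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Integer> votes = new LinkedHashMap<>();
        votes.put("a|b|c|f", 8);
        votes.put("a|b|e", 6);
        votes.put("a|e", 4);
        Map<String, Integer> totals = componentTotals(votes);
        for (String set : votes.keySet()) {
            System.out.println(set + " -> " + score(set, totals)); // 38, 26, -2
        }
    }
}
```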

The first implementation follows (2), since (1) isn't clearly defined: introducing a novel combination of components isn't considered an option for this ticket, and it's not yet clear what's meant by the largest subset or smallest superset...

comment:10 Changed 2 months ago by tbishop

I've implemented a kind of IRV (instant-runoff voting) where the implicit "next choice" of voters for an eliminated annotation A is determined as follows:

  • In the set of candidate annotations that haven't been eliminated, find the largest annotation B (largest in terms of having the most components) such that the components of B are a subset of the components of A.
  • If B doesn't exist, then, in the set of candidate annotations that haven't been eliminated, find the smallest annotation C (smallest in terms of having the fewest components) such that the components of C are a superset of the components of A.
  • If B or C exists, use it as the "next choice" of voters for A; otherwise, there is no "next choice" of voters for A.
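A Java sketch of just the "next choice" step described above, not the full instant-runoff loop. The names are hypothetical, and because the comment does not say how to break ties between equally sized candidates, this sketch assumes the alphabetically first one wins.

```java
import java.util.*;

// Hypothetical sketch of the "next choice" rule in comment:10. Among equally sized
// candidates the alphabetically first one is chosen; that tie-breaker is an assumption.
class NextChoiceSketch {

    static Set<String> components(String annotation) {
        Set<String> result = new HashSet<>();
        for (String c : annotation.split("\\|")) {
            result.add(c.trim());
        }
        return result;
    }

    static Optional<String> nextChoice(String eliminated, Collection<String> remaining) {
        Set<String> a = components(eliminated);
        List<String> candidates = new ArrayList<>(remaining);
        Collections.sort(candidates); // alphabetical order makes ties deterministic
        String largestSubset = null;
        String smallestSuperset = null;
        for (String candidate : candidates) {
            Set<String> c = components(candidate);
            if (a.containsAll(c)) {
                // Candidate's components are a subset of the eliminated annotation's.
                if (largestSubset == null || c.size() > components(largestSubset).size()) {
                    largestSubset = candidate;
                }
            } else if (c.containsAll(a)) {
                // Candidate's components are a superset of the eliminated annotation's.
                if (smallestSuperset == null || c.size() < components(smallestSuperset).size()) {
                    smallestSuperset = candidate;
                }
            }
        }
        // Prefer the largest subset, then the smallest superset, else no next choice.
        return Optional.ofNullable(largestSubset != null ? largestSubset : smallestSuperset);
    }

    public static void main(String[] args) {
        System.out.println(nextChoice("a|b|c", List.of("a|b", "a", "a|b|c|d", "d"))); // Optional[a|b]
        System.out.println(nextChoice("a|b", List.of("a|b|c|d", "a|b|e", "c")));      // Optional[a|b|e]
        System.out.println(nextChoice("x", List.of("a|b")));                          // Optional.empty
    }
}
```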

The implementation passes the unit tests. Maybe this voting method is adequate.

Last edited 2 months ago by tbishop

comment:11 Changed 2 months ago by tbishop

I've changed the voting method to one based on this formula from Mark:

  • Find the total votes for each subitem (e.g., "b" in "b|c"). As the "modified" vote for the set, use the geometric mean of the subitems in the set.
  • Order the sets by that mean value, then by the smallest number of items in the set, then by the fallback we always use (alphabetical).
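A Java sketch of this ordering, assuming component totals are plain sums of set votes and that the mean is kept in double precision (the rounding question comes up later in this ticket). Class and method names are hypothetical. The example input is the one from TestAV04.

```java
import java.util.*;

// Hypothetical sketch of the comment:11 ordering: score each set by the geometric
// mean of its components' total votes, then fewer items, then alphabetical.
class GeometricMeanOrderSketch {

    static List<String> parts(String set) {
        List<String> result = new ArrayList<>();
        for (String c : set.split("\\|")) {
            result.add(c.trim());
        }
        return result;
    }

    static Map<String, Integer> componentTotals(Map<String, Integer> setVotes) {
        Map<String, Integer> totals = new HashMap<>();
        for (Map.Entry<String, Integer> e : setVotes.entrySet()) {
            for (String c : parts(e.getKey())) {
                totals.merge(c, e.getValue(), Integer::sum);
            }
        }
        return totals;
    }

    // Geometric mean computed as exp(mean of logs), which avoids overflowing the product.
    static double geometricMean(String set, Map<String, Integer> totals) {
        List<String> cs = parts(set);
        double logSum = 0;
        for (String c : cs) {
            logSum += Math.log(totals.get(c));
        }
        return Math.exp(logSum / cs.size());
    }

    static List<String> order(Map<String, Integer> setVotes) {
        Map<String, Integer> totals = componentTotals(setVotes);
        Comparator<String> byMeanDescending =
                Comparator.comparingDouble((String s) -> geometricMean(s, totals)).reversed();
        List<String> sets = new ArrayList<>(setVotes.keySet());
        sets.sort(byMeanDescending
                .thenComparingInt((String s) -> parts(s).size())    // fewer items first
                .thenComparing(Comparator.<String>naturalOrder())); // alphabetical fallback
        return sets;
    }

    public static void main(String[] args) {
        Map<String, Integer> votes = new LinkedHashMap<>();
        votes.put("a|b|c|f", 8);
        votes.put("a|b|e", 6);
        votes.put("a|e", 4);
        System.out.println(order(votes)); // [a|b|e, a|e, a|b|c|f]
    }
}
```

Taking the mean through logarithms is just a numerical convenience; it gives the same ordering as the n-th root of the product.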

This special voting method is used if, and only if, the path starts with ldml/annotations/annotation and does not contain Emoji.TYPE_TTS. In other words, it's a keyword path, not a name path.
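A sketch of the path condition as a Java predicate. It assumes keyword xpaths have the form //ldml/annotations/annotation[@cp="…"] and that name (TTS) paths add [@type="tts"]; TTS_MARKER is a hypothetical stand-in for Emoji.TYPE_TTS, and its value here is an assumption.

```java
// Hypothetical sketch of the comment:11 path test. Assumes keyword paths look like
// //ldml/annotations/annotation[@cp="..."] and name (TTS) paths add [@type="tts"].
class KeywordPathSketch {

    // Stand-in for Emoji.TYPE_TTS; the value is an assumption.
    static final String TTS_MARKER = "[@type=\"tts\"]";

    static boolean usesComponentVoting(String xpath) {
        return xpath.startsWith("//ldml/annotations/annotation") && !xpath.contains(TTS_MARKER);
    }

    public static void main(String[] args) {
        String keywords = "//ldml/annotations/annotation[@cp=\"\uD83D\uDEF8\"]";
        String name = keywords + TTS_MARKER;
        System.out.println(usesComponentVoting(keywords)); // true
        System.out.println(usesComponentVoting(name));     // false
    }
}
```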

Last edited 2 months ago by tbishop

comment:12 Changed 2 months ago by tbishop

I committed the changes for this to trunk in revision 14097.

Changeset: https://unicode.org/cldr/trac/changeset/14097

comment:13 Changed 2 months ago by tbishop

  • Status changed from accepted to reviewing
  • Review set to kristi

comment:14 Changed 2 months ago by tbishop

One eventual improvement to consider: the right sidebar could indicate when the special voting method has been applied, to clarify why "b|c" (20 votes) beats "a" (24 votes).

Last edited 2 months ago by tbishop

comment:15 Changed 2 months ago by mark

  • Review changed from kristi to mark

comment:16 Changed 2 months ago by mark

  • Milestone changed from 33.1 to 34

comment:17 Changed 6 weeks ago by tbishop

Based on discussion related to 11165, a modification is planned. After finding the set X with the largest geometric mean, check whether there are any supersets with "greater" raw votes that don't exceed the width limit. If not, pick X to be the winning set. If so, pick the one of those supersets with the highest vote (using the normal tie breaker) to be the winning set Y. "Greater" here means that rawVote(Y) ≥ rawVote(X) + n, where the value of n is still to be decided. We might choose n = 2 to force there to be at least one non-guest vote. To make that decision we'll study the consequences of different values of n for some test cases.
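A Java sketch of the planned superset check layered on the geometric-mean ordering. Assumptions the comment leaves open: "width" is taken to be the number of keywords in a set, n and the width limit are parameters, and ties between qualifying supersets go to the one ranked higher by geometric mean. All names are hypothetical.

```java
import java.util.*;

// Hypothetical sketch of the comment:17 rule on top of the comment:11 ordering.
class SupersetOverrideSketch {

    static List<String> parts(String set) {
        List<String> result = new ArrayList<>();
        for (String c : set.split("\\|")) {
            result.add(c.trim());
        }
        return result;
    }

    static Map<String, Integer> componentTotals(Map<String, Integer> setVotes) {
        Map<String, Integer> totals = new HashMap<>();
        for (Map.Entry<String, Integer> e : setVotes.entrySet()) {
            for (String c : parts(e.getKey())) {
                totals.merge(c, e.getValue(), Integer::sum);
            }
        }
        return totals;
    }

    static double geometricMean(String set, Map<String, Integer> totals) {
        List<String> cs = parts(set);
        double logSum = 0;
        for (String c : cs) {
            logSum += Math.log(totals.get(c));
        }
        return Math.exp(logSum / cs.size());
    }

    // Mean descending, then fewer items, then alphabetical.
    static List<String> orderByMean(Map<String, Integer> setVotes) {
        Map<String, Integer> totals = componentTotals(setVotes);
        List<String> sets = new ArrayList<>(setVotes.keySet());
        sets.sort(Comparator.comparingDouble((String s) -> geometricMean(s, totals)).reversed()
                .thenComparingInt((String s) -> parts(s).size())
                .thenComparing(Comparator.<String>naturalOrder()));
        return sets;
    }

    static String winner(Map<String, Integer> setVotes, int n, int widthLimit) {
        List<String> ranked = orderByMean(setVotes);
        String x = ranked.get(0); // set with the largest geometric mean
        Set<String> xParts = new HashSet<>(parts(x));
        String y = null;
        for (String candidate : ranked) { // mean order, so higher-ranked sets win vote ties
            List<String> cParts = parts(candidate);
            boolean superset = !candidate.equals(x) && cParts.containsAll(xParts);
            boolean fits = cParts.size() <= widthLimit;
            boolean greater = setVotes.get(candidate) >= setVotes.get(x) + n;
            if (superset && fits && greater
                    && (y == null || setVotes.get(candidate) > setVotes.get(y))) {
                y = candidate;
            }
        }
        return y == null ? x : y;
    }

    public static void main(String[] args) {
        Map<String, Integer> votes = new LinkedHashMap<>();
        votes.put("a|b", 4);
        votes.put("a|b|c|d", 8);
        votes.put("a|b|e", 8);
        System.out.println(winner(votes, 2, 7)); // a|b|e
    }
}
```

With n = 2 and a limit of 7 keywords, this sketch picks the same winners that TestAV09 through TestAV19 expect; whether the real tie breaker works the same way is not confirmed here.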

comment:18 Changed 6 weeks ago by tbishop

Unit test results on localhost; TestAV08 thru TestAV19 are new:

TestAnnotationVotes {

TestAV00✅ adjustAnnotationVoteCounts(null, null) should return quietly

(0.001s) Passed

TestAV01✅ adjustAnnotationVoteCounts for a=100, b=99, c=98 should return unchanged

(0.885s) Passed

TestAV02✅ adjustAnnotationVoteCounts for a|b=1, c|d=2, e|f=3 should reverse order

(0.000s) Passed

TestAV03✅ adjustAnnotationVoteCounts for a=2, b=2, b|c=1 should make b, a, b|c

(0.003s) Passed

TestAV04✅ adjustAnnotationVoteCounts for a|b|c|f=8, a|b|e=6, a|e=4 should make a|b|e, a|e, a|b|c|f

(0.000s) Passed

TestAV05✅ adjustAnnotationVoteCounts for a=3, b|c=2, b|c|d=2 should make b|c, b|c|d, a

(0.002s) Passed

TestAV06✅ adjustAnnotationVoteCounts for a|b|c=8, a|b|d=6, a|d=4 should make a|b|d, a|d, a|b|c

(0.000s) Passed

TestAV07✅ adjustAnnotationVoteCounts for a=24, b|c=20, b|c|d=20 should make b|c, b|c|d, a

(0.001s) Passed

TestAV08✅ adjustAnnotationVoteCounts for hmyz | malárie | moskyt | štípnutí | virus, dengue | hmyz | malárie | moskyt | štípnutí | virus ...

(0.000s) Passed

TestAV09✅ adjustAnnotationVoteCounts for b|c|d|e|f=4, a|b|c|d|e|f=8 should make a|b|c|d|e|f, b|c|d|e|f

(0.000s) Passed

TestAV10✅ adjustAnnotationVoteCounts for b|c|d|e|f=4, a|b|c|d|e|f=6 should make a|b|c|d|e|f, b|c|d|e|f

(0.001s) Passed

TestAV11✅ adjustAnnotationVoteCounts for b|c|d|e|f=4, a|b|c|d|e|f=5 should make b|c|d|e|f, a|b|c|d|e|f

(0.000s) Passed

TestAV12✅ adjustAnnotationVoteCounts for a|b=4, a|b|d=8, a|b|c=8 should make a|b|c, a|b|d, a|b

(0.000s) Passed

TestAV13✅ adjustAnnotationVoteCounts for a|b=4, a|b|d=8, a|b|c=7 should make a|b|d, a|b|c, a|b

(0.001s) Passed

TestAV14✅ adjustAnnotationVoteCounts for a|b=8, a=8 should make a, a|b

(0.000s) Passed

TestAV15✅ adjustAnnotationVoteCounts for a|b=8, a=4 should make a|b, a

(0.000s) Passed

TestAV16✅ adjustAnnotationVoteCounts for a|b=4, a|b|d=8, a|b|c=7 should make a|b|d, a|b|c, a|b

(0.001s) Passed

TestAV17✅ adjustAnnotationVoteCounts for a|b=4, a|b|c|d=8, a|b|e=8 should make a|b|e, a|b|c|d, a|b

(0.000s) Passed

TestAV18✅ adjustAnnotationVoteCounts for a|b|c|d=8, a|b|e=8 should make a|b|e, a|b|c|d

(0.001s) Passed

TestAV19✅ adjustAnnotationVoteCounts for a|b=4, a|b|c|d|e|f|g|h=9, a|b|c|x|y=8 should make a|b|c|x|y, a|b, a|b|c|d|e|f|g|h

(0.000s) Passed

} (0.897s) Passed

comment:19 follow-up: ↓ 21 Changed 6 weeks ago by fredrik

This is horribly late to make this comment, I realize, but alas...

Are we over-complicating this? For single data points with one exclusive value, a vote of 8, unopposed, has been enough to overturn a value. Perhaps we should consider any individual keyword with a vote of 8 or higher valid for inclusion in the overall set of keywords?

Con: there wouldn't be a way to vote against the inclusion of a keyword.
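For comparison, a Java sketch of this threshold idea (hypothetical names; not something implemented in this ticket). Applied to the example input from comment:2, every keyword reaches 8 votes, so all five would be included.

```java
import java.util.*;

// Hypothetical sketch of the comment:19 alternative: include every individual
// keyword whose summed votes reach a fixed threshold.
class KeywordThresholdSketch {

    static SortedSet<String> keywordsAtOrAbove(Map<String, Integer> setVotes, int threshold) {
        Map<String, Integer> totals = new HashMap<>();
        for (Map.Entry<String, Integer> e : setVotes.entrySet()) {
            for (String keyword : e.getKey().split("\\|")) {
                totals.merge(keyword.trim(), e.getValue(), Integer::sum);
            }
        }
        SortedSet<String> included = new TreeSet<>();
        for (Map.Entry<String, Integer> e : totals.entrySet()) {
            if (e.getValue() >= threshold) {
                included.add(e.getKey());
            }
        }
        return included;
    }

    public static void main(String[] args) {
        Map<String, Integer> votes = new LinkedHashMap<>();
        votes.put("A|B|C|F", 8);
        votes.put("A|B|E", 6);
        votes.put("A|E", 4);
        System.out.println(keywordsAtOrAbove(votes, 8)); // [A, B, C, E, F]
    }
}
```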

comment:20 Changed 6 weeks ago by tbishop

On smoketest, these results are confirmed by logging in as different users and voting:

"b|c|d|e|f=4, a|b|c|d|e|f=8 should make a|b|c|d|e|f, b|c|d|e|f" (cf. TestAV09)

"a|b=4, a|b|c|d=8, a|b|e=8 should make a|b|e, a|b|c|d, a|b" (cf. TestAV17)

I'll add a screenshot for the latter.

Changed 6 weeks ago by tbishop: attachment T10973 Screen Shot 2018-06-08.png added

a|b=4, a|b|c|d=8, a|b|e=8 should make a|b|e, a|b|c|d, a|b

comment:21 in reply to: ↑ 19 ; follow-up: ↓ 22 Changed 6 weeks ago by tbishop

Replying to fredrik:

> This is horribly late to make this comment, I realize, but alas...
>
> Are we over-complicating this? For single data points with one exclusive value, a vote of 8, unopposed, has been enough to overturn a value. Perhaps we should consider any individual keyword with a vote of 8 or higher valid for inclusion in the overall set of keywords?
>
> Con: there wouldn't be a way to vote against the inclusion of a keyword.

Another con: there might be too many keywords with 8 or more votes. Currently a limit of 7 keywords is in effect.

I share the concern about over-complication, but don't know what would be best. Some complication seems inherent in the situation that votes can only be for combinations (complete sets), while the vote resolution method is based on individual keywords (components). It's hard to reconcile expectations based on looking at the issue in two ways, in terms of combinations and in terms of individual keywords.

comment:22 in reply to: ↑ 21 ; follow-up: ↓ 24 Changed 6 weeks ago by fredrik

Replying to tbishop:

> Another con: there might be too many keywords with 8 or more votes. Currently a limit of 7 keywords is in effect.

I thought we decided to remove that limit in favor of just a warning?

> I share the concern about over-complication, but don't know what would be best. Some complication seems inherent in the situation that votes can only be for combinations (complete sets), while the vote resolution method is based on individual keywords (components). It's hard to reconcile expectations based on looking at the issue in two ways, in terms of combinations and in terms of individual keywords.

Another example of unexpected results here: http://st.unicode.org/cldr-apps/v#/ar/Travel_Places/3f68784395af9769

Microsoft voted for only "صحن طائر" (total 4)
Apple and Google voted for both "صحن طائر" and "طبق طائر" (total 8).
Microsoft wins, presumably because "صحن طائر" gets 12 votes.

It is very hard to explain this result to vetters...

comment:23 Changed 6 weeks ago by kristi

  • Xref changed from 10980 to 10980,11165

comment:24 in reply to: ↑ 22 Changed 6 weeks ago by mark

Replying to fredrik:


Fredrik,

  1. As part of another bug, I changed the maximum limit for now, so that shouldn't play a role.
  2. The example you cite is on the Survey Tool, which doesn't have Tom's fix in yet. I went to SmokeTest (http://cldr-smoke.unicode.org/smoketest/v#/ar/Travel_Places/3f68784395af9769) and put in votes for

     صحن طائر | طبق طائر

     and

     صحن طائر

     to match what is in the Survey Tool. That now works on SmokeTest, giving the result that you wanted.

comment:25 follow-up: ↓ 26 Changed 6 weeks ago by kristi

From my email:
Bug summary: when there is existing winning data, a new suggestion entered by a new vetter wins.

To repro the bug:

  1. Go to any existing emoji keyword set that has no votes for this release.
  2. As a Vetter, add a new value.

Result: the new value in #2 becomes the winning value immediately.
Expected: the new value should be an alternate value with a vote count of 4 (until it gets more votes).

comment:26 in reply to: ↑ 25 Changed 6 weeks ago by tbishop

Replying to kristi:

> ...Expected: the new value should be an alternate value with a vote count of 4 (until it gets more votes).

So the new voting method should enforce some conditions, such as "O > N and O ≥ 8, for established locales" specified here:

http://cldr.unicode.org/index/process#TOC-Draft-Status-of-Optimal-Field-Value

Is that right? I didn't realize that the part of the VoteResolver code revised in this ticket needed to accomplish that enforcement.

The geometric mean involves floating-point calculation. Before rounding the mean to an integer, the code multiplies it by ten to provide more precision. (Otherwise, for example, a mean of 4.2 and a mean of 4.1 would both get rounded to 4 and be treated as equal, but 42 > 41.) That may be the essential problem here: the old and new votes have different scales; 4 < 8 but 40 > 8. I'll study this and look for a solution. Possibilities:

  • change all vote counts to floating point (too many code changes to accomplish quickly?)
  • keep both "raw" and "adjusted" votes available for code that enforces conditions like "O ≥ 8" (complicated)
  • for annotation voting only, change the conditions to be like "O ≥ 80" (easier)
  • round to integers without multiplying by ten first (very easy; accept loss of precision, maybe only temporarily)
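A Java sketch of the scale mismatch, with hypothetical helper names: a lone suggestion with 4 votes has a geometric mean of 4, and multiplying by ten before rounding turns that into 40, which clears a raw "O ≥ 8" check; rounding without the factor keeps it at 4 but merges means such as 4.2 and 4.1.

```java
// Hypothetical sketch of the comment:26 scale mismatch: counts multiplied by ten
// are compared against a threshold meant for raw vote counts.
class VoteScaleSketch {

    // The "O >= 8" condition for established locales, on the raw vote scale.
    static final int ESTABLISHED_MINIMUM = 8;

    // Round a geometric mean to a whole vote count, optionally scaling by ten first.
    static long adjusted(double geometricMean, boolean scaleByTen) {
        return Math.round(scaleByTen ? geometricMean * 10 : geometricMean);
    }

    static boolean meetsMinimum(long votes) {
        return votes >= ESTABLISHED_MINIMUM;
    }

    public static void main(String[] args) {
        // A single new 4-vote suggestion has a geometric mean of 4.
        System.out.println(meetsMinimum(adjusted(4.0, true)));  // true: 40 >= 8
        System.out.println(meetsMinimum(adjusted(4.0, false))); // false: 4 < 8
        // Without the factor of ten, nearby means collapse to the same integer.
        System.out.println(adjusted(4.2, false) == adjusted(4.1, false)); // true: both 4
        System.out.println(adjusted(4.2, true) > adjusted(4.1, true));    // true: 42 > 41
    }
}
```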

This is really a policy decision, though. The whole concept of adjusting votes for annotations based on the components of the annotation is mind-boggling. I'm just an engineer!

Last edited 6 weeks ago by tbishop

comment:27 Changed 5 weeks ago by tbishop

If we simply round to integers without multiplying by ten first, one of the unit tests fails:

    TestAV05 {
      Error: (TestAnnotationVotes.java:99) ❌ adjustAnnotationVoteCounts for a=3, b|c=2, b|c|d=2 should make b|c, b|c|d, a.
        Input:    [a, b|c, b|c|d] {a=3, b|c=2, b|c|d=2}
        Expected: [b|c, b|c|d, a]
        Actually: [b|c, a, b|c|d] {a=3, b|c=4, b|c|d=3}
    } (0.000s) FAILED (1 failure(s))

Is this test realistic? Real votes are often multiples of 4.

I added a new test that's the same except all votes are multiplied by 4. It passes:

    TestAV20✅ adjustAnnotationVoteCounts for a=12, b|c=8, b|c|d=8 should make b|c, b|c|d, a
 (0.001s) Passed
Last edited 5 weeks ago by tbishop
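A Java sketch of why TestAV05 fails once the geometric mean is rounded to a whole number, assuming Math.round and the comment:11 ordering (names are hypothetical). The cube root of 4 × 4 × 2 is about 3.17, which rounds to 3 and ties b|c|d with a; the fewer-items tie breaker then puts a first. With all votes multiplied by 4, b|c|d rounds to 13 and stays ahead of a at 12.

```java
import java.util.*;

// Hypothetical sketch of comment:27: the comment:11 ordering, but with each
// geometric mean rounded to a whole number (Math.round) before comparison.
class RoundedMeanSketch {

    static List<String> parts(String set) {
        List<String> result = new ArrayList<>();
        for (String c : set.split("\\|")) {
            result.add(c.trim());
        }
        return result;
    }

    static Map<String, Integer> componentTotals(Map<String, Integer> setVotes) {
        Map<String, Integer> totals = new HashMap<>();
        for (Map.Entry<String, Integer> e : setVotes.entrySet()) {
            for (String c : parts(e.getKey())) {
                totals.merge(c, e.getValue(), Integer::sum);
            }
        }
        return totals;
    }

    // Geometric mean of the component totals, rounded to the nearest integer.
    static long roundedMean(String set, Map<String, Integer> totals) {
        List<String> cs = parts(set);
        double logSum = 0;
        for (String c : cs) {
            logSum += Math.log(totals.get(c));
        }
        return Math.round(Math.exp(logSum / cs.size()));
    }

    static List<String> order(Map<String, Integer> setVotes) {
        Map<String, Integer> totals = componentTotals(setVotes);
        List<String> sets = new ArrayList<>(setVotes.keySet());
        sets.sort(Comparator.comparingLong((String s) -> roundedMean(s, totals)).reversed()
                .thenComparingInt((String s) -> parts(s).size())
                .thenComparing(Comparator.<String>naturalOrder()));
        return sets;
    }

    public static void main(String[] args) {
        Map<String, Integer> small = new LinkedHashMap<>();
        small.put("a", 3);
        small.put("b|c", 2);
        small.put("b|c|d", 2);
        System.out.println(order(small)); // [b|c, a, b|c|d]: b|c|d rounds to 3 and ties a

        Map<String, Integer> scaled = new LinkedHashMap<>();
        scaled.put("a", 12);
        scaled.put("b|c", 8);
        scaled.put("b|c|d", 8);
        System.out.println(order(scaled)); // [b|c, b|c|d, a]: b|c|d rounds to 13
    }
}
```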

comment:28 Changed 5 days ago by tbishop

I've gotten rid of the multiplication by ten, and revised TestAV05 so it passes.

I've committed to trunk in revision [14277]. Will test on smoketest.

comment:29 Changed 5 days ago by tbishop

There is a new ticket for eventual improvement by using floating point instead of integers for vote counts. See 11270.

comment:30 Changed 5 days ago by tbishop

Test on smoketest looks OK. Go to any existing emoji annotation set that has no votes for this release. As a Vetter, add a new value. The new value should be an alternate value with 4 vote count, not winning until there are more votes for it. See screenshot.

Changed 5 days ago by tbishop: attachment T10973 SmokeTest 4 votes.png added

SmokeTest: new vote with 4 votes not winning.
