CLDR Ticket #10973 (reviewing tools)
Emoji keywords disputes due to additional keywords
Reported by: | kristi | Owned by: | tbishop |
---|---|---|---|
Component: | annotations | Data Locale: | |
Phase: | dsub | Review: | mark |
Weeks: | | Data Xpath: | |
Xref: | | | |
Description
We have a large number of disputes for emoji keywords.
Here's a scenario:
- Most of the existing data is in Approved state.
- Vetter A is the first to come and vote by adding one new keyword. This becomes a suggestion.
- Vetter B likes A's suggestion, but also wants to add a new keyword. This becomes yet another suggestion.
- Until Vetter A goes back to vote again for Vetter B's suggestion, the Approved data does not change.
Problem:
This causes one of the vetters to redo their voting work all over again.
This causes an increased number of disputes for emoji keywords.
Attachments
Change History
comment:2 Changed 4 months ago by mark
While talking to Kirill and Chiara, I had a thought.
We could change the VoteResolver — without modifying the UI — to just modify the way it treats "sets" like annotations.
Treat a vote of 4 for {A, B, C} to be a vote for each of the components. So if we had
{A, B, C, F} 8 votes
{A, B, E} 6 votes
{A, E} 4 votes
We would treat in VR as votes:
A | 18 |
B | 14 |
C | 8 |
E | 10 |
F | 8 |
We accept anything with > 1/2 the top vote count as the top, so we would end up with:
{A, B, E}
(We can play with the proportions/rule. This is just blue-skying.)
Note that what we end up with might not be an input option. Ideally, we'd just introduce a new one. But in the meantime maybe we can take the largest subset, and failing that the smallest superset, or something like that.
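For illustration, the tally proposed above can be sketched in a few lines of Java. This is only a sketch under assumptions: the class and method names are hypothetical (not taken from VoteResolver.java), sets are written in vertical-bar form, and "more than half the top count" is read as a strict inequality. Computed this way, B totals 14 (8 + 6).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch; not the Survey Tool implementation.
final class ComponentThresholdSketch {
    /** Credits each set's votes to every keyword the set contains. */
    static Map<String, Integer> componentTotals(Map<String, Integer> setVotes) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, Integer> e : setVotes.entrySet()) {
            for (String keyword : e.getKey().split("\\|")) {
                totals.merge(keyword.trim(), e.getValue(), Integer::sum);
            }
        }
        return totals;
    }

    /** Keeps keywords whose total is strictly more than half the top total. */
    static Set<String> aboveHalfOfTop(Map<String, Integer> totals) {
        int top = totals.values().stream().mapToInt(Integer::intValue).max().orElse(0);
        Set<String> kept = new TreeSet<>();
        for (Map.Entry<String, Integer> e : totals.entrySet()) {
            if (2 * e.getValue() > top) {
                kept.add(e.getKey());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, Integer> votes = new LinkedHashMap<>();
        votes.put("A|B|C|F", 8);
        votes.put("A|B|E", 6);
        votes.put("A|E", 4);
        Map<String, Integer> totals = componentTotals(votes);
        System.out.println(totals);                 // {A=18, B=14, C=8, E=10, F=8}
        System.out.println(aboveHalfOfTop(totals)); // [A, B, E]
    }
}
```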
comment:3 Changed 4 months ago by mark
- Status changed from new to accepted
- Component changed from unknown to annotations
- Priority changed from assess to critical
- Milestone changed from UNSCH to 34
- Owner changed from anybody to mark
- Type changed from unknown to tools
comment:7 Changed 3 months ago by mark
- Owner changed from backend to tbishop
Whoops, caught in bulk change of others. Restoring
comment:8 Changed 3 months ago by tbishop
If we end up with {A, B, E}, or "A | B | E" using vertical-bar notation, should that result be calculated and displayed immediately as the currently winning value (in the "Winning" column), whenever someone votes?
Could this be implemented in resolveVotes in VoteResolver.java?
Should the new method of calculation apply to all values that can use vertical bar as a separator, or only when the code ends with "–keywords"? Or when the path starts with "ldml/annotations/annotation"?
comment:9 Changed 2 months ago by tbishop
Branch tbishop/t10973 has an implementation of an algorithm tentatively outlined by Mark by phone: "A, B, C in an item, D, E have votes, overall might be A+B+C-(D+E) votes". The key new function is calculateNewCountsBasedOnAnnotationComponents in VoteResolver.java.
Unit tests are in new file TestAnnotationVotes.java.
TODO: decide on the algorithm! Two different ideas have been proposed:
(1) Given input {a|b|c|f=8, a|b|e=6, a|e=4}, we get compMap {a=18, b=14, c=8, e=10, f=8}.
Accept anything with > 1/2 the top vote count (18/2 = 9) as the top, so we end up with a|b|e since a, b, and e all have > 9 in compMap. Note that what we end up with might not be an input option (for example, a|b is not one of the input options, although a|b|e is, coincidentally). Ideally, we'd just introduce a new one. But in the meantime maybe we can take the largest subset, and failing that the smallest superset, or something like that.
(2) A, B, C in an item, D, E have votes, overall might be A+B+C-(D+E) votes.
That is, again given input {a|b|c|f=8, a|b|e=6, a|e=4}, we again get compMap {a=18, b=14, c=8, e=10, f=8}.
For a|b|c|f we get 18 + 14 + 8 - 10 + 8 = 38
For a|b|e we get 18 + 14 - 8 + 10 - 8 = 26
For a|e we get 18 - 14 - 8 + 10 - 8 = -2
The winner for (2) is a|b|c|f, not a|b|e as in (1).
The first implementation follows (2), since (1) isn't clearly defined, given that introducing a novel combination of components isn't considered an option for this ticket, and it's not clear yet what's meant about largest subset or smallest superset...
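A minimal Java sketch of the arithmetic for idea (2), assuming compMap has already been computed as above. The class and method names are made up for illustration and are not the ones in branch tbishop/t10973.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of "A+B+C-(D+E)"; not the code in VoteResolver.java.
final class InMinusOutSketch {
    /** Adds the totals of components inside the annotation and subtracts all others. */
    static int score(String annotation, Map<String, Integer> compMap) {
        Set<String> inside = new HashSet<>(Arrays.asList(annotation.split("\\|")));
        int score = 0;
        for (Map.Entry<String, Integer> e : compMap.entrySet()) {
            score += inside.contains(e.getKey()) ? e.getValue() : -e.getValue();
        }
        return score;
    }

    public static void main(String[] args) {
        Map<String, Integer> compMap = new LinkedHashMap<>();
        compMap.put("a", 18);
        compMap.put("b", 14);
        compMap.put("c", 8);
        compMap.put("e", 10);
        compMap.put("f", 8);
        for (String s : new String[] {"a|b|c|f", "a|b|e", "a|e"}) {
            System.out.println(s + " -> " + score(s, compMap));
        }
        // Prints 38, 26 and -2 for the three annotations, in that order.
    }
}
```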
comment:10 Changed 2 months ago by tbishop
I've implemented a kind of IRV (instant-runoff voting) where the implicit "next choice" of voters for an eliminated annotation A is determined as follows:
- In the set of candidate annotations that haven't been eliminated, find the largest annotation B (largest in terms of having the most components) such that the components of B are a subset of the components of A.
- If B doesn't exist, then, in the set of candidate annotations that haven't been eliminated, find the smallest annotation C (smallest in terms of having the fewest components) such that the components of C are a superset of the components of A.
- If B or C exists, use it as the "next choice" of voters for A; otherwise, there is no "next choice" of voters for A.
The implementation passes the unit tests. Maybe this voting method is adequate.
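The "next choice" rule above could look roughly like the following Java sketch. The names are hypothetical, and one detail is an assumption: when several candidates tie on size, the first one in the list is kept, since the rule as stated doesn't say how to break such ties.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of the implicit "next choice" rule; not the committed code.
final class NextChoiceSketch {
    static Set<String> components(String annotation) {
        return new TreeSet<>(Arrays.asList(annotation.split("\\|")));
    }

    /** Largest remaining subset of the eliminated annotation, else smallest remaining superset. */
    static Optional<String> nextChoice(String eliminated, List<String> remaining) {
        Set<String> a = components(eliminated);
        String bestSubset = null;
        String bestSuperset = null;
        for (String candidate : remaining) {
            Set<String> c = components(candidate);
            if (a.containsAll(c)
                    && (bestSubset == null || c.size() > components(bestSubset).size())) {
                bestSubset = candidate;
            }
            if (c.containsAll(a)
                    && (bestSuperset == null || c.size() < components(bestSuperset).size())) {
                bestSuperset = candidate;
            }
        }
        return Optional.ofNullable(bestSubset != null ? bestSubset : bestSuperset);
    }

    public static void main(String[] args) {
        // A subset exists, so the largest subset is chosen.
        System.out.println(nextChoice("a|b|c", Arrays.asList("a", "a|b", "a|b|c|d")));
        // No subset exists, so the smallest superset is chosen.
        System.out.println(nextChoice("a|b", Arrays.asList("a|b|c|d", "a|b|c")));
        // Neither exists, so there is no next choice.
        System.out.println(nextChoice("x", Arrays.asList("a|b")));
    }
}
```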
comment:11 Changed 2 months ago by tbishop
I've changed the voting method to one based on this formula from Mark:
- Find the total votes for each subitem (eg "b" in "b|c"). As the "modified" vote for the set, use the geometric mean of the subitems in the set.
- Order the sets by that mean value, then by the smallest number of items in the set, then the fallback we always use (alphabetical)
This special voting method is used if, and only if, the path starts with ldml/annotations/annotation and does not contain Emoji.TYPE_TTS. In other words, it's a keyword path, not a name path.
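As a rough illustration of the geometric-mean ordering (not the code committed for this ticket), here is a Java sketch. All names are hypothetical, the path check is left out, and the means are compared as unrounded doubles, which is an assumption made only to keep the sketch short.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; names and the use of unrounded doubles are assumptions.
final class GeometricMeanSketch {
    /** Total votes per subitem, e.g. "b" in "b|c". */
    static Map<String, Integer> componentTotals(Map<String, Integer> setVotes) {
        Map<String, Integer> totals = new HashMap<>();
        for (Map.Entry<String, Integer> e : setVotes.entrySet()) {
            for (String part : e.getKey().split("\\|")) {
                totals.merge(part, e.getValue(), Integer::sum);
            }
        }
        return totals;
    }

    /** Geometric mean of the subitem totals of one set. */
    static double geometricMean(String annotation, Map<String, Integer> totals) {
        String[] parts = annotation.split("\\|");
        double product = 1.0;
        for (String part : parts) {
            product *= totals.get(part);
        }
        return Math.pow(product, 1.0 / parts.length);
    }

    /** Orders by mean (descending), then fewest items, then alphabetically. */
    static List<String> rank(Map<String, Integer> setVotes) {
        Map<String, Integer> totals = componentTotals(setVotes);
        List<String> ranked = new ArrayList<>(setVotes.keySet());
        ranked.sort(Comparator
                .comparingDouble((String s) -> -geometricMean(s, totals))
                .thenComparingInt(s -> s.split("\\|").length)
                .thenComparing(Comparator.naturalOrder()));
        return ranked;
    }

    public static void main(String[] args) {
        Map<String, Integer> votes = new LinkedHashMap<>();
        votes.put("a|b|c|f", 8);
        votes.put("a|b|e", 6);
        votes.put("a|e", 4);
        System.out.println(rank(votes)); // [a|b|e, a|e, a|b|c|f]
    }
}
```

For the input shown, the means are roughly 13.6 for a|b|e, 13.4 for a|e and 11.3 for a|b|c|f.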
comment:12 Changed 2 months ago by tbishop
I committed the changes for this to trunk in revision 14097.
comment:13 Changed 2 months ago by tbishop
- Status changed from accepted to reviewing
- Review set to kristi
comment:14 Changed 2 months ago by tbishop
One eventual improvement to consider: the right sidebar could indicate when the special voting method has been applied, to clarify why "b|c" (20 votes) beats "a" (24 votes).
comment:17 Changed 6 weeks ago by tbishop
Based on discussion related to 11165, a modification is planned. After finding the set X with the largest geometric mean, check whether any supersets of X have "greater" raw votes and don't exceed the width limit. If not, pick X to be the winning set. If so, pick the one of those supersets with the highest vote (using the normal tie breaker) to be the winning set Y. "Greater" here means that rawVote(Y) ≥ rawVote(X) + n, where the value of n is still to be decided. We might choose n = 2 to force there to be at least one non-guest vote. To make that decision we'll study the consequences of different values of n for some test cases.
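A sketch of the planned superset check, in Java, with several assumptions stated up front: X is taken as already found, n and the width limit are plain parameters (n = 2 and a limit of 7 in the example are just values to try, not decisions), and the "normal tie breaker" is assumed to be fewest items, then alphabetical. All names are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of the superset check; n, widthLimit and the tie breaker are assumptions.
final class SupersetCheckSketch {
    static Set<String> components(String annotation) {
        return new TreeSet<>(Arrays.asList(annotation.split("\\|")));
    }

    /** Returns the best qualifying superset Y of x, or x itself if none qualifies. */
    static String pickWinner(String x, Map<String, Integer> rawVotes, int n, int widthLimit) {
        Set<String> xParts = components(x);
        String best = null;
        for (Map.Entry<String, Integer> e : rawVotes.entrySet()) {
            String y = e.getKey();
            Set<String> yParts = components(y);
            boolean properSuperset = yParts.size() > xParts.size() && yParts.containsAll(xParts);
            if (!properSuperset || yParts.size() > widthLimit) {
                continue; // not a superset of X, or too wide
            }
            if (e.getValue() < rawVotes.get(x) + n) {
                continue; // raw vote is not "greater" by at least n
            }
            if (best == null || beats(y, best, rawVotes)) {
                best = y;
            }
        }
        return best != null ? best : x;
    }

    /** Higher raw vote wins; ties go to fewer items, then alphabetical order. */
    static boolean beats(String y, String best, Map<String, Integer> rawVotes) {
        int byVote = Integer.compare(rawVotes.get(y), rawVotes.get(best));
        if (byVote != 0) {
            return byVote > 0;
        }
        int bySize = Integer.compare(components(y).size(), components(best).size());
        return bySize != 0 ? bySize < 0 : y.compareTo(best) < 0;
    }

    public static void main(String[] args) {
        Map<String, Integer> votes = new LinkedHashMap<>();
        votes.put("a|b", 4);
        votes.put("a|b|c|d|e|f|g|h", 9); // wider than the assumed limit of 7
        votes.put("a|b|c|x|y", 8);
        System.out.println(pickWinner("a|b", votes, 2, 7)); // a|b|c|x|y
    }
}
```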
comment:18 Changed 6 weeks ago by tbishop
Unit test results on localhost; TestAV08 thru TestAV19 are new:
TestAnnotationVotes {
TestAV00✅ adjustAnnotationVoteCounts(null, null) should return quietly
(0.001s) Passed
TestAV01✅ adjustAnnotationVoteCounts for a=100, b=99, c=98 should return unchanged
(0.885s) Passed
TestAV02✅ adjustAnnotationVoteCounts for a|b=1, c|d=2, e|f=3 should reverse order
(0.000s) Passed
TestAV03✅ adjustAnnotationVoteCounts for a=2, b=2, b|c=1 should make b, a, b|c
(0.003s) Passed
TestAV04✅ adjustAnnotationVoteCounts for a|b|c|f=8, a|b|e=6, a|e=4 should make a|b|e, a|e, a|b|c|f
(0.000s) Passed
TestAV05✅ adjustAnnotationVoteCounts for a=3, b|c=2, b|c|d=2 should make b|c, b|c|d, a
(0.002s) Passed
TestAV06✅ adjustAnnotationVoteCounts for a|b|c=8, a|b|d=6, a|d=4 should make a|b|d, a|d, a|b|c
(0.000s) Passed
TestAV07✅ adjustAnnotationVoteCounts for a=24, b|c=20, b|c|d=20 should make b|c, b|c|d, a
(0.001s) Passed
TestAV08✅ adjustAnnotationVoteCounts for hmyz | malárie | moskyt | štípnutí | virus, dengue | hmyz | malárie | moskyt | štípnutí | virus ...
(0.000s) Passed
TestAV09✅ adjustAnnotationVoteCounts for b|c|d|e|f=4, a|b|c|d|e|f=8 should make a|b|c|d|e|f, b|c|d|e|f
(0.000s) Passed
TestAV10✅ adjustAnnotationVoteCounts for b|c|d|e|f=4, a|b|c|d|e|f=6 should make a|b|c|d|e|f, b|c|d|e|f
(0.001s) Passed
TestAV11✅ adjustAnnotationVoteCounts for b|c|d|e|f=4, a|b|c|d|e|f=5 should make b|c|d|e|f, a|b|c|d|e|f
(0.000s) Passed
TestAV12✅ adjustAnnotationVoteCounts for a|b=4, a|b|d=8, a|b|c=8 should make a|b|c, a|b|d, a|b
(0.000s) Passed
TestAV13✅ adjustAnnotationVoteCounts for a|b=4, a|b|d=8, a|b|c=7 should make a|b|d, a|b|c, a|b
(0.001s) Passed
TestAV14✅ adjustAnnotationVoteCounts for a|b=8, a=8 should make a, a|b
(0.000s) Passed
TestAV15✅ adjustAnnotationVoteCounts for a|b=8, a=4 should make a|b, a
(0.000s) Passed
TestAV16✅ adjustAnnotationVoteCounts for a|b=4, a|b|d=8, a|b|c=7 should make a|b|d, a|b|c, a|b
(0.001s) Passed
TestAV17✅ adjustAnnotationVoteCounts for a|b=4, a|b|c|d=8, a|b|e=8 should make a|b|e, a|b|c|d, a|b
(0.000s) Passed
TestAV18✅ adjustAnnotationVoteCounts for a|b|c|d=8, a|b|e=8 should make a|b|e, a|b|c|d
(0.001s) Passed
TestAV19✅ adjustAnnotationVoteCounts for a|b=4, a|b|c|d|e|f|g|h=9, a|b|c|x|y=8 should make a|b|c|x|y, a|b, a|b|c|d|e|f|g|h
(0.000s) Passed
} (0.897s) Passed
comment:19 follow-up: ↓ 21 Changed 6 weeks ago by fredrik
This is horribly late to make this comment, I realize, but alas...
Are we over-complicating this? For single data points with one exclusive value, a vote value of 8 has been enough to unopposed overturn a value. Perhaps we should consider any individual keyword with a vote of 8 or higher to be valid to be included in the overall set of keywords?
Con: there wouldn't be a way to vote against the inclusion of a keyword.
comment:20 Changed 6 weeks ago by tbishop
On smoketest, these results are confirmed by logging in as different users and voting:
"b|c|d|e|f=4, a|b|c|d|e|f=8 should make a|b|c|d|e|f, b|c|d|e|f" (cf. TestAV09)
"a|b=4, a|b|c|d=8, a|b|e=8 should make a|b|e, a|b|c|d, a|b" (cf. TestAV17)
I'll add a screenshot for the latter.
Changed 6 weeks ago by tbishop
- Attachment T10973 Screen Shot 2018-06-08.png added
a|b=4, a|b|c|d=8, a|b|e=8 should make a|b|e, a|b|c|d, a|b
comment:21 in reply to: ↑ 19 ; follow-up: ↓ 22 Changed 6 weeks ago by tbishop
Replying to fredrik:
> This is horribly late to make this comment, I realize, but alas...
> Are we over-complicating this? For single data points with one exclusive value, a vote value of 8 has been enough to unopposed overturn a value. Perhaps we should consider any individual keyword with a vote of 8 or higher to be valid to be included in the overall set of keywords?
> Con: there wouldn't be a way to vote against the inclusion of a keyword.
Another con: there might be too many keywords with 8 or more votes. Currently a limit of 7 keywords is in effect.
I share the concern about over-complication, but don't know what would be best. Some complication seems inherent in the situation that votes can only be for combinations (complete sets), while the vote resolution method is based on individual keywords (components). It's hard to reconcile expectations based on looking at the issue in two ways, in terms of combinations and in terms of individual keywords.
comment:22 in reply to: ↑ 21 ; follow-up: ↓ 24 Changed 6 weeks ago by fredrik
Replying to tbishop:
> Another con: there might be too many keywords with 8 or more votes. Currently a limit of 7 keywords is in effect.
I thought we decided to remove that limit in favor of just a warning?
> I share the concern about over-complication, but don't know what would be best. Some complication seems inherent in the situation that votes can only be for combinations (complete sets), while the vote resolution method is based on individual keywords (components). It's hard to reconcile expectations based on looking at the issue in two ways, in terms of combinations and in terms of individual keywords.
Another example of unexpected results here: http://st.unicode.org/cldr-apps/v#/ar/Travel_Places/3f68784395af9769
Microsoft voted for only "صحن طائر" (total 4)
Apple and Google voted for both "صحن طائر" and "طبق طائر" (total 8).
Microsoft wins, presumably because "صحن طائر" gets 12 votes.
It is very hard to explain this result to vetters...
comment:24 in reply to: ↑ 22 Changed 6 weeks ago by mark
Replying to fredrik:
> Replying to tbishop:
>> Another con: there might be too many keywords with 8 or more votes. Currently a limit of 7 keywords is in effect.
> I thought we decided to remove that limit in favor of just a warning?
>> I share the concern about over-complication, but don't know what would be best. Some complication seems inherent in the situation that votes can only be for combinations (complete sets), while the vote resolution method is based on individual keywords (components). It's hard to reconcile expectations based on looking at the issue in two ways, in terms of combinations and in terms of individual keywords.
> Another example of unexpected results here: http://st.unicode.org/cldr-apps/v#/ar/Travel_Places/3f68784395af9769
> Microsoft voted for only "صحن طائر" (total 4)
> Apple and Google voted for both "صحن طائر" and "طبق طائر" (total 8).
> Microsoft wins, presumably because "صحن طائر" gets 12 votes.
> It is very hard to explain this result to vetters...
Fredrik,
- as a part of another bug, I changed the maximum limit for now, so that shouldn't play a role.
- the example you cite is on the Survey Tool, which doesn't have Tom's fix in yet. I went to SmokeTest (http://cldr-smoke.unicode.org/smoketest/v#/ar/Travel_Places/3f68784395af9769) and put in votes for
صحن طائر | طبق طائر
and
صحن طائر
to match what is in the Survey Tool. That now works on SmokeTest, giving the result that you wanted.
comment:25 follow-up: ↓ 26 Changed 6 weeks ago by kristi
From my email:
Bug summary: when there is existing winning data, a new suggestion entered by a new vetter wins.
To repro the bug:
1. Go to any existing emoji keywords item that has no votes for this release.
2. As a Vetter, add a new value.
Result: the new value in #2 becomes the winning value immediately.
Expected: the new value should be an alternate value with a vote count of 4 (until there are more votes for it).
comment:26 in reply to: ↑ 25 Changed 6 weeks ago by tbishop
Replying to kristi:
> ...Expected: the new value to be an alternate value with 4 vote count. (until there’s more vote to it)
So the new voting method should enforce some conditions, such as "O > N and O ≥ 8, for established locales" specified here:
http://cldr.unicode.org/index/process#TOC-Draft-Status-of-Optimal-Field-Value
Is that right? I didn't realize that the part of the VoteResolver code revised in this ticket needed to accomplish that enforcement.
The geometric mean involves floating-point calculation. Before rounding the mean to an integer, the code multiplies it by ten to provide more precision. (Otherwise, for example, a mean of 4.2 and a mean of 4.1 would both get rounded to 4 and be treated as equal, but 42 > 41.) That may be the essential problem here: the old and new votes have different scales; 4 < 8 but 40 > 8. I'll study this and look for a solution. Possibilities:
- change all vote counts to floating point (too many code changes to accomplish quickly?)
- keep both "raw" and "adjusted" votes available for code that enforces conditions like "O ≥ 8" (complicated)
- for annotation voting only, change the conditions to be like "O ≥ 80" (easier)
- round to integers without multiplying by ten first (very easy; accept loss of precision, maybe only temporarily)
This is really a policy decision, though. The whole concept of adjusting votes for annotations based on the components of the annotation is mind-boggling. I'm just an engineer!
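To make the scale mismatch concrete, here is a tiny Java sketch. It assumes Math.round for the rounding step and a simple "votes ≥ 8" check standing in for the real conditions; both are illustrative assumptions, and the names are hypothetical.

```java
// Hypothetical sketch of the scale problem; not the VoteResolver code.
final class ScaleMismatchSketch {
    /** Rounds a geometric mean to an integer vote, optionally multiplying by ten first. */
    static long adjusted(double geometricMean, boolean timesTen) {
        return Math.round(timesTen ? geometricMean * 10 : geometricMean);
    }

    /** Stand-in for a condition such as "O >= 8", written for the raw vote scale. */
    static boolean meetsThreshold(long votes, long required) {
        return votes >= required;
    }

    public static void main(String[] args) {
        // Without the factor of ten, 4.2 and 4.1 become indistinguishable.
        System.out.println(adjusted(4.2, false) + " vs " + adjusted(4.1, false));
        // With it they stay distinct, but the scale no longer matches raw votes.
        System.out.println(adjusted(4.2, true) + " vs " + adjusted(4.1, true));
        // A lone vetter's 4 votes should not satisfy ">= 8", yet 40 does.
        System.out.println(meetsThreshold(adjusted(4.0, false), 8)); // false
        System.out.println(meetsThreshold(adjusted(4.0, true), 8));  // true
    }
}
```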
comment:27 Changed 5 weeks ago by tbishop
If we simply round to integers without multiplying by ten first, one of the unit tests fails:
TestAV05 {
Error: (TestAnnotationVotes.java:99) ❌ adjustAnnotationVoteCounts for a=3, b|c=2, b|c|d=2 should make b|c, b|c|d, a.
Input: [a, b|c, b|c|d] {a=3, b|c=2, b|c|d=2}
Expected: [b|c, b|c|d, a]
Actually: [b|c, a, b|c|d] {a=3, b|c=4, b|c|d=3}
} (0.000s) FAILED (1 failure(s))
Is this test realistic? Real votes are often multiples of 4.
I added a new test that's the same except all votes are multiplied by 4. It passes:
TestAV20✅ adjustAnnotationVoteCounts for a=12, b|c=8, b|c|d=8 should make b|c, b|c|d, a (0.001s) Passed
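The effect of rounding on these two inputs can be reproduced with a short Java sketch. It is only an approximation of what the ticket describes: the names are hypothetical, Math.round is assumed for the rounding step, and the ordering is assumed to be adjusted vote, then fewest items, then alphabetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; not adjustAnnotationVoteCounts itself.
final class RoundedMeanSketch {
    /** Rounded geometric mean of component totals for each annotation, times scale. */
    static Map<String, Long> adjust(Map<String, Integer> setVotes, int scale) {
        Map<String, Integer> totals = new HashMap<>();
        for (Map.Entry<String, Integer> e : setVotes.entrySet()) {
            for (String part : e.getKey().split("\\|")) {
                totals.merge(part, e.getValue(), Integer::sum);
            }
        }
        Map<String, Long> adjusted = new LinkedHashMap<>();
        for (String annotation : setVotes.keySet()) {
            String[] parts = annotation.split("\\|");
            double product = 1.0;
            for (String part : parts) {
                product *= totals.get(part);
            }
            double mean = Math.pow(product, 1.0 / parts.length);
            adjusted.put(annotation, Math.round(mean * scale));
        }
        return adjusted;
    }

    /** Orders by adjusted vote (descending), then fewest items, then alphabetically. */
    static List<String> rank(Map<String, Long> adjusted) {
        List<String> ranked = new ArrayList<>(adjusted.keySet());
        ranked.sort((x, y) -> {
            int byVote = Long.compare(adjusted.get(y), adjusted.get(x));
            if (byVote != 0) {
                return byVote;
            }
            int bySize = Integer.compare(x.split("\\|").length, y.split("\\|").length);
            return bySize != 0 ? bySize : x.compareTo(y);
        });
        return ranked;
    }

    public static void main(String[] args) {
        Map<String, Integer> votes = new LinkedHashMap<>();
        votes.put("a", 3);
        votes.put("b|c", 2);
        votes.put("b|c|d", 2);
        System.out.println(adjust(votes, 1));        // {a=3, b|c=4, b|c|d=3}
        System.out.println(rank(adjust(votes, 1)));  // [b|c, a, b|c|d]
        System.out.println(rank(adjust(votes, 10))); // [b|c, b|c|d, a]
    }
}
```

With scale 1, a and b|c|d both round to 3, so the tie goes to the shorter set a. Multiplying every input vote by 4 separates them again (12 versus 13 after rounding).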
comment:28 Changed 5 days ago by tbishop
I've gotten rid of the multiplication by ten, and revised TestAV05 so it passes.
I've committed to trunk in revision [14277]. Will test on smoketest.
comment:29 Changed 5 days ago by tbishop
There is a new ticket for eventual improvement by using floating point instead of integers for vote counts. See 11270.
comment:30 Changed 5 days ago by tbishop
Test on smoketest looks OK. Go to any existing emoji annotation set that has no votes for this release. As a Vetter, add a new value. The new value should be an alternate value with 4 vote count, not winning until there are more votes for it. See screenshot.
Changed 5 days ago by tbishop
- Attachment T10973 SmokeTest 4 votes.png added
SmokeTest: new vote with 4 votes not winning.
Long-term "blue sky" idea filed as bug:10980