CLDR Ticket #7270(closed: fixed)

Opened 5 years ago

Last modified 4 years ago

collation for emoji

Reported by: mark
Owned by: mark
Component: collation
Data Locale:
Phase: rc
Review: pedberg
Weeks:
Data Xpath:
Xref:

Description

The DUCET collation for emoji is quite bad; we might want to add a collation for emoji.

See

http://unicode.org/draft/reports/tr51/emoji-ordering.html

View it with a browser that handles emoji (not Chrome) and a font such as Symbola.

Change History

comment:1 Changed 5 years ago by emmons

  • Status changed from new to assigned
  • Component changed from unknown to data-collation
  • Priority changed from assess to medium
  • Milestone changed from UNSCH to 26rc
  • Owner changed from anybody to mark
  • type changed from unknown to enhancement

comment:2 Changed 5 years ago by emmons

  • Cc pedberg added

comment:3 Changed 5 years ago by mark

  • Milestone changed from 26rc to 27dvet

Needs to wait until TR51 is further along.

comment:4 Changed 5 years ago by markus

  • Phase set to dvet
  • Milestone changed from 27dvet to 27

comment:5 Changed 4 years ago by mark

For a draft, see http://www.unicode.org/draft/Public/emoji/1.0/emoji-ordering.txt

Note that this would *not* be a change to the root collation, but rather an additional named collation in root.xml.

comment:6 follow-up: ↓ 8 Changed 4 years ago by markus

In my opinion, while I agree that the DUCET order of emoji symbols is not good, I think there is low or negative ROI for this.

I think there is low value: I don't see users caring about the order among symbols; AFAIK collation bug reports are only ever about letters (plus combining marks etc.). Strings consisting only of symbols are not normally sorted, so the order of symbols only makes a difference when comparing strings that contain words/names and differ only by symbols. This is very uncommon.

There is a cost: Development, review, feedback, maintenance, BCP 47 keyword, size of the collation rule string and of the binary data generated for it.
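The point above about strings that differ only by symbols can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `EMOJI_ORDER` table is a made-up four-symbol order (not the TR51 or CLDR data), plain code point order stands in for a default ordering (it is not the DUCET), and the names `codepoint_key` and `tailored_key` are hypothetical.

```python
# Hypothetical tailored order for a few emoji; NOT the actual CLDR/TR51 data.
EMOJI_ORDER = ["😀", "😢", "🐱", "🐶"]
EMOJI_RANK = {ch: i for i, ch in enumerate(EMOJI_ORDER)}


def codepoint_key(s):
    """Sort key using raw code point order (a stand-in, not the DUCET)."""
    return [ord(ch) for ch in s]


def tailored_key(s):
    """Sort key where listed emoji sort by table position, after all other characters."""
    return [
        (1, EMOJI_RANK[ch]) if ch in EMOJI_RANK else (0, ord(ch))
        for ch in s
    ]


names = ["Bob 🐶", "Anna 😢", "Anna 😀", "Anna 🐱"]

by_codepoint = sorted(names, key=codepoint_key)
by_tailored = sorted(names, key=tailored_key)

print(by_codepoint)
print(by_tailored)
```

In both results every "Anna" string sorts before "Bob"; the two orders disagree only on how the three "Anna" strings are arranged among themselves, which is the case where the strings differ only by their emoji.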

comment:7 Changed 4 years ago by mark

  • Keywords working added

comment:8 in reply to: ↑ 6 Changed 4 years ago by mark

Replying to markus:

> In my opinion, while I agree that the DUCET order of emoji symbols is not good, I think there is low or negative ROI for this.
>
> I think there is low value: I don't see users caring about the order among symbols; AFAIK collation bug reports are only ever about letters (plus combining marks etc.). Strings consisting only of symbols are not normally sorted, so the order of symbols only makes a difference when comparing strings that contain words/names and differ only by symbols. This is very uncommon.
>
> There is a cost: Development, review, feedback, maintenance, BCP 47 keyword, size of the collation rule string and of the binary data generated for it.

There are significant areas where a better ordering is required: notably symbol pickers, but also any other time emoji are sorted. And while it doesn't make much of a difference for ordinary symbols (where the DUCET is halfway decent), it is so horrible for emoji that any sorted list looks childish. After all, implementations don't have to pick it up (but the size is also not that huge).

comment:9 Changed 4 years ago by mark

  • Status changed from assigned to reviewing
  • Review set to pedberg

Peter, assigning this for review, although we might want to do one more drop from tr51 before release; especially if you have feedback!

comment:10 Changed 4 years ago by pedberg

  • Status changed from reviewing to accepted

There are two entries for "emoji" in common/bcp47/collation.xml; one should be removed.

comment:11 Changed 4 years ago by pedberg

  • Phase changed from dvet to rc

comment:12 follow-up: ↓ 14 Changed 4 years ago by mark

fixed

comment:13 Changed 4 years ago by mark

  • Status changed from accepted to reviewing

comment:14 in reply to: ↑ 12 Changed 4 years ago by pedberg

  • Status changed from reviewing to closed
  • Resolution set to fixed

Replying to mark:

> fixed

Well, except that the emoji ordering is since=27, not since=26. But I just fixed that.

comment:15 Changed 4 years ago by markus

  • Cc markus added

Replying to mark:

> There are significant areas where a better ordering is required: notably symbol pickers,

Symbol pickers should be able to show some symbols in multiple categories so that users don't have to guess exactly how some designer categorized them; that is easily done with lists of symbols by category but cannot be done via collation. A sort order cannot put the same symbol into two different places.
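A small sketch of the distinction drawn above, under stated assumptions: the `CATEGORIES` table, its category names, and its membership are invented for illustration and are not taken from any real picker or from CLDR. A category listing can repeat a symbol, while a single sorted sequence gives each distinct symbol exactly one position.

```python
# Hypothetical picker categories; names and membership are illustrative only.
CATEGORIES = {
    "animals": ["🐱", "🐶", "🐟"],
    "food": ["🍎", "🐟"],
}


def categories_of(symbol):
    """Return the names of all categories that list the symbol."""
    return sorted(name for name, members in CATEGORIES.items() if symbol in members)


# Flatten into one sorted sequence (code point order here, as a stand-in
# for any collation): each distinct symbol ends up at exactly one index.
all_symbols = sorted({s for members in CATEGORIES.values() for s in members})
positions = {s: i for i, s in enumerate(all_symbols)}

print(categories_of("🐟"))
print(all_symbols.count("🐟"))
```

The fish appears under both categories in the listing, but only once in the sorted sequence, whatever order that sequence uses.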

> but also any other time emoji are sorted. And while it doesn't make much of a difference for ordinary symbols (where the DUCET is halfway decent), it is so horrible for emoji that any sorted list looks childish.

This is true, but only matters when the only difference between strings is among Emoji symbols, which will be very rare. I don't see a use case for sorting symbol-only strings.

> After all, implementations don't have to pick it up (but the size is also not that huge).

It will take some work to take it out, or else it's dead weight, even if small.
