
CLDR Ticket #10097 (new unknown)

Opened 6 months ago

Supply a bit more information about collation of emoji

Reported by: mark Owned by: anybody
Component: unknown Data Locale:
Phase: dsub Review:
Weeks: Data Xpath:


We should document more clearly how ordering of emoji works, both in LDML and in UTS #51. Others should review the following account and note any corrections.

CLDR has a root collation for "emoji", so using "-u-co-emoji" in a Unicode locale identifier selects that ordering. Example:

collator = Collator.getInstance(ULocale.forLanguageTag("en-u-co-emoji"));

However, requesting the emoji collation supplants the language's own customizations, so the above is equivalent to:

collator = Collator.getInstance(ULocale.forLanguageTag("und-u-co-emoji"));
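To make the structure of these identifiers concrete, here is a minimal sketch that reads the "co" (collation) keyword back out of a language tag. It uses only the JDK's java.util.Locale, not ICU4J, and it does not create a collator; the class and method names (CollationKeyword, collationType) are made up for this illustration.

```java
import java.util.Locale;

final class CollationKeyword {
    // Returns the type of the "co" (collation) keyword in the tag,
    // or null when the tag carries no such keyword.
    static String collationType(String languageTag) {
        return Locale.forLanguageTag(languageTag).getUnicodeLocaleType("co");
    }

    public static void main(String[] args) {
        // Print the collation keyword, if any, for a few sample tags.
        for (String tag : new String[] {"en-u-co-emoji", "und-u-co-emoji", "da"}) {
            System.out.println(tag + " -> co=" + collationType(tag));
        }
    }
}
```

This only shows that the request for emoji ordering travels in the "-u-co-" part of the identifier, independent of the language subtag. Which tailoring is actually applied is decided by the collation data, as described in this ticket.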

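The same structure does not work for a language that requires its own customization, such as Danish. The following does not produce Danish ordering: as the results below show, it sorts the letters in root order (a ü y Z) rather than Danish order (a y ü Z).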

collator = Collator.getInstance(ULocale.forLanguageTag("da-u-co-emoji"));

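For that, a slightly more cumbersome method is needed: take the rules for Danish and explicitly append the rules for emoji.

// Requires ICU4J: com.ibm.icu.text.Collator, com.ibm.icu.text.RuleBasedCollator, com.ibm.icu.util.ULocale.
// Note: the RuleBasedCollator(String) constructor throws a checked Exception.
String danishRules = ((RuleBasedCollator) Collator.getInstance(ULocale.forLanguageTag("da"))).getRules();
String emojiRules = ((RuleBasedCollator) Collator.getInstance(ULocale.forLanguageTag("und-u-co-emoji"))).getRules();
RuleBasedCollator collator = new RuleBasedCollator(danishRules + emojiRules);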

raw , Z a y ü ☹️ ✈️️ 😀
en , ☹️ ✈️️ 😀 a ü y Z
en-u-co-emoji , a ü y Z 😀 ☹️ ✈️️
da , ☹️ ✈️️ 😀 a y ü Z
da-u-co-emoji , a ü y Z 😀 ☹️ ✈️️
combine , a y ü Z 😀 ☹️ ✈️️

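raw = plain code point comparison, with no collator
combine = the method above for combining the Danish and emoji rules
All other rows are labeled with the locale identifier passed to Collator.getInstance.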
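For reference, the "raw" row can be reproduced without any collator. The following is a minimal sketch using only the JDK; the class name CodePointOrder and its members are invented for this illustration, and the sample strings are written as escapes on the assumption that each emoji in the table is the base character followed by a single U+FE0F.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class CodePointOrder {
    // Compares two strings code point by code point (not by UTF-16 code unit).
    static final Comparator<String> BY_CODE_POINT = (a, b) -> {
        int i = 0;
        int j = 0;
        while (i < a.length() && j < b.length()) {
            int ca = a.codePointAt(i);
            int cb = b.codePointAt(j);
            if (ca != cb) {
                return Integer.compare(ca, cb);
            }
            i += Character.charCount(ca);
            j += Character.charCount(cb);
        }
        // One string is a prefix of the other: the shorter one sorts first.
        return Integer.compare(a.length() - i, b.length() - j);
    };

    // Returns a new list sorted in code point order; the input is not modified.
    static List<String> sortByCodePoint(List<String> input) {
        List<String> copy = new ArrayList<>(input);
        copy.sort(BY_CODE_POINT);
        return copy;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
            "\uD83D\uDE00",   // U+1F600 grinning face
            "\u00FC",         // ü
            "a",
            "\u2639\uFE0F",   // frowning face + variation selector
            "Z",
            "y",
            "\u2708\uFE0F");  // airplane + variation selector
        System.out.println(String.join(" ", sortByCodePoint(sample)));
    }
}
```

For these sample strings, the result matches the "raw" row: Z, a, y, ü, then the frowning face, the airplane, and the grinning face. Java's built-in String.compareTo compares UTF-16 code units instead; it would give the same order here, and differs from code point order only when characters in U+E000..U+FFFF are compared with supplementary characters.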


