Registry for Emoji Ligature and Metaphor Sequences

Christoph Päper christoph.paeper at
Thu Aug 18 05:19:30 CDT 2016


Ever since emojis arrived in Unicode, they have been combined with each other to convey new meaning in a number of ways, similar to what has been described for other (ancient) pictographic codes and what formed the basis of some proper scripts. Many combinations can be observed in actual conversations; others are mostly restricted to contrived puzzles or riddles:

* Rebus: 
 An emoji needs to be read as a particular word in a certain language but must then be interpreted as one of its homophones, e.g. 👁 EYE → English pronunciation /aɪ/ → ‘I’.

* Linguistic metaphor: 
 An emoji needs to be read as a particular synonymous word in a certain language but must then be interpreted as one of its other synonyms, e.g. �� CAT → English synonym ‘pussy’ → ‘vagina/vulva’ or �� CREATURE WITH HORNS → English ‘horny’. This is also used a lot for compounds, e.g. ���� COW AND POO → English ‘bullshit’.

* Graphical metaphor: 
 An emoji needs to be reinterpreted as a similar-looking object, e.g. 🍆 AUBERGINE, 🍌 BANANA, 🌽 CORN COB and many other phallus symbols. There are even second-degree metaphors such as ♋️ CANCER → ⟨69⟩ ‘69’ → mutual oral sex (existing pan-linguistic metaphor).

* Ideographic metaphor: 
 An emoji needs to be read as a (conventionalized) semantic modifier of adjacent emojis, e.g. ⚖ SCALES → related to the judiciary: ��⚖ → ‘male judge’ or ♀ VENUS SIGN → related to the female sex or feminine gender: ��♀️ → ‘female construction worker’ and some other of the proposed gender-dependent ZWJ sequences for professions.

* Metaphor sign:
 An emoji needs to be reinterpreted as a symbol or icon – gotta get my Peirce terms straight – for an existing metaphor in a particular language or culture, e.g. gestures like ���� INDEX FINGER POINTING AT RING FORMED OF THUMB AND INDEX FINGER for phallic sex or animal heads �� LION and �� DRAGON representing the English and Welsh national soccer teams, because the constituents of GB or the UK are not national entities (“countries”) in the sense of ISO 3166 which is used by Regional Indicator Symbol sequences like ���� and thus cannot be represented by emoji flags yet.
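
To illustrate the flag limitation mentioned in the last item, here is a minimal Python sketch of how a two-letter region code maps onto a pair of Regional Indicator Symbols. The function name `flag_sequence` is made up for this example; only the code point arithmetic (U+1F1E6 is REGIONAL INDICATOR SYMBOL LETTER A) comes from Unicode itself.

```python
REGIONAL_INDICATOR_A = 0x1F1E6  # U+1F1E6 REGIONAL INDICATOR SYMBOL LETTER A


def flag_sequence(alpha2: str) -> str:
    """Map a two-letter region code to a pair of Regional Indicator Symbols.

    Hypothetical helper for illustration; it does not check whether the
    code is actually assigned in ISO 3166.
    """
    code = alpha2.upper()
    if len(code) != 2 or not all("A" <= c <= "Z" for c in code):
        raise ValueError(f"expected two ASCII letters, got {alpha2!r}")
    return "".join(chr(REGIONAL_INDICATOR_A + ord(c) - ord("A")) for c in code)


gb = flag_sequence("GB")
print([f"U+{ord(ch):04X}" for ch in gb])  # ['U+1F1EC', 'U+1F1E7']
```

Since the mechanism only takes two-letter codes, the UK as a whole gets a flag via `GB`, while England, Scotland and Wales have no code of that shape to feed into it.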

The “correct” interpretation (i.e. the one desired by the author) often relies on textual, discursive, cultural and even technical context, e.g. 🍑 PEACH is often used as a visual metaphor for a human body part, but it can be any of ‘butt’, ‘vulva’ and (rarely) ‘cleavage’, which may lead to misunderstandings as in ���� ‘anal sex’ / ‘vaginal sex’ / ‘mammary sex’. Depending on the actual glyph and context, 👀 EYES and 👃 NOSE may be (mis)read as ‘boobs’ and ‘penis’, respectively. Also consider the ongoing controversy regarding 🔫 being rendered as a toy gun, and the directionality of emojis, e.g. ������ or ����, which doesn’t display as intended in Emoji One.

Proposed Consequences

The conventional metaphors that are frequently used without much context and even in isolation, e.g. the eggplant penis, show an obvious demand for the encoding of dedicated emojis. The peach controversy highlights that emojis become ambiguous if too many desired ones are not available. In any case, a proper proposal for the encoding of emojis for genitals and other missing body parts will have to follow at a later time or be written by someone else, e.g. Emojidex.


The classic phonographic rebus combinations and the linguistic metaphors are language-dependent and must therefore remain out of scope for Unicode. The other types of emoji metaphors, however, often span across languages and cultures. They may rely on a certain degree of glyphic similarity among fonts and thus benefit from standardization. These sequences are, in my opinion, in scope of UTC specifications.

There are two types of emoji ligatures already: Emoji Modifier Sequences and Emoji ZWJ Sequences. The former joins an emoji with indeterminate main color with an explicit modifier (currently only for human skin tones). Most deployed examples of the latter regroup a sequence of emoji characters (mostly ����������, possibly with ❤️ and ��) into a single glyph, so U+1F46A �� ≈ U+1F468-200D-1F469-200D-1F466 (although the precomposed character doesn’t strictly specify the gender of the child or the parents). Both kinds may be combined.
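
As a rough illustration of the two ligature mechanisms, the following Python sketch builds both kinds of sequence from code points. The helper names (`zwj_sequence`, `with_modifier`, `code_points`) are made up for this example; the code points are the ones cited above plus U+1F3FD, one of the five skin tone modifiers.

```python
ZWJ = "\u200D"  # ZERO WIDTH JOINER

MAN, WOMAN, BOY = "\U0001F468", "\U0001F469", "\U0001F466"
MEDIUM_SKIN = "\U0001F3FD"  # EMOJI MODIFIER FITZPATRICK TYPE-4


def zwj_sequence(*emojis: str) -> str:
    """Join emoji characters with ZWJ so a font may draw a single glyph."""
    return ZWJ.join(emojis)


def with_modifier(base: str, modifier: str) -> str:
    """Append an emoji modifier directly after its base character."""
    return base + modifier


def code_points(text: str) -> str:
    """Render a string as hyphen-separated hexadecimal code points."""
    return "-".join(f"{ord(ch):04X}" for ch in text)


family = zwj_sequence(MAN, WOMAN, BOY)
print(code_points(family))                          # 1F468-200D-1F469-200D-1F466
print(code_points(with_modifier(MAN, MEDIUM_SKIN)))  # 1F468-1F3FD
```

Whether the result is actually drawn as one glyph depends entirely on the font; without support, the components are simply displayed next to each other.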

Not least with Google’s recent proposal for gendered profession emojis, the concept of ZWJ sequences has been expanded from visual ligatures to semantic and metaphoric compounds. That means the can of worms has already been opened. Yet the same concept seems to have been shunned for Regional Indicators for sub-regions (i.e. flags for Scotland, Wales, England etc.) in favor of a generic system without proof of demand (TERIS etc.).

I firmly believe that the best solution to these related problems is a less formal Registry for Emoji Character Sequences (RECS, to give it a catchy acronym). It should be hosted and its rules be defined by UTC, but mostly run by itself.

Proposed RECS process:

- Anyone may claim a conventional meaning for a sequence of emojis free of charge via a web form. 
- This includes decompositions of existing emojis (e.g. ���� = ���� = ��♂ ≈ ��).
- If the reading depends on a particular language (or meets other exclusion criteria, like codifying a name or title) it is rejected by the RECS Board, otherwise it is open for approval. 
- While multiple sequences may map to the same meaning redundantly, every character sequence is listed only once until either rejected or approved.
- All variants resulting from applicable modifiers are included automatically by default. 
- An emoji sequence is approved once a non-trivial font implements an appropriate composite glyph (routine).
- The implementation may rely upon the presence of ZWJs or upon users activating features specific to the font technology (e.g. `liga` and `clig` in OpenType).
- Every emoji sequence shall be formally registered for inclusion in StandardizedVariants.txt if two independent vendors support it in an interoperable manner. 
- The RECS is also used to record non-PUA characters – perhaps restricted to `So` “other symbols” – with property `Emoji=No` that at least one publicly available font represents (only) as emoji glyphs and hence may become proper candidates. <> VS15/16 variation sequences should be automatically registered for all of them and the UTC needs to decide on the value of the `Emoji_Representation` property individually.
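
The core of this process could be modeled roughly as follows. This is only a Python sketch under several assumptions: all class, method and status names are invented here, the language-dependence check is reduced to a flag that the RECS Board would set, “listed only once” is simplified to one entry per sequence, and vendor independence is simply taken on trust.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    OPEN = "open"              # claimed, waiting for an implementation
    REJECTED = "rejected"      # e.g. reading depends on a language
    APPROVED = "approved"      # at least one font implements it
    REGISTERED = "registered"  # two independent vendors support it


@dataclass
class Entry:
    sequence: str
    meaning: str
    status: Status = Status.OPEN
    vendors: set = field(default_factory=set)


class Registry:
    """Hypothetical in-memory model of the proposed RECS."""

    def __init__(self):
        self.entries = {}

    def claim(self, sequence, meaning, language_dependent=False):
        # Simplification: each character sequence is listed only once.
        if sequence in self.entries:
            return self.entries[sequence]
        status = Status.REJECTED if language_dependent else Status.OPEN
        entry = Entry(sequence, meaning, status)
        self.entries[sequence] = entry
        return entry

    def record_implementation(self, sequence, vendor):
        entry = self.entries[sequence]
        if entry.status is Status.REJECTED:
            raise ValueError("rejected sequences cannot be approved")
        entry.vendors.add(vendor)
        entry.status = (
            Status.REGISTERED if len(entry.vendors) >= 2 else Status.APPROVED
        )
        return entry.status


recs = Registry()
judge = "\U0001F468\u200D\u2696"  # MAN + ZWJ + SCALES
recs.claim(judge, "male judge")
print(recs.record_implementation(judge, "VendorA"))  # Status.APPROVED
print(recs.record_implementation(judge, "VendorB"))  # Status.REGISTERED
```

The vendor names are placeholders. A real registry would also have to expand modifier variants automatically and track the `Emoji=No` candidates, which this sketch leaves out.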

The current equivalent of the RECS would be <> which is currently outdated or incomplete as, for instance, the Windows 10 Anniversary Update added multiple emoji character sequences to represent more types of families.

PS: The RECS may also be used to record universal named character entities `&foo;` and short codes `:foo:`, although both usually rely on English, just like UTC character names. Finally, it may also be used to collect classic Western and Eastern emoticons like `:-)` and `¯\_(ツ)_/¯`.
