ZWJ sequences in UTR #51 v4

Christoph Päper christoph.paeper at
Fri Aug 12 18:29:47 CDT 2016


zelpa <zelpahd at>:

> Some of the ZWJ sequences in the latest revision seem sort of arbitrary,


It’s a fundamental principle of linguistics that signs connect representation and meaning arbitrarily, but this doesn’t apply to pictures and proto-writing, which are not (quite/yet) linguistic signs.

> why is male health worker Man + Staff of Asclepius instead of introducing a Doctor emoji and simply using the female or male modifiers?

I do agree with the general approach of encoding additional professions as ZWJ sequences.

Ideally, people would already be using emoji sequences for professions (without ZWJ, “emoji words”) and there would be research on such compounds, so Unicode could document existing conventions. Otherwise, one could conduct a user study by letting a representative sample of people express a meaning with a restricted repertoire (i.e. emojis already in Unicode).
Alas, neither seems to have been done; instead, a committee of experts chose canonic sequences based upon vendor proposals (Google and Apple). Interestingly, the result – currently in beta state – is not systematic in any way whatsoever: professions are arbitrarily identified by a tool, clothing, an accessory, a product, a building, a vehicle (e.g. ✈️) or an already conventionalized symbol (⚕, ⚖). Often these are directly featured in the example image, but not always. Chances are high that sequences in the wild that are intended to represent the same professions use different components.

With family emojis, ZWJ sequences (and Fitzpatrick modifiers) are very similar to classic ligatures, because the resulting glyph is just an elaborate composition of its bases. If the example images were intuitively obvious or mandatory design recommendations, this could also be true for many of the new profession emoji sequences, but it is not, since 1) font vendors are free to design an arbitrary iconographic *picture to represent the compound meaning*, 2) the sequences are not empirically founded and 3) they are culturally biased (e.g. ⚕).

If future emoji selection UIs offered the sequences by showing precomposed glyphs (like many do with families and flags), the problem would be hidden away for a while, but this will become unmanageable eventually. I expect IMEs to adopt a different approach soon: auto-correction. If a user successively enters two emojis that form an officially registered ZWJ sequence, the system will automatically insert U+200D and use a single glyph – hopefully the user will be able to revert or edit that composition, e.g. ZWJ→ZWNJ. The system will also try to identify juxtaposed or synonymous sequences (e.g. alternative spellings of a farmer or a health worker) and suggest replacing them by the canonic sequence or even by a single character (e.g. for a sequence ending in ⛑). That’s basically `<3` and `:-)` TNG.
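
A minimal sketch of the auto-correction idea in Python might look as follows. The registry contents, the function names `auto_join` and `revert`, and the decision to ignore variation selectors (U+FE0F) are all assumptions made for illustration; a real IME would presumably load the registered sequences from the Unicode emoji data files rather than hard-code them.

```python
ZWJ = "\u200D"   # ZERO WIDTH JOINER
ZWNJ = "\u200C"  # ZERO WIDTH NON-JOINER

# Hypothetical, hard-coded registry of (person, object) pairs.
# Variation selectors (U+FE0F) are left out to keep the sketch short.
REGISTERED_PAIRS = {
    ("\U0001F468", "\u2695"),  # Man + Staff of Asclepius (health worker)
    ("\U0001F469", "\u2695"),  # Woman + Staff of Asclepius
    ("\U0001F468", "\u2696"),  # Man + Scales (judge)
    ("\U0001F469", "\u2696"),  # Woman + Scales
}


def auto_join(text: str) -> str:
    """Insert U+200D between adjacent characters that form a registered pair."""
    out = []
    for ch in text:
        # Only directly adjacent characters are joined; if a ZWJ is already
        # present, out[-1] is the ZWJ and nothing is inserted twice.
        if out and (out[-1], ch) in REGISTERED_PAIRS:
            out.append(ZWJ)
        out.append(ch)
    return "".join(out)


def revert(text: str) -> str:
    """Naive undo: replace every ZWJ with ZWNJ so the parts render separately."""
    return text.replace(ZWJ, ZWNJ)
```

Note that `revert` as written would also split family sequences; an actual implementation would only touch the composition the user has just selected.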

To make it simpler to learn the canonic sequences I’d strongly urge the people in charge to select as few generic patterns as possible, e.g. <person> + <building> or <tool>, and this should be based upon actual research.

> The current proposition also doesn't seem to allow for a gender-neutral doctor(?)

Yes, this is a problem with the <person> ZWJ <object> profession sequences, but, at least in theory, not with the <profession> ZWJ <gender> sequences, because they should be neutral by default. There absolutely should be a neutral base character to accompany Man and Woman, maybe U+263A ☺️ or U+1F610 😐, and perhaps more:

Codepoint | Name         | Glyph | Meaning
U+263A    | White Smiley | ☺️    | Neutral (details unknown, unimportant, unavailable)
U+1F469   | Woman        | 👩    | Female, woman, feminine
U+1F468   | Man          | 👨    | Male, man, masculine
U+1F475   | Older Woman  | 👵    | Retired female, senior woman, female expert
U+1F474   | Older Man    | 👴    | Retired male, senior man, male expert
U+1F476   | Baby         | 👶    | Trainee, learner, student, beginner, intern
U+1F467   | Girl         | 👧    | Female trainee, learner, student, beginner, intern
U+1F466   | Boy          | 👦    | Male trainee, learner, student, beginner, intern
U+1F47D   | Alien        | 👽    | Extraterrestrial, alien, foreign, out-sourced, anonymous
U+1F916   | Robot        | 🤖    | Android, robot, automated service, machine, self-service, bot
U+1F63A   | Cat          | 😺    | Furry, humanoid/anthropomorphous animal, toon
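
To illustrate how such base characters would slot into the <person> ZWJ <object> pattern, here is a small Python sketch. The `BASES` mapping only mirrors a few rows of the table above, and the helper names `profession` and `codepoints` are hypothetical; sequences built on the neutral base are a suggestion from this mail, not registered sequences.

```python
ZWJ = "\u200D"  # ZERO WIDTH JOINER

# A few of the base characters from the table above (labels are illustrative).
BASES = {
    "neutral": "\u263A",      # White Smiley, proposed neutral base
    "woman": "\U0001F469",
    "man": "\U0001F468",
    "robot": "\U0001F916",
}


def profession(base: str, obj: str) -> str:
    """Build a <person> ZWJ <object> sequence from a base label and an object."""
    return BASES[base] + ZWJ + obj


def codepoints(sequence: str) -> str:
    """Render a string as space-separated U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in sequence)


# Health worker variants with the Staff of Asclepius (U+2695) as the object.
for label in ("neutral", "woman", "man"):
    print(label, codepoints(profession(label, "\u2695")))
```

The point of the sketch is only that one pattern with an interchangeable base would cover the neutral case without any additional sequences per profession.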

