Working Draft Proposal for Encoding Emoji Symbols
L2/07-257
Date: 2007-08-03
Authors: Kat Momoi, Mark Davis, Markus Scherer
This is a working draft proposal for the Symbols Subcommittee and UTC. We were
not able to finish this draft for review by the committee before the start of
the UTC meeting. However, this working draft is scheduled for discussion at the
UTC on Thursday, 2007-08-09, so comments before then are invited.
The proposal consists of two documents:
- This summary document.
- A draft table of correspondences.
This is a preliminary proposal, for discussion. The proposal does not yet
contain proposed code points since that would be premature. We have not had time
to review all of the characters in detail. There may be spelling errors and
other mistakes. In particular, we would appreciate special attention paid to
symbols for faces, books and arrows.
Background
This submission covers the Emoji symbols that are in widespread use by DoCoMo,
KDDI and Softbank for their mobile phone networks. These symbols are encoded in
carrier-specific versions of Shift-JIS (as User-Defined Characters). There are
mapping tables between these character sets, with both roundtrip and fallback
mappings.
We took into consideration the following factors in coming up with this working
draft:
- Source separation rule: If a single carrier separates two characters
  (anywhere in the character set, so including standard JIS codes), then we
  mapped them to two separate Unicode characters. (This is a hard and fast
  rule.)
- Reuse: We mapped to existing Unicode symbols where appropriate.
- Separating generic symbols: If Unicode had a set of related symbols, but no
  one character in the set was as generic as in the Emoji symbol sets, then we
  encoded a new character. For example, the Emoji sets do not distinguish
  between waxing and waning crescent moons.
- Colors and Animation: We encoded symbols as characters, abstracting away
  from colors and animation. We only distinguished by nominal color or
  animation for the source separation rule. (See naming below.)
- Existing cross-mapping tables: We followed the tables mentioned above as
  much as possible, but we tentatively disunified in some cases where the
  visual images were very different and not semantically associated. For
  example:
  - We disunified the 'M' symbol for Metro from the Metro train image. The
    'M' symbol would have translation problems. (This is similar to the
    problems with the international currency symbol and the proposal for a
    "generic decimal separator".)
  - On the other hand, we unified the sets of Zodiac symbols, even though the
    images shown by carriers vary widely. This is because they clearly belong
    to a cohesive set which corresponds across carriers.
- Least-marked common symbol: For a set of symbols which each could map to an
  existing Unicode code point, we chose the symbol that was shared among the
  most carriers (according to the cross-mapping tables) and had the
  least-marked form.
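The source separation rule in the list above lends itself to a mechanical
check. The following Python sketch is illustrative only: the function name, the
data layout and the placeholder codes (k1, k2, s1, U_A, U_B) are assumptions
made for this example and do not come from the draft table. It reports every
case where one carrier has two or more distinct codes mapped to the same
Unicode character, which the rule forbids.

```python
from collections import defaultdict


def source_separation_violations(mapping):
    """Find violations of the source separation rule.

    `mapping` is a hypothetical layout: a dict whose keys are
    (carrier, carrier_code) pairs and whose values are the proposed
    Unicode character (any hashable label).

    Returns a sorted list of (carrier, unicode_char, [codes]) tuples,
    one for each carrier that has two or more distinct codes mapped
    to the same Unicode character.
    """
    codes_by_target = defaultdict(set)
    for (carrier, code), target in mapping.items():
        codes_by_target[(carrier, target)].add(code)

    return sorted(
        (carrier, target, sorted(codes))
        for (carrier, target), codes in codes_by_target.items()
        if len(codes) > 1
    )


# Placeholder data: KDDI separates k1 and k2, yet both are unified as U_A.
draft = {
    ("KDDI", "k1"): "U_A",
    ("KDDI", "k2"): "U_A",
    ("Softbank", "s1"): "U_A",
}
violations = source_separation_violations(draft)
```

In this placeholder data the check flags the KDDI pair k1 and k2. Mapping k2 to
a separate character (U_B here) clears the violation, while the unification
across carriers (k1 and s1 both mapped to U_A) remains permitted.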
Note: We tried to avoid disunification in Unicode where there are roundtrip
mappings between carriers. However, disunification can be done where necessary.
As the following diagram illustrates, roundtrip mappings between carrier
Shift-JIS character sets can still be maintained by having the mapping tables
between Unicode and each carrier's Shift-JIS version use appropriate fallback
mappings.
KDDI |   | Unicode |   | Softbank
x    | ↔ | X       | → | y
x    | ← | Y       | ↔ | y
x    |       ↔         | y
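The diagram can be read as two per-carrier mapping tables. The Python sketch
below is one possible way to model it, not the format of the actual conversion
tables: the table layout, the constant names and the helper functions are
assumptions for illustration, and x, y, X and Y are the placeholder symbols
from the diagram. Roundtrip entries are used in both directions, while fallback
entries are used only when converting from Unicode to a carrier code.

```python
ROUNDTRIP = "roundtrip"
FALLBACK = "fallback"

# Hypothetical tables of (carrier_code, unicode_char, kind) entries.
# KDDI x roundtrips with X; the disunified Y falls back to x.
KDDI_TABLE = [("x", "X", ROUNDTRIP), ("x", "Y", FALLBACK)]
# Softbank y roundtrips with Y; the disunified X falls back to y.
SOFTBANK_TABLE = [("y", "Y", ROUNDTRIP), ("y", "X", FALLBACK)]


def to_unicode(table):
    """Carrier -> Unicode direction: only roundtrip entries apply."""
    return {code: uni for code, uni, kind in table if kind == ROUNDTRIP}


def from_unicode(table):
    """Unicode -> carrier direction: roundtrip and fallback entries apply."""
    return {uni: code for code, uni, _kind in table}


def convert(code, source_table, target_table):
    """Convert a code from one carrier to another by way of Unicode."""
    unicode_char = to_unicode(source_table)[code]
    return from_unicode(target_table)[unicode_char]


kddi_to_softbank = convert("x", KDDI_TABLE, SOFTBANK_TABLE)
softbank_to_kddi = convert("y", SOFTBANK_TABLE, KDDI_TABLE)
```

Under these assumptions, x converts to y by way of X, and y converts back to x
by way of Y, so the carrier-to-carrier roundtrip survives even though X and Y
are separate characters in Unicode.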
Chart Legend
Use of colors:
- Roundtrip (1:1) mapped symbols among carriers are unmarked (no background
  color).
- Best-fit fallback mappings are marked with a gold background and a *.
- Fallback mappings to sequences of codes are marked with a blue background
  and a + between the codes.
- Fallback mappings to descriptive text rather than a symbol are marked with a
  purple background.
Symbol images are taken from reference materials from the carriers. Symbol
images for the Unicode column use a Unicode character where one exists, and
otherwise one of the carrier symbols. The latter would be replaced by an
appropriate black-and-white representative glyph.
Proposed character names are tentative, typically based on the glosses of the
carrier symbols or the visual appearance. We followed the conventions for
existing Unicode characters where possible, in particular using "BLACK" for
"filled" and "WHITE" for "hollow". We excluded nominal color and animation from
proposed character names except where necessary for distinction.
Resources
The conversion tables:
Additional non-carrier references: For AU, DoCoMo and Softbank
See also: WAP Pictogram Specification approved Version 1.1 -- part of OMA
Browsing V2.3 Enabler Specification