Unicode Frequently Asked Questions

Emoji and Dingbats

Q: What are emoji?

A: Emoji are “picture characters” most frequently associated with cellular telephone usage in Japan, but also used in other East Asian countries and in other contexts. Use of emoji is also growing outside East Asia; some emoji-enabling applications for smartphones are very popular. Emoji are often pictographs—images of things such as faces, weather, vehicles and buildings, food and drink, animals and plants—or icons that represent emotions, feelings, or activities. In cellular phone usage, many emoji characters are presented in color (sometimes as a multicolor image), and some are presented in animated form, usually as a repeating sequence of two to four images—for example, a pulsing red heart. [PE]

Q: Are emoji the same thing as emoticons?

A: Not exactly. Emoticons (from “emotion” plus “icon”) are specifically intended to depict facial expression or body posture as a way of conveying emotion or attitude in e-mail and text messages. They originated as ASCII character combinations such as :-) to indicate a smile—and by extension, a joke—and :-( to indicate a frown. In East Asia, a number of more elaborate sequences have been developed, such as (")(-_-)(") showing an upset face with hands raised. Over time, many systems began replacing such sequences with images, and also began providing ways to input emoticon images directly, such as a menu or palette. The emoji sets used by Japanese cell phone carriers contain a large number of characters for emoticon images. [PE]

Q: How have emoji been encoded on cell phones?

A: Cell phone carriers in Japan have long encoded some emoji in Shift-JIS and ISO-2022 as extensions of the JIS X 0208 character set. A core set of 722 emoji constitutes the union of the emoji sets encoded in this way by the three most popular cell phone carriers in Japan. These core emoji characters are interchanged as plain text by millions of people daily (in SMS text messages and e-mail subject lines, for example), and need to be handled by e-mail systems, search engines, publishing systems, databases, and so on. For emoji beyond this core set (including those that are still being created), vendors have added rich text support, and use approaches such as embedded graphics. Similar techniques (embedded graphics or escape tags designating emoji) are also typically used for emoji support in China and the Republic of Korea. [PE]

Q: How are emoji encoded in Unicode?

A: 114 characters in the core emoji set are mapped to sequences of one or more characters available in Unicode before Version 6.0. The other 608 characters in the core emoji set are mapped to sequences of one or more characters added in Unicode 6.0, primarily in the blocks for Miscellaneous Symbols and Pictographs, Emoticons, and Transport and Map Symbols, but also in blocks such as Dingbats and Miscellaneous Technical. There is no block set aside specifically for emoji.
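
As a rough illustration of how emoji are spread across several blocks, the following Python sketch looks up the block of a few sample code points. The table is deliberately partial, the function name block_of is hypothetical, and the sample characters were chosen for illustration only; the block ranges are those published in the Unicode Character Database file Blocks.txt.

```python
# Partial block table; ranges follow Blocks.txt in the Unicode Character Database.
BLOCKS = [
    (0x2300, 0x23FF, "Miscellaneous Technical"),
    (0x2600, 0x26FF, "Miscellaneous Symbols"),
    (0x2700, 0x27BF, "Dingbats"),
    (0x1F300, 0x1F5FF, "Miscellaneous Symbols and Pictographs"),
    (0x1F600, 0x1F64F, "Emoticons"),
    (0x1F680, 0x1F6FF, "Transport and Map Symbols"),
]


def block_of(ch: str) -> str:
    """Return the block name for a single character, using the partial table above."""
    cp = ord(ch)
    for start, end, name in BLOCKS:
        if start <= cp <= end:
            return name
    return "(not in this partial table)"


# Sample characters chosen for illustration only.
for ch in "\u231A\u2702\U0001F3B5\U0001F601\U0001F680":
    print(f"U+{ord(ch):04X} -> {block_of(ch)}")
```

The samples fall into five different blocks, which is consistent with the statement that no single block is reserved for emoji.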

Characters that are separate in the extended JIS X 0208 sets used by the three major cell phone carriers in Japan are mapped to separate characters in Unicode in what is known as the Emoji Source Separation Rule. For example, the emoji core set includes a character mapped to U+1F3B5 MUSICAL NOTE; this could not be unified with U+266A EIGHTH NOTE, because both exist as separate characters in the extended JIS sets used by all three of the major cell phone carriers in Japan.
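
The separation can be observed with Python's standard unicodedata module. This is only a small sketch: it prints the names of U+1F3B5 and U+266A and checks that compatibility normalization does not fold one into the other. The variable names are chosen here for illustration.

```python
import unicodedata

musical_note = "\U0001F3B5"  # U+1F3B5, added in Unicode 6.0
eighth_note = "\u266A"       # U+266A, present before Unicode 6.0

for ch in (musical_note, eighth_note):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# The two remain distinct even under compatibility normalization.
same_after_nfkc = (
    unicodedata.normalize("NFKC", musical_note)
    == unicodedata.normalize("NFKC", eighth_note)
)
print("unified by NFKC:", same_after_nfkc)
```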

Because characters in the core emoji set are treated as pictographs, they are encoded in Unicode based primarily on their general appearance, not on an intended semantic. In fact, when used as emoji, many of these characters acquire multiple meanings based on their appearance; for example, an emoji character for “bank” which includes the letters “BK” has taken on the secondary meaning “bakkureru” (a slang term for evading one's responsibilities). The identity of characters in the emoji core set is defined primarily by their mapping to Unicode, as specified in the file EmojiSources.txt. [PE]
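
A minimal sketch of reading such a mapping file follows. It assumes a semicolon-delimited layout in which the first field is a space-separated sequence of hexadecimal Unicode code points and the next three fields are carrier codes, with "#" starting a comment and an empty field meaning that a carrier has no mapping. These layout details, the names EmojiSource and parse_emoji_sources, and the carrier codes in the sample lines are assumptions or placeholders for illustration; check the header of the actual EmojiSources.txt before relying on them.

```python
from typing import Iterable, List, NamedTuple, Optional


class EmojiSource(NamedTuple):
    unicode_seq: str           # one or more Unicode characters
    docomo: Optional[str]      # carrier codes kept as opaque hex strings
    kddi: Optional[str]
    softbank: Optional[str]


def parse_emoji_sources(lines: Iterable[str]) -> List[EmojiSource]:
    """Parse lines in the assumed 'codepoints;code;code;code' layout."""
    entries = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()   # drop comments and blank lines
        if not line:
            continue
        fields = [f.strip() for f in line.split(";")]
        if len(fields) != 4:
            raise ValueError(f"expected 4 fields, got {len(fields)}: {raw!r}")
        seq = "".join(chr(int(cp, 16)) for cp in fields[0].split())
        carriers = [f or None for f in fields[1:]]
        entries.append(EmojiSource(seq, *carriers))
    return entries


# Placeholder carrier codes (AAAA, BBBB, CCCC); not copied from the real file.
sample = [
    "# comment line",
    "0023 20E3;AAAA;BBBB;CCCC",
    "1F3B5;;BBBB;CCCC",
]
for entry in parse_emoji_sources(sample):
    code_points = " ".join(f"U+{ord(c):04X}" for c in entry.unicode_seq)
    print(code_points, entry.docomo, entry.kddi, entry.softbank)
```

Storing the Unicode side as a string rather than a single code point reflects the statement above that core emoji may map to sequences of one or more characters.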

Q: How should emoji be displayed?

A: While emoji symbols may be presented using color and animation, they need not be. Because many characters in the core emoji sets are unified with Unicode characters that originally came from other sources, there is no way based on character code alone to tell whether a character should be presented using an “emoji” style; that decision depends on context. [PE]

Q: What about characters whose name specifies a color?

A: Some of the characters from the core emoji sets have names that include a color term, for example, BLUE HEART or ORANGE BOOK. These color terms in the names do not imply any requirement about how a character must be presented; they are intended only to help identify the corresponding character in the core emoji sets. Even names of symbols such as BLACK MEDIUM SQUARE or WHITE MEDIUM SQUARE are not meant to indicate that the corresponding character must be presented in black or white, respectively; rather, the use of black and white is generally just to contrast filled versus outline shapes, or a darker color fill versus a lighter color fill. [PE]
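
The role of these names as identifiers can be seen in a short Python sketch using the standard unicodedata module. The helper name code_point_for is hypothetical. The lookup returns only a code point; nothing in the result carries color or other presentation information.

```python
import unicodedata


def code_point_for(name: str) -> int:
    """Resolve a Unicode character name to its code point."""
    return ord(unicodedata.lookup(name))


names = (
    "BLUE HEART",
    "ORANGE BOOK",
    "BLACK MEDIUM SQUARE",
    "WHITE MEDIUM SQUARE",
)
for name in names:
    # The name identifies the character; how it is drawn is left to the font.
    print(f"{name}: U+{code_point_for(name):04X}")
```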

Q: What is the difference between emoji and dingbats?

A: Most of the characters in the Dingbats block are derived from a well-established set of glyphs, the ITC Zapf Dingbats series 100, which constitutes the industry standard “Zapf Dingbat” font currently available in most laser printers. Emoji and dingbats have some similarities (and a few core emoji characters are mapped to characters in the Dingbats block). However, while there is often a great deal of flexibility in the range of glyph shapes that may be used for presentation of emoji, most characters in the Dingbats block are expected to be presented with glyph shapes that closely align with those shown in the Unicode Standard. [PE]

Q: How do emoji relate to other Japanese symbol sets?

A: Other symbol sets defined in Japanese standards overlap extensively with the characters in the core emoji set. For example:

  • Many characters from the Japanese television standard ARIB STD-B24 2007 (from the Association of Radio Industries and Businesses) were added to Unicode in Version 5.2, and are mapped to characters in the core emoji set.

  • The Japanese recording industry standard RIS-506-1996 specifies an extension of Shift-JIS for use in Music CD text, and includes a number of characters similar to those in the core emoji set. [PE]

Q: What about Wingdings and Webdings? Are they encoded, and if not, why?

A: Many of the symbols in Microsoft's Webdings and Wingdings series fonts have already been encoded in Unicode. A proposal for encoding the remainder of these symbols has been approved by the Unicode Technical Committee and is currently working its way through the ISO balloting process. [PE]

