Unicode Frequently Asked Questions

Variation Sequences

Q. What are variation sequences?

A. They are a standardized mechanism for indicating glyphic variants of base characters.

Q. What is the structure of a variation sequence?

A. It is a base character followed by a variation selector.
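As a rough illustration of that structure, the following Python sketch scans a string for a character immediately followed by a variation selector. The names VS_RANGES, is_variation_selector, and find_variation_sequences are hypothetical, and the ranges listed are an assumption that should be checked against the current Unicode Character Database. The sketch only recognizes the shape of a sequence; it does not say whether the sequence is valid.

```python
# Code point ranges of the variation selectors (assumed complete for this sketch).
VS_RANGES = (
    (0x180B, 0x180D),    # Mongolian free variation selectors FVS1..FVS3
    (0x180F, 0x180F),    # Mongolian free variation selector FVS4
    (0xFE00, 0xFE0F),    # VS1..VS16
    (0xE0100, 0xE01EF),  # VS17..VS256
)

def is_variation_selector(ch):
    """True if the single character ch falls in one of the ranges above."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in VS_RANGES)

def find_variation_sequences(text):
    """Return (index, sequence) pairs for each base + selector pair in text."""
    found = []
    for i in range(1, len(text)):
        # A selector counts only when the preceding character is not itself a selector.
        if is_variation_selector(text[i]) and not is_variation_selector(text[i - 1]):
            found.append((i - 1, text[i - 1:i + 1]))
    return found
```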

Q. What are the valid sequences?

A. Only those listed in StandardizedVariants.txt (for a formatted version, see StandardizedVariants.html) and in the Ideographic Variation Database.
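A minimal sketch of building a lookup table from StandardizedVariants.txt might look like the following. It assumes a semicolon-separated layout in which the first field holds the base and selector code points in hexadecimal, the second field holds a description, and "#" starts a comment. The function name parse_standardized_variants is hypothetical, and the SAMPLE lines are only an illustrative excerpt that should be checked against the published file. The Ideographic Variation Database uses its own file format and is not covered here.

```python
def parse_standardized_variants(text):
    """Map (base, selector) code point pairs to their description."""
    sequences = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and blank lines
        if not line:
            continue
        fields = [f.strip() for f in line.split(";")]
        base_hex, selector_hex = fields[0].split()
        description = fields[1] if len(fields) > 1 else ""
        sequences[(int(base_hex, 16), int(selector_hex, 16))] = description
    return sequences

# Illustrative excerpt in the assumed layout; verify against the actual file.
SAMPLE = """\
# sample lines only
0030 FE00; short diagonal stroke form; # DIGIT ZERO
2205 FE00; zero with long diagonal stroke overlay form; # EMPTY SET
"""

valid = parse_standardized_variants(SAMPLE)
```

With such a table, checking whether a pair is listed reduces to a membership test such as `(0x2205, 0xFE00) in valid`.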

Q. Can I define my own sequences?

A. No, that is the equivalent of trying to define an unassigned code point to be your own character. Use private use characters instead.

Q. How should variation sequences be displayed?

A. When they are valid variation sequences, they should be displayed as illustrated in StandardizedVariants.html or in the Ideographic Variation Database. If not, the base character should be displayed as normal, and the variation selector should be invisible. See Display of Unsupported Characters.

Q. How can variation sequences be handled in fonts?

A. For handling them with OpenType fonts, see the cmap subtable "Format 14: Unicode Variation Sequences" in the OpenType specification.

Q. What changes does a browser developer need to make in order to support variation sequences?

A. Browsers often use a font substitution mechanism to show pages. This allows users to read text when the font specified in the web page is unavailable or doesn't support all the characters on that page. A simple mechanism is to render text in the specified font up to the first character that font can't display, and to switch fonts at that point. That mechanism fails with variation sequences, because the switch can separate a base character from its variation selector. A better mechanism is to treat a combining character sequence as a single entity for the purpose of font substitution. Because variation selectors have the General Category Nonspacing Mark (Mn), this allows them to be handled correctly.
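The idea of substituting fonts per combining character sequence can be sketched in Python as follows. This is a simplified illustration, not a description of any browser's implementation: combining_sequences and choose_fonts are hypothetical names, a "font" is modeled as a (name, supports) pair where supports is an assumed callback, and the segmentation is cruder than full grapheme cluster segmentation.

```python
import unicodedata

def combining_sequences(text):
    """Split text into runs of one base character plus any following marks."""
    clusters = []
    for ch in text:
        # General Category M* covers Mn, Mc and Me; variation selectors are Mn.
        if clusters and unicodedata.category(ch).startswith("M"):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters

def choose_fonts(text, fonts):
    """Pick the first font that supports each whole cluster.

    fonts is an ordered list of (name, supports) pairs, where
    supports(cluster) returns True if the font can display the cluster.
    A cluster that no font supports is paired with None.
    """
    result = []
    for cluster in combining_sequences(text):
        name = next((n for n, supports in fonts if supports(cluster)), None)
        result.append((cluster, name))
    return result
```

Because the font decision is made once per cluster, a base character and its variation selector always end up in the same font.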

Q. Does this apply only to browser developers?

A. No, it applies more generally, to developers of any applications where font substitution is used.

Q. How should variation sequences be handled in search?

A. There are a number of different methods. The first and simplest method is to ignore any variation selectors when doing a search. Another method is to have a query without variation selectors match terms with any variation selectors, but a query with a specific variation selector will only match a term with that variation selector. Thus:

  • AB matches any string of the form AB or of the form A<VSn>B (for any n)
  • A<VSn>B matches only A<VSn>B.
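The two search methods described above can be sketched in Python as follows. The function names are hypothetical, the character class in VS_PATTERN is an assumed list of variation selector ranges, and the comparison is a whole-term match rather than a substring search, which keeps the example short.

```python
import re

# Assumed variation selector ranges: U+180B..180D, U+180F, U+FE00..FE0F, U+E0100..E01EF.
VS_PATTERN = re.compile("[\u180B-\u180D\u180F\uFE00-\uFE0F\U000E0100-\U000E01EF]")

def strip_selectors(s):
    """Remove every variation selector from s."""
    return VS_PATTERN.sub("", s)

def matches_ignoring_selectors(query, term):
    """Method 1: variation selectors never affect matching."""
    return strip_selectors(query) == strip_selectors(term)

def split_units(s):
    """Return (base, selector) pairs; selector is "" when absent."""
    units = []
    for ch in s:
        if VS_PATTERN.fullmatch(ch) and units and units[-1][1] == "":
            units[-1] = (units[-1][0], ch)
        else:
            units.append((ch, ""))
    return units

def matches_asymmetric(query, term):
    """Method 2: a bare query character matches any variant of it,
    while a query character with a selector matches only that sequence."""
    q, t = split_units(query), split_units(term)
    if len(q) != len(t):
        return False
    return all(qb == tb and (qs == "" or qs == ts)
               for (qb, qs), (tb, ts) in zip(q, t))
```

For example, under the second method the query "AB" matches the term "A\uFE00B", but the query "A\uFE00B" does not match the term "AB" or "A\uFE01B".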

Q. How should variation sequences be handled in IMEs (input method editors for CJK)?

A. They can be listed as options, just like single code points. However, if there are many options it may be worth having a pull-down menu associated with the base character.

Q. I'm proposing an addition to a historic script that is a variant of an existing character. Should I propose it as a new character or as a new variation sequence?

A. A variation sequence (VS) provides a means to specify a significant glyphic variation of a character without encoding each variation as a separate character. This is particularly useful when such a distinction is not universally necessary.

Because the character itself is part of the variation sequence, one should be able to search and find all the instances of that particular character, independent of variation in its appearance, a task which would be more complicated if the variants were encoded as separate characters. If you can replace the variant by the existing character without significantly distorting the content of the text, then a VS is the appropriate way to represent the variant, and you should propose your addition as a variation sequence.

For historic scripts, the VS provides a useful tool, because it can show mistaken or nonce glyphs and relate them to the base character. It can also be used to reflect the views of scholars, who may see the relation between the glyphs and base characters differently. Also, new variation sequences can be added for new variant appearances (and their relation to the base characters) as more evidence is discovered.