Unicode Frequently Asked Questions

Language Tagging and Tag Characters

Q: Do I always need to tag text with the language?

In most cases it is not necessary to tag text with language information. See Section 5.10, Language Information in Plain Text in The Unicode Standard.

However, there are situations where providing language information (outside the plain text stream) is useful. If you use markup languages, follow their guidelines on providing language information markup.

Q: Should I be using tag characters to spell out language tags?

The Unicode Standard contains a set of invisible format control characters, also known as "tag characters". The use of these characters for language tagging has been deprecated. See Section 23.9, Tag Characters in The Unicode Standard for a complete explanation.

Users who need to tag text with the language identity should use standard markup mechanisms, such as those provided by HTML, XML, or other rich text mechanisms. In other contexts, such as databases or internet protocols, language should generally be indicated by appropriate data fields, rather than by embedded language tags or markup.
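
When migrating older data, it can help to find any deprecated language tags still embedded in the text so that the information can be moved into markup or a data field. The following Python sketch is one way to do that. It assumes a language tag is U+E0001 LANGUAGE TAG followed by a run of tag characters in the range U+E0020..U+E007E, each of which maps to an ASCII character by subtracting 0xE0000. The function name find_language_tags is hypothetical and is not part of any library.

```python
LANGUAGE_TAG = 0xE0001                   # U+E0001 LANGUAGE TAG (deprecated)
TAG_FIRST, TAG_LAST = 0xE0020, 0xE007E   # tag characters that mirror ASCII 0x20..0x7E
TAG_OFFSET = 0xE0000                     # tag code point minus this gives the ASCII code point


def find_language_tags(text):
    """Return (index, tag) pairs for each embedded language tag found in text.

    index is the code point offset of U+E0001; tag is the ASCII string
    spelled out by the tag characters that follow it.
    """
    results = []
    i = 0
    while i < len(text):
        if ord(text[i]) == LANGUAGE_TAG:
            j = i + 1
            spelled = []
            # Collect the run of tag characters that follows the introducer.
            while j < len(text) and TAG_FIRST <= ord(text[j]) <= TAG_LAST:
                spelled.append(chr(ord(text[j]) - TAG_OFFSET))
                j += 1
            results.append((i, "".join(spelled)))
            i = j
        else:
            i += 1
    return results


# Example: "abc" followed by an embedded language tag spelling "fr".
sample = "abc" + "\U000E0001" + "\U000E0066\U000E0072" + "bonjour"
print(find_language_tags(sample))
```

The sketch only reports where tags occur. Whether to strip them, and where to store the recovered language information, depends on the surrounding format or protocol.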

Q: Are the tag characters used for anything at all?

Tag characters other than U+E0001 LANGUAGE TAG are used in emoji tag sequences. For example, the flag emoji for subdivisions are tag sequences. Each sequence begins with a specific emoji as its base character and is terminated by U+E007F CANCEL TAG. See Annex C: Valid Emoji Tag Sequences in UTS #51.
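
As an illustration of the structure of an emoji tag sequence, the following Python sketch builds and decodes a subdivision flag. It assumes the base character is U+1F3F4 WAVING BLACK FLAG and that the subdivision code (for example "gbeng" for England) is spelled with tag characters obtained by adding 0xE0000 to each lowercase ASCII letter or digit. The function names make_subdivision_flag and decode_tag_sequence are hypothetical. The sketch checks only the shape of the sequence, not whether it appears in the list of valid sequences in UTS #51.

```python
BLACK_FLAG = "\U0001F3F4"   # U+1F3F4 WAVING BLACK FLAG, assumed base for subdivision flags
CANCEL_TAG = "\U000E007F"   # U+E007F CANCEL TAG, terminates the sequence
TAG_OFFSET = 0xE0000        # ASCII code point plus this gives the tag character


def make_subdivision_flag(code):
    """Build a tag sequence for a subdivision code such as 'gbeng'."""
    code = code.lower()
    if not (code.isascii() and code.isalnum()):
        raise ValueError("subdivision code must be ASCII letters and digits")
    tags = "".join(chr(TAG_OFFSET + ord(c)) for c in code)
    return BLACK_FLAG + tags + CANCEL_TAG


def decode_tag_sequence(sequence):
    """Split a tag sequence into (base character, ASCII string spelled by the tags)."""
    if len(sequence) < 3 or not sequence.endswith(CANCEL_TAG):
        raise ValueError("not a terminated tag sequence")
    base, tags = sequence[0], sequence[1:-1]
    if any(not (0xE0020 <= ord(c) <= 0xE007E) for c in tags):
        raise ValueError("unexpected character inside tag sequence")
    return base, "".join(chr(ord(c) - TAG_OFFSET) for c in tags)


england = make_subdivision_flag("gbeng")
print([f"U+{ord(c):04X}" for c in england])
print(decode_tag_sequence(england)[1])
```

Whether the result renders as a flag depends on font and platform support. Where the sequence is not supported, only the base emoji is likely to be shown, since the tag characters themselves are invisible.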