Draft Unicode Technical Report #19

UTF-32

Revision 5.0
Authors Mark Davis (mark.davis@us.ibm.com)
Date 1999-11-16
This Version http://www.unicode.org/unicode/reports/tr19/tr19-5.html
Previous Version http://www.unicode.org/unicode/reports/tr19/tr19-4.html
Latest Version http://www.unicode.org/unicode/reports/tr19

Summary

This document specifies an alias that can be used to refer to the subset of UCS-4 values that are valid Unicode code points.

Status

This document contains material which has been considered and approved by the Unicode Technical Committee for publication as a Draft Technical Report. At the current time, the specifications in this technical report are provided as information and guidance to implementers of the Unicode Standard, but do not form part of the standard itself. The Unicode Technical Committee may decide to incorporate all or part of the material of this technical report into a future version of the Unicode Standard, either as informative material or as normative specification. Please mail corrigenda and other comments to the author.

The content of all technical reports must be understood in the context of the appropriate version of the Unicode Standard. References in this technical report to sections of the Unicode Standard refer to the Unicode Standard, Version 3.0. See http://www.unicode.org/unicode/standard/versions for more information.


The preferred encoding form for Unicode text is the 16-bit form, UTF-16. There is also an 8-bit encoding form, UTF-8, that can be used to represent Unicode in environments where the 16-bit form is impractical due to compatibility constraints. In addition, some implementations may wish to use a 32-bit form, in which each Unicode code point (also known as a scalar value) corresponds to a single 32-bit unit. Even applications that do not use this form internally may want to convert to and from it for interoperability.
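As an illustration of the one-unit-per-code-point property and of converting to and from the 32-bit form, the following Python sketch converts between UTF-16 code units and 32-bit code units. The function names `utf16_to_utf32` and `utf32_to_utf16` are hypothetical, not part of any standard API, and the sketch assumes its input is a list of integers that has already been extracted from whatever byte serialization was used.

```python
from typing import List


def utf16_to_utf32(units: List[int]) -> List[int]:
    """Convert 16-bit UTF-16 code units into 32-bit code units (one per code point)."""
    result: List[int] = []
    i = 0
    while i < len(units):
        unit = units[i]
        if 0xD800 <= unit <= 0xDBFF:
            # A high surrogate must be followed by a low surrogate.
            if i + 1 >= len(units) or not 0xDC00 <= units[i + 1] <= 0xDFFF:
                raise ValueError(f"unpaired high surrogate at index {i}")
            low = units[i + 1]
            # Recombine the pair into one code point in the range 10000..10FFFF.
            result.append(0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00))
            i += 2
        elif 0xDC00 <= unit <= 0xDFFF:
            raise ValueError(f"unpaired low surrogate at index {i}")
        else:
            # Any other 16-bit unit maps to a 32-bit unit with the same value.
            result.append(unit)
            i += 1
    return result


def utf32_to_utf16(units: List[int]) -> List[int]:
    """Convert 32-bit code units into 16-bit UTF-16 code units."""
    result: List[int] = []
    for cp in units:
        # Only the range is checked here; surrogate values D800..DFFF are not rejected.
        if not 0 <= cp <= 0x10FFFF:
            raise ValueError(f"value {cp:#x} is outside the Unicode code point range")
        if cp >= 0x10000:
            # Split a supplementary code point into a high/low surrogate pair.
            offset = cp - 0x10000
            result.append(0xD800 + (offset >> 10))
            result.append(0xDC00 + (offset & 0x3FF))
        else:
            result.append(cp)
    return result
```

For example, the UTF-16 sequence 0041 D800 DC00 becomes the 32-bit sequence 00000041 00010000: the surrogate pair collapses into a single unit, which is what makes the 32-bit form convenient for code-point indexing.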

Such an encoding form is defined in ISO/IEC 10646, where it is called UCS-4. However, UCS-4 uses a 31-bit code space (values up to 7FFFFFFF hexadecimal), which permits values outside the range of valid Unicode code points, U+0000..U+10FFFF. The term UTF-32 can be used to refer to the subset of UCS-4 values that are in the range of valid Unicode code points. The following lists the important features of this encoding form:

Since UTF-32 is simply a subset of UCS-4 values, it conforms to ISO/IEC 10646 as well as to the Unicode Standard.
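Because UTF-32 is defined as a subset of UCS-4, deciding whether a UCS-4 value is also a UTF-32 value reduces to a range check. The following Python sketch shows that check; the helper names are hypothetical. It follows only the range condition stated in this draft; later revisions of the Unicode Standard additionally treat the surrogate values D800..DFFF as ill-formed in UTF-32, which this sketch does not check.

```python
from typing import List, Optional

# Upper bounds of the two code spaces.
UCS4_MAX = 0x7FFFFFFF    # UCS-4 (ISO/IEC 10646) uses a 31-bit code space
UNICODE_MAX = 0x10FFFF   # highest valid Unicode code point


def is_ucs4_value(value: int) -> bool:
    """True if value lies in the UCS-4 code space."""
    return 0 <= value <= UCS4_MAX


def is_utf32_value(value: int) -> bool:
    """True if value lies in the Unicode code point range (range check only)."""
    return 0 <= value <= UNICODE_MAX


def first_non_utf32(units: List[int]) -> Optional[int]:
    """Return the index of the first unit outside UTF-32, or None if all are in range."""
    for index, unit in enumerate(units):
        if not is_utf32_value(unit):
            return index
    return None
```

Every value accepted by `is_utf32_value` is also accepted by `is_ucs4_value`, but not the reverse: 110000 hexadecimal, for instance, is a UCS-4 value that is not a UTF-32 value.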


Copyright © 1999 Unicode, Inc. All Rights Reserved.

The Unicode Consortium makes no expressed or implied warranty of any kind, and assumes no liability for errors or omissions. No liability is assumed for incidental and consequential damages in connection with or arising out of the use of the information or programs contained or accompanying this technical report.

Unicode and the Unicode logo are trademarks of Unicode, Inc., and are registered in some jurisdictions.