UTC/US comments to 10646, second edition

L2/99-270

WG2 N2085

Title: Comments on ISO/IEC 10646-1, Second Edition text, Draft 2

Source: Unicode Technical Committee, NCITS/L2

Status: Joint Unicode/US Contribution

Action: For Review and Disposition by JTC1/SC2/WG2

Date: September 9, 1999

This document provides comments on Draft 2 of the second edition of ISO/IEC 10646-1. Many implementers find certain features of the Unicode Standard useful when supporting the character encoding defined by ISO/IEC 10646 and Unicode. With these comments, we would like to bring two such features to the attention of implementers of ISO/IEC 10646: the Unicode bidirectional rendering algorithm, and the Unicode definitions of the big-endian and little-endian forms of UTF-16, which may be used in data interchange. We hope that, even at this late date, these comments can be considered as input to Draft 2.

1. Referencing the Unicode Standard. To highlight, for character encoding implementers, the features offered by the Unicode Standard, add the following text in either section 1 Scope or 3 Normative references:

Both ISO/IEC 10646-1 and the Unicode Standard, Version 3.0 provide the identical character repertoire, names, and code values. Complementing ISO/IEC 10646, the Unicode Standard additionally provides character properties, algorithms, and definitions that are useful to implementers.

2. The Unicode bidirectional algorithm. This bidirectional rendering algorithm is widely implemented in industry. To bring it to the attention of ISO/IEC 10646 implementers, add the following text to section 19 Characters in bi-directional context:

The rendering of characters in a bi-directional context is correctly determined by following the bidirectional algorithm defined by the Unicode Standard, Version 3.0. This algorithm is applicable when using explicit bidirectional formatting characters (U+202A..U+202E) or when rendering bidirectional text implicitly.
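As an informal illustration, not part of the proposed text, the bidirectional class of each character in this neighbourhood can be inspected with Python's standard unicodedata module. This is only a sketch: it assumes the Unicode data bundled with the interpreter, which is later than Version 3.0, and the helper name bidi_classes is hypothetical. It covers U+202A through U+202F so that the explicit embedding and override characters can be told apart from the neighbouring U+202F.

```python
import unicodedata

# Range inspected by this sketch (an assumption chosen for illustration).
FIRST, LAST = 0x202A, 0x202F


def bidi_classes(first=FIRST, last=LAST):
    """Return {code point: (bidi class, character name)} for the range."""
    return {
        cp: (unicodedata.bidirectional(chr(cp)), unicodedata.name(chr(cp)))
        for cp in range(first, last + 1)
    }


if __name__ == "__main__":
    for cp, (bidi_class, name) in bidi_classes().items():
        print(f"U+{cp:04X}  {bidi_class:<4} {name}")
```

With current Unicode data, U+202A..U+202E report the classes LRE, RLE, PDF, LRO and RLO, while U+202F (NARROW NO-BREAK SPACE) reports CS and is not an explicit formatting character.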

3. Non-Octet Encoding Forms. Driven by the need to successfully interchange UTF-16 data across computer systems based on different machine architectures, the Unicode Standard has added definitions for big-endian and little-endian encoding forms of UTF-16. To reference these definitions, add the following text to Annex C, either in section C.1 or as a new section between C.4 and C.5:

Non-Octet Encoding Forms

When not serialized as octets (see Clause 6.3), the order of octets in UTF-16 may be specified by agreement between sender and recipient. In particular, any of the encoding forms defined by the Unicode Standard, Version 3.0 (section 3.8) can be used:

UTF-16: with optional use of signature

UTF-16BE: big-endian, with no signature

UTF-16LE: little-endian, with no signature
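As an informal illustration, not part of the proposed text, the three forms listed above might be serialized and read back as follows. This is a minimal sketch using Python's standard codecs; the function names encode_utf16 and decode_utf16 and their parameters are hypothetical. In particular, the little_endian argument of decode_utf16 stands in for the agreement between sender and recipient when no signature is present, and its big-endian default is an assumption of this sketch.

```python
import codecs


def encode_utf16(text, form="UTF-16", little_endian=False, signature=True):
    """Serialize text to octets in one of the three forms listed above."""
    if form == "UTF-16BE":
        return text.encode("utf-16-be")      # big-endian, no signature
    if form == "UTF-16LE":
        return text.encode("utf-16-le")      # little-endian, no signature
    if form != "UTF-16":
        raise ValueError(f"unknown form: {form}")
    # UTF-16: either octet order, with optional use of the signature.
    body = text.encode("utf-16-le" if little_endian else "utf-16-be")
    bom = codecs.BOM_UTF16_LE if little_endian else codecs.BOM_UTF16_BE
    return bom + body if signature else body


def decode_utf16(data, form="UTF-16", little_endian=False):
    """Read octets back; little_endian is the agreed order if unsigned."""
    if form == "UTF-16BE":
        return data.decode("utf-16-be")
    if form == "UTF-16LE":
        return data.decode("utf-16-le")
    if form != "UTF-16":
        raise ValueError(f"unknown form: {form}")
    # A leading signature, when present, decides the octet order.
    if data.startswith(codecs.BOM_UTF16_BE):
        return data[2:].decode("utf-16-be")
    if data.startswith(codecs.BOM_UTF16_LE):
        return data[2:].decode("utf-16-le")
    # No signature: fall back to the order agreed out of band (assumption).
    return data.decode("utf-16-le" if little_endian else "utf-16-be")


if __name__ == "__main__":
    sample = "\u4e2d"  # one character, code value 4E2D
    for form in ("UTF-16", "UTF-16BE", "UTF-16LE"):
        print(form, encode_utf16(sample, form).hex())
```

Note that under UTF-16BE and UTF-16LE an initial FEFF is treated as an ordinary character rather than as a signature, which is why only the UTF-16 branch strips it.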