L2/99-270
WG2 N2085
Title: Comments on ISO/IEC 10646-1, Second Edition text, Draft 2
Source: Unicode Technical Committee, NCITS/L2
Status: Joint Unicode/US Contribution
Action: For Review and Disposition by JTC1/SC2/WG2
Date: September 9, 1999
This document provides comments on draft two of the second edition of
ISO/IEC 10646-1. There are certain features
of the Unicode Standard that many implementers find useful when providing
support for the character encoding defined by ISO/IEC 10646 and Unicode. With
our comments, we would like to bring two useful features to the attention of
the implementers of ISO/IEC 10646 - the Unicode bidirectional rendering
algorithm and the Unicode definitions of big-endian and little-endian forms of
UTF-16, which may be used in data interchange. We hope that even at this late
date, these comments can be considered as input to draft two.
1. Referencing the Unicode Standard. To highlight, for character encoding
implementers, the
features offered by the Unicode Standard, add the following text in either
section 1 Scope or 3 Normative references:
Both ISO/IEC 10646-1 and the Unicode Standard, Version 3.0
provide the identical character repertoire, names, and code values. Complementing
ISO/IEC 10646, the Unicode Standard additionally provides character properties,
algorithms, and definitions that are useful to implementers.
2. The Unicode bidirectional algorithm. This bidirectional rendering algorithm is
implemented widely by the industry. To bring it to the attention of ISO/IEC
10646 implementers, add the following text to section 19 Characters in
bi-directional context:
The rendering of characters in a bi-directional context is
correctly determined by following the bidirectional algorithm defined by the
Unicode Standard, Version 3.0. This algorithm is applicable when using explicit
bidirectional formatting characters (U+202A..U+202E) or when rendering
bidirectional text implicitly.
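For illustration only, and not as part of the proposed text, the following minimal Python sketch looks up which code points in the neighbourhood of the range cited above carry an explicit bidirectional formatting class. It assumes the Unicode character database bundled with Python's standard `unicodedata` module, which is newer than Version 3.0; the helper name `explicit_bidi_controls` and the set `EXPLICIT_CLASSES` are hypothetical names chosen for this example.

```python
import unicodedata

# Bidi classes that the Unicode character database assigns to the explicit
# embedding and override controls (assumption: the database shipped with Python).
EXPLICIT_CLASSES = {"LRE", "RLE", "PDF", "LRO", "RLO"}


def explicit_bidi_controls(first=0x202A, last=0x202F):
    """Return {"U+XXXX": bidi class} for explicit controls found in the range."""
    found = {}
    for cp in range(first, last + 1):
        bidi_class = unicodedata.bidirectional(chr(cp))
        if bidi_class in EXPLICIT_CLASSES:
            found[f"U+{cp:04X}"] = bidi_class
    return found


controls = explicit_bidi_controls()
for label, bidi_class in controls.items():
    print(label, bidi_class)
```

A renderer following the algorithm would treat these characters as directional controls rather than as visible text; the sketch only identifies them and does not implement the algorithm itself.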
3. Non-Octet Encoding Forms. Driven by the need to successfully interchange UTF-16 data
across computer systems based on different machine architectures, the Unicode
Standard has added definitions for big-endian and little-endian encoding forms
of UTF-16. To reference these definitions, add the following text to Annex C,
either in section C.1 or as a new section between C.4 and C.5:
Non-Octet Encoding Forms
When not serialized as octets (see Clause 6.3), the order of
octets in UTF-16 may be specified by agreement between sender and recipient. In
particular, any of the encoding forms defined by the Unicode Standard, Version
3.0 (section 3.8) can be used:
UTF-16: with optional use of signature
UTF-16BE: big-endian, with no signature
UTF-16LE: little-endian, with no signature
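As an illustration of why the byte order matters in interchange, the following minimal Python sketch serializes the same two characters in the big-endian and little-endian forms and decodes data with or without a signature. It is a sketch under stated assumptions, not a normative procedure: the helper `decode_utf16` is a hypothetical name, and falling back to big-endian when no signature is present is an assumption of this example; in practice the order would be fixed by the agreement between sender and recipient.

```python
import codecs

text = "A\u05D0"  # LATIN CAPITAL LETTER A, HEBREW LETTER ALEF

# The same code values serialized in each byte order, with no signature.
big = text.encode("utf-16-be")      # octets 00 41 05 D0
little = text.encode("utf-16-le")   # octets 41 00 D0 05


def decode_utf16(data: bytes, default_big_endian: bool = True) -> str:
    """Decode UTF-16 octets, honouring a leading signature if one is present.

    Assumption of this sketch: without a signature, the byte order comes
    from default_big_endian (standing in for a prior agreement).
    """
    if data.startswith(codecs.BOM_UTF16_BE):
        return data[2:].decode("utf-16-be")
    if data.startswith(codecs.BOM_UTF16_LE):
        return data[2:].decode("utf-16-le")
    return data.decode("utf-16-be" if default_big_endian else "utf-16-le")


# With a signature, the recipient can recover the text in either byte order.
assert decode_utf16(codecs.BOM_UTF16_BE + big) == text
assert decode_utf16(codecs.BOM_UTF16_LE + little) == text

# Without a signature, decoding with the wrong order silently yields other text.
assert decode_utf16(little, default_big_endian=False) == text
assert decode_utf16(little) != text
```

The last assertion shows the failure that the UTF-16BE and UTF-16LE labels are meant to prevent: unsigned little-endian data read as big-endian decodes without error but produces different characters.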