Re: Unicode 3.2 comments - part 2 of 4

From: Otto Stolz (Otto.Stolz@uni-konstanz.de)
Date: Thu Feb 21 2002 - 04:39:51 EST


David Hopwood wrote:

> Below '#' is used to quote from the Unicode 3.2 standard as proposed
> in PDUTR #28, and '>' is used to quote my suggested changes.

I second David's thorough and clearly presented contribution.

However, I have to suggest one minor improvement:

> Conformance clauses
...

> This is what I think clauses C5 and C10 should be:
...
> > C10 A process shall make no change in a valid code sequence other
> > than the possible replacement of character sequences by their
> > canonical-equivalent sequences, if that process purports not to
> > modify the interpretation of that code sequence.
...

> > - Changing the bit or byte ordering when transforming between different
> > machine architectures does not modify the interpretation of the text.

I consider the bit ordering a hardware issue, invisible to the programmer
or the end-user; hence, I'd not mention it in this note.

W.r.t. the byte ordering, this note applies only to UTF-16 and UTF-32
with a BOM.
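
To illustrate the BOM case, here is a minimal Python sketch. The helper
swap_byte_order and the sample string are hypothetical names chosen for
this example only; the sketch assumes Python's built-in "utf-16" and
"utf-32" codecs, which pick the byte order from a leading BOM when
decoding.

```python
import codecs


def swap_byte_order(data: bytes, unit: int) -> bytes:
    """Reverse the bytes inside each code unit of `unit` bytes (hypothetical helper)."""
    if len(data) % unit != 0:
        raise ValueError("data length is not a multiple of the code unit size")
    return b"".join(data[i:i + unit][::-1] for i in range(0, len(data), unit))


text = "Gr\u00fc\u00dfe"  # arbitrary sample text

# UTF-16 with an explicit BOM, little-endian on the wire.
utf16_le_bom = codecs.BOM_UTF16_LE + text.encode("utf-16-le")
# Swapping every 16-bit unit also swaps the BOM, so it now reads as big-endian.
utf16_swapped = swap_byte_order(utf16_le_bom, 2)

# Same experiment with 32-bit code units.
utf32_le_bom = codecs.BOM_UTF32_LE + text.encode("utf-32-le")
utf32_swapped = swap_byte_order(utf32_le_bom, 4)

# The BOM-aware decoders recover the same text from either byte order.
assert utf16_swapped.startswith(codecs.BOM_UTF16_BE)
assert utf16_le_bom.decode("utf-16") == utf16_swapped.decode("utf-16") == text
assert utf32_le_bom.decode("utf-32") == utf32_swapped.decode("utf-32") == text
```

Because the BOM is swapped together with the data, a BOM-aware reader
still interprets the text identically.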

It does not apply to UTF-8, as this format implies a particular byte
ordering.
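
A small Python sketch of this point, using U+20AC as an arbitrary sample
character: UTF-8 is defined as one fixed byte sequence, so there is no
alternative ordering to switch to, and reordering the bytes merely
yields invalid data.

```python
euro = "\u20ac"  # arbitrary sample character (EURO SIGN)

# UTF-8 prescribes exactly this byte sequence on every architecture.
utf8 = euro.encode("utf-8")
assert utf8 == b"\xe2\x82\xac"

# Reversing the bytes does not give "another byte order" of UTF-8;
# the result starts with a continuation byte and cannot be decoded.
reordered = utf8[::-1]
try:
    reordered.decode("utf-8")
    reordered_is_valid = True
except UnicodeDecodeError:
    reordered_is_valid = False

assert not reordered_is_valid
```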

Neither does it apply to UTF-16LE, UTF-16BE, UTF-32LE, or UTF-32BE; rather,
swapping the byte order in any one of these formats amounts to transforming
to a different UTF, viz. UTF-16BE, UTF-16LE, UTF-32BE, and UTF-32LE,
respectively.
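
The BOM-less case can be sketched in Python as follows. The helper
swap_byte_order and the sample string are hypothetical, chosen only for
illustration; the codec names are those of Python's standard library.

```python
def swap_byte_order(data: bytes, unit: int) -> bytes:
    """Reverse the bytes inside each code unit of `unit` bytes (hypothetical helper)."""
    if len(data) % unit != 0:
        raise ValueError("data length is not a multiple of the code unit size")
    return b"".join(data[i:i + unit][::-1] for i in range(0, len(data), unit))


text = "A\u20ac"  # arbitrary sample text

# Swapping the bytes of UTF-16LE data yields exactly the UTF-16BE encoding,
# and likewise for UTF-32LE and UTF-32BE.
swapped16 = swap_byte_order(text.encode("utf-16-le"), 2)
swapped32 = swap_byte_order(text.encode("utf-32-le"), 4)
assert swapped16 == text.encode("utf-16-be")
assert swapped32 == text.encode("utf-32-be")

# If the swapped bytes were still labelled UTF-16LE, the interpretation
# would change: the data must be relabelled as UTF-16BE.
misread = swapped16.decode("utf-16-le")
assert misread != text
assert swapped16.decode("utf-16-be") == text
```

Without a BOM the label carries the byte order, so the swap is a
transformation between two UTFs and not a change within one.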

> > - Transforming to a different Unicode Transformation Format does not
> > modify the interpretation of the text.

Hence, I propose the following wording for the last two notes on the
proposed C10 clause:

| - Changing the byte ordering of a string encoded in either UTF-16
| or UTF-32, when a Byte Order Mark is present, does not modify the
| interpretation of the text.
|
| - Transforming to a different Unicode Transformation Format does not
| modify the interpretation of the text. This includes transformations
| between Unicode Transformation Formats that only differ by their
| respective byte ordering, such as a transformation from UTF-16BE
| to UTF-16LE (irrespective of whether the byte ordering is explicitly
| specified, or is implied by the target environment the string is
| ported to).

I hope I have made my suggestion clear; improvements to my wording are
certainly possible, as I am not a native speaker of English.

Best wishes,
   Otto Stolz


