From: Kenneth Whistler (kenw@sybase.com)
Date: Fri May 16 2003 - 15:09:46 EDT
Stefan Persson asked:
> Peter underscore Constable at sil dot org wrote:
>
> > These might be considered encoding forms, and they might be able to encode
> > the Unicode coded character set, but I don't think these should be called
> > "Unicode encoding forms". There are exactly three Unicode encoding forms:
> > UTF-8, UTF-16 and UTF-32.
>
> Are not BE and LE regarded as different encoding forms, making five
> encoding forms (UTF-8, UTF-16BE, UTF-16LE, UTF-32BE & UTF-32LE)?
No.
The Unicode Standard has:
One (1) coded character set (CCS).
Three (3) encoding forms (CEF): UTF-8, UTF-16, UTF-32.
Seven (7) encoding schemes (CES): UTF-8,
                                  UTF-16, UTF-16BE, UTF-16LE,
                                  UTF-32, UTF-32BE, UTF-32LE.
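A minimal Python sketch of the form/scheme distinction, for illustration only. The sample string and the names TEXT, FORMS, SCHEMES, read_units, code_units and utf16_scheme_to_units are hypothetical, and mapping the seven schemes onto Python codec names is an assumption. An encoding form yields code units (integers). An encoding scheme serializes those units to bytes. UTF-16BE and UTF-16LE produce different bytes, but both carry the same UTF-16 code units. That is why they are schemes and not additional forms.

```python
# Hypothetical illustration: encoding forms (code units) vs. encoding schemes (bytes).

# Sample text (an assumption): "A" (U+0041) plus U+1D11E MUSICAL SYMBOL G CLEF,
# which lies outside the BMP and so needs a surrogate pair in UTF-16.
TEXT = "A\U0001D11E"

# The three encoding forms, with their code-unit width in bytes.
FORMS = {"UTF-8": 1, "UTF-16": 2, "UTF-32": 4}

# The seven encoding schemes, mapped to the closest Python codec names (an
# assumption). Python's "utf-16"/"utf-32" codecs prepend a BOM in the
# platform's native byte order.
SCHEMES = {
    "UTF-8": "utf-8",
    "UTF-16": "utf-16",
    "UTF-16BE": "utf-16-be",
    "UTF-16LE": "utf-16-le",
    "UTF-32": "utf-32",
    "UTF-32BE": "utf-32-be",
    "UTF-32LE": "utf-32-le",
}


def read_units(data: bytes, width: int, byteorder: str) -> list[int]:
    """Reassemble fixed-width code units from bytes in the given byte order."""
    return [int.from_bytes(data[i:i + width], byteorder)
            for i in range(0, len(data), width)]


def code_units(text: str, form: str) -> list[int]:
    """Return the code units of `text` in one encoding form.

    Byte order is not part of an encoding form; a big-endian serialization
    is used here only as a convenient way to recover the integer units.
    """
    width = FORMS[form]
    codec = "utf-8" if width == 1 else f"{form.lower()}-be"
    return read_units(text.encode(codec), width, "big")


def utf16_scheme_to_units(data: bytes, scheme: str) -> list[int]:
    """Recover UTF-16 code units from bytes in one of the three UTF-16 schemes."""
    if scheme == "UTF-16BE":
        return read_units(data, 2, "big")
    if scheme == "UTF-16LE":
        return read_units(data, 2, "little")
    # Plain UTF-16 scheme: honor a leading BOM; with no BOM, read big-endian.
    if data[:2] == b"\xff\xfe":
        return read_units(data[2:], 2, "little")
    if data[:2] == b"\xfe\xff":
        return read_units(data[2:], 2, "big")
    return read_units(data, 2, "big")


if __name__ == "__main__":
    for form in FORMS:
        print(form, "code units:", [hex(u) for u in code_units(TEXT, form)])
    for scheme, codec in SCHEMES.items():
        print(scheme, "bytes:", TEXT.encode(codec).hex(" "))
    # Different bytes for each UTF-16 scheme, but one sequence of code units.
    for scheme in ("UTF-16", "UTF-16BE", "UTF-16LE"):
        data = TEXT.encode(SCHEMES[scheme])
        print(scheme, "->", [hex(u) for u in utf16_scheme_to_units(data, scheme)])
```

For the sample string, all three UTF-16 schemes decode to the same code units: 0x41, 0xD834, 0xDD1E. Only the byte serialization differs.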
All the particulars are laid out publicly in excruciating
detail at:
http://www.unicode.org/book/preview/ch03.pdf
People on this list should make a particular effort to familiarize
themselves with Section 3.9, Unicode Encoding Forms, and Section
3.10, Unicode Encoding Schemes, before making claims about them.
Any old explanations, including the text of The Unicode Standard,
Version 3.0, have now been superseded by The Unicode Standard,
Version 4.0 -- and that is why the editors put Chapter 3 up
on the web for people to refer to.
--Ken