From: Peter_Constable@sil.org
Date: Fri May 16 2003 - 15:48:51 EDT
Stefan Persson wrote on 05/16/2003 01:24:35 PM:
> > These might be considered encoding forms, and they might be able to encode
> > the Unicode coded character set, but I don't think these should be called
> > "Unicode encoding forms". There are exactly three Unicode encoding forms:
> > UTF-8, UTF-16 and UTF-32.
>
> Are not BE and LE regarded as different encoding forms, making five
> encoding forms (UTF-8, UTF-16BE, UTF-16LE, UTF-32BE & UTF-32LE)?
No, you are thinking of character encoding *schemes*, of which there are
seven: add to your list "UTF-16" and "UTF-32".
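
As a rough illustration of the seven schemes (a sketch only, assuming Python's codec names "utf-16", "utf-16-be" and so on are reasonable stand-ins for the like-named Unicode encoding schemes; the helper name serialize is made up for this example), the same one-character string can be serialized seven ways:

```python
import codecs

# Assumption: these Python codec names stand in for the seven
# Unicode character encoding schemes discussed above.
SCHEMES = [
    "utf-8",
    "utf-16", "utf-16-be", "utf-16-le",
    "utf-32", "utf-32-be", "utf-32-le",
]

def serialize(text):
    """Hypothetical helper: map each scheme name to the bytes it produces."""
    return {name: text.encode(name) for name in SCHEMES}

result = serialize("A")
for name, data in result.items():
    # Print the byte sequence in hex so byte order and any BOM are visible.
    print(f"{name:10} {data.hex()}")
```

In Python, the unmarked "utf-16" and "utf-32" codecs prepend a byte order mark when encoding, while the BE and LE variants fix the byte order and write no mark. That is the kind of difference that exists at the scheme level and not at the form level.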
I'll echo Addison's recommendation: read UTR#17, which explains the differences
between the five levels of Unicode's character encoding model:
abstract character repertoire
coded character set
character encoding form
character encoding scheme
transfer encoding syntax
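
To make the levels a little more concrete, here is a minimal sketch that walks one character down through them. It assumes a valid Unicode scalar value as input and does no surrogate checking; the function names utf16_code_units and serialize_be are invented for this example; and base64 is used only as one familiar example of a transfer encoding syntax.

```python
import base64
import struct

def utf16_code_units(cp):
    """Encoding form level: scalar value -> list of 16-bit code units.

    Assumes cp is a valid Unicode scalar value (no surrogate checks).
    """
    if cp < 0x10000:
        return [cp]
    v = cp - 0x10000
    # Split the 20-bit offset into a high and a low surrogate.
    return [0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)]

def serialize_be(units):
    """Encoding scheme level: 16-bit code units -> big-endian bytes."""
    return struct.pack(">%dH" % len(units), *units)

cp = 0x1D11E                      # coded character set: a code point
units = utf16_code_units(cp)      # encoding form: UTF-16 code units
be = serialize_be(units)          # encoding scheme: UTF-16BE bytes
tes = base64.b64encode(be)        # transfer encoding syntax: base64

print([hex(u) for u in units])
print(be.hex())
print(tes)
```

The point of the sketch is that byte order only appears at the serialize_be step. The code units themselves are the same whichever byte order is chosen later.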
People might also look at Chapter 3 of TUS4.0, the final draft of which is
online at http://www.unicode.org/book/preview/ch03.pdf. In particular,
"encoding form" is defined as D29, "encoding scheme" is defined as D38, and
the specific encoding forms and schemes *defined by Unicode* (take note,
Philippe) are defined in the surrounding pages.
- Peter
---------------------------------------------------------------------------
Peter Constable
Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485