From: Markus Scherer (markus.scherer@jtcsv.com)
Date: Thu Feb 13 2003 - 12:55:14 EST
UTF-16, UTF-16BE, and UTF-16LE are charset names that are registered with the IANA. See
http://www.iana.org/assignments/character-sets
They are formally defined in RFC 2781 (e.g. ftp://ftp.rfc-editor.org/in-notes/rfc2781.txt).
UTF-32, UTF-32BE, and UTF-32LE are defined in UAX #19, as Doug wrote, and are also IANA-registered charset names.
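
To make the BOM distinction from the quoted discussion below concrete, here is a minimal sketch in Python. It only shows how Python's own codecs treat the three labels, which is one implementation and not the normative text of RFC 2781; the sample string and variable names are arbitrary.

```python
import codecs

text = "A"  # arbitrary sample string

# "utf-16-le" / "utf-16-be": the label fixes the byte order, so no BOM is written.
le = text.encode("utf-16-le")
be = text.encode("utf-16-be")

# Plain "utf-16": Python's encoder prepends a BOM (in the platform's native order).
with_bom = text.encode("utf-16")
has_bom = with_bom[:2] in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)

# Decoding as "utf-16" consumes a leading BOM and uses it to pick the byte order.
decoded = (codecs.BOM_UTF16_LE + le).decode("utf-16")

# Decoding as "utf-16-le" does not treat FF FE as a signature;
# it comes through as the character U+FEFF at the start of the text.
kept = (codecs.BOM_UTF16_LE + le).decode("utf-16-le")

print(le, be, with_bom, has_bom, repr(decoded), repr(kept))
```

So with these codecs, data labeled UTF-16LE that nevertheless starts with FF FE ends up with an extra U+FEFF in the decoded text, whereas the same bytes labeled UTF-16 do not.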
markus
Doug Ewell wrote:
> Jungshik Shin <jshin at mailaps dot org> wrote:
>
>>>Note that "UTF-16 little-endian" is not technically the
>>>same as "UTF-16LE"; the former implies the presence of a BOM while
>>>the latter implies that none is present.)
>>
>> Where does this distinction come from?
>
> The sources I checked were UTR #17, "Character Encoding Model," and UAX
> #19, "UTF-32." The latter does not specifically talk about UTF-16BE or
> UTF-16LE, but uses the same definitions to distinguish UTF-32, UTF-32BE,
> and UTF-32LE that we are using here.
>
> Mark Davis can probably point you to other sources as well.