Re: newbie: unicode (when used as a coding) = UTF16LE?

From: Markus Scherer (markus.scherer@jtcsv.com)
Date: Thu Feb 13 2003 - 12:55:14 EST


    UTF-16, UTF-16BE, and UTF-16LE are charset names that are registered with the IANA. See
    http://www.iana.org/assignments/character-sets

    They are formally defined in RFC 2781 (available at, e.g., ftp://ftp.rfc-editor.org/in-notes/rfc2781.txt).

    The UTF-32* names (UTF-32, UTF-32BE, UTF-32LE) are defined in UAX #19, as Doug wrote, and are also IANA-registered charset names.

    markus

    Doug Ewell wrote:
    > Jungshik Shin <jshin at mailaps dot org> wrote:
    >
    >>>Note that "UTF-16 little-endian" is not technically the
    >>>same as "UTF-16LE"; the former implies the presence of a BOM while
    >>>the latter implies that none is present.)
    >>
    >> Where does this distinction come from?
    >
    > The sources I checked were UTR #17, "Character Encoding Model," and UAX
    > #19, "UTF-32." The latter does not specifically talk about UTF-16BE or
    > UTF-16LE, but uses the same definitions to distinguish UTF-32, UTF-32BE,
    > and UTF-32LE that we are using here.
    >
    > Mark Davis can probably point you to other sources as well.
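
    As an illustration of the distinction quoted above (BOM expected with
    plain "UTF-16", no BOM with "UTF-16LE"/"UTF-16BE"), here is a minimal
    sketch in Python. It is not from the thread and it assumes the behaviour
    of Python's built-in codecs, which is only one implementation; other
    libraries may treat the charset names differently.

```python
import codecs

text = "A"  # U+0041, a single UTF-16 code unit

# "UTF-16" without a byte-order suffix: Python's encoder writes a BOM first,
# using the byte order of the machine it runs on.
with_bom = text.encode("utf-16")

# "UTF-16LE" / "UTF-16BE": the byte order is fixed by the name, no BOM is written.
le = text.encode("utf-16-le")
be = text.encode("utf-16-be")

# Little-endian data with an explicit BOM placed in front of it.
le_with_bom = codecs.BOM_UTF16_LE + le

# Decoded as "UTF-16", the BOM selects the byte order and is then dropped.
decoded_utf16 = le_with_bom.decode("utf-16")

# Decoded as "UTF-16LE", the leading U+FEFF is kept as an ordinary character.
decoded_utf16le = le_with_bom.decode("utf-16-le")

print(with_bom, le, be)
print(repr(decoded_utf16), repr(decoded_utf16le))
```

    Under these assumptions, a decoder told "UTF-16LE" does not strip a
    leading U+FEFF, so labelling BOM-prefixed data as UTF-16LE leaves an
    extra character at the start of the text.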


