Re: newbie: unicode (when used as a coding) = UTF16LE?

From: starner@okstate.edu
Date: Thu Feb 13 2003 - 01:02:31 EST


    >Also, it seems to me that when the Content-Type of an HTML page is "unicode", IE tends to interpret it as UTF-16LE. So it seems UTF-16LE is (or was) the standard encoding for Unicode.

    Just because IE does something doesn't mean it's the standard. The whole
    world doesn't run IE. The legal charset names are listed here:
    <http://www.iana.org/assignments/character-sets>; in practice, the vast
    majority of those shouldn't be used. "Unicode" is not a legal charset name.
    UTF-16BE, UTF-16LE or UTF-16 (all as specified in RFC 2781
    <ftp://ftp.rfc-editor.org/in-notes/rfc2781.txt>) are the acceptable
    names for UTF-16 content; UTF-8 is also legal, and usable. (Sadly, many
    of the other names in the file are ill-defined and/or useless. Of the
    other Unicode names, UTF-7, SCSU, BOCU-1 and UTF-32* are useful in
    limited contexts; the rest you should pretend don't exist. csUnicode
    exists, but it means UCS-2BE and shouldn't be used.)
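To make the difference between the three RFC 2781 labels concrete, here is a minimal sketch using Python's built-in codecs. The function name `utf16_variants` and the sample string are illustrative assumptions. For the plain "UTF-16" label it writes an explicit FE FF byte order mark followed by big-endian code units, which is one valid choice for that label, not the only one.

```python
# Sketch: byte layouts for the three RFC 2781 labels, using Python's built-in
# codecs. The function name and sample string are illustrative assumptions.

def utf16_variants(text: str) -> dict:
    """Encode text once per RFC 2781 label."""
    return {
        # Explicitly labeled forms: fixed byte order, no byte order mark.
        "UTF-16LE": text.encode("utf-16-le"),
        "UTF-16BE": text.encode("utf-16-be"),
        # Plain "UTF-16": a BOM tells the reader the byte order. Here we
        # write FE FF followed by big-endian code units, so the result does
        # not depend on the byte order of the machine running the script.
        "UTF-16": b"\xfe\xff" + text.encode("utf-16-be"),
    }


if __name__ == "__main__":
    # "A" followed by the CJK ideograph U+4E2D
    for label, data in utf16_variants("A\u4e2d").items():
        print(f"{label:9} {data.hex(' ')}")
    # UTF-16LE  41 00 2d 4e
    # UTF-16BE  00 41 4e 2d
    # UTF-16    fe ff 00 41 4e 2d
```

The LE and BE forms carry no BOM because the label already states the byte order; only the bare "UTF-16" label needs the BOM to disambiguate.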

    >When people say "unicode" without naming a UTF, do they mean UTF-16LE?

    If people just say "unicode", you can't assume any encoding form. If a
    Unix guy says "unicode", he's probably thinking of UTF-8 or UTF-32. If you
    mean a particular encoding, name it; if someone else doesn't name one, ask.
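A small Python sketch shows why "unicode" on its own doesn't pin anything down: the same two characters produce different bytes under each encoding form, and decoding with the wrong form can silently produce different text instead of an error. The `FORMS` table, the function name `encode_all`, and the sample string are assumptions made for illustration.

```python
# Sketch: why "unicode" alone doesn't identify an encoding. The same string
# produces different bytes under each Unicode encoding form, and reading bytes
# with the wrong form can silently yield different text.

FORMS = {
    "UTF-8": "utf-8",
    "UTF-16LE": "utf-16-le",
    "UTF-16BE": "utf-16-be",
    "UTF-32BE": "utf-32-be",
}


def encode_all(text: str) -> dict:
    """Map each encoding-form label to the bytes it produces for text."""
    return {label: text.encode(codec) for label, codec in FORMS.items()}


if __name__ == "__main__":
    text = "A\u4e2d"
    for label, data in encode_all(text).items():
        print(f"{label:9} {data.hex(' ')}")

    # Guessing wrong: big-endian bytes decoded as little-endian. No error is
    # raised; U+0041 U+4E2D simply comes back as U+4100 U+2D4E.
    misread = text.encode("utf-16-be").decode("utf-16-le")
    print(misread == text)  # False
```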

    >I am going to design a website in Unicode. I don't want to use UTF-8 because most of the text is CJK, so UTF-8 HTML would be too fat. I should use UTF-16LE, shouldn't I?

    UTF-16LE - so labeled, hyphen and all - is a perfectly acceptable encoding,
    as would be UTF-16BE. It's probably irrelevant which you use.
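The size trade-off can be checked directly. This is a minimal sketch; the function name `encoded_sizes` and the sample strings are illustrative assumptions, not measurements of a real page. BMP CJK characters take 3 bytes in UTF-8 but 2 in UTF-16, while ASCII markup takes 1 byte in UTF-8 but 2 in UTF-16, so the net effect depends on how much of the page is markup.

```python
# Sketch: compare encoded sizes for CJK text and for the ASCII markup around
# it. The sample strings are illustrative, not measurements of a real page.

def encoded_sizes(text: str) -> dict:
    """Return the byte length of text in UTF-8, UTF-16LE and UTF-16BE."""
    return {
        label: len(text.encode(codec))
        for label, codec in (("UTF-8", "utf-8"),
                             ("UTF-16LE", "utf-16-le"),
                             ("UTF-16BE", "utf-16-be"))
    }


if __name__ == "__main__":
    samples = {
        "CJK only": "\u4e2d\u6587\u6587\u672c",   # 4 BMP ideographs
        "markup only": "<p></p>",                  # 7 ASCII characters
        "both": "<p>\u4e2d\u6587\u6587\u672c</p>",
    }
    for name, s in samples.items():
        print(name, encoded_sizes(s))
    # CJK only {'UTF-8': 12, 'UTF-16LE': 8, 'UTF-16BE': 8}
    # markup only {'UTF-8': 7, 'UTF-16LE': 14, 'UTF-16BE': 14}
    # both {'UTF-8': 19, 'UTF-16LE': 22, 'UTF-16BE': 22}
```

UTF-16LE and UTF-16BE always come out the same size, since they differ only in byte order; that is one reason the choice between them hardly matters.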



    This archive was generated by hypermail 2.1.5 : Thu Feb 13 2003 - 01:39:23 EST