From: starner@okstate.edu
Date: Thu Feb 13 2003 - 01:02:31 EST
>Also it seems to me when ContentType in a html page is "unicode", IE tends to understand it as UTF16LE. So it seems UTF16LE is (or was) the standard coding for unicode.
Just because IE does something doesn't mean it's the standard. The whole
world doesn't run IE. The legal content types are listed here:
<http://www.iana.org/assignments/character-sets>; in practice, the vast
majority of those shouldn't be used. "Unicode" is not a legal content type.
UTF-16BE, UTF-16LE or UTF-16 (all as specified in RFC2781
<ftp://ftp.rfc-editor.org/in-notes/rfc2781.txt>) are the acceptable
names for UTF-16 content; UTF-8 is also legal, and usable. (Sadly, many
of the other names in the file are ill-defined and/or useless. Of the
other Unicode names, UTF-7, SCSU, BOCU-1 and UTF-32* are useful in
limited contexts; the rest you should pretend don't exist. csUnicode
exists, but is UCS2-BE, and shouldn't be used.)
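To make the difference between the three UTF-16 labels concrete, here is a minimal Python sketch. The sample string is my own choice, and the codec names ("utf-16-be", "utf-16-le", "utf-16") are Python's spellings, not the IANA labels; Python's plain "utf-16" codec writes a byte order mark and uses the machine's native order, so I don't assume which order it picks.

```python
import codecs

# Hypothetical sample: 'A' (U+0041) followed by one CJK character (U+4E2D).
text = "A\u4e2d"

be = text.encode("utf-16-be")   # big-endian, no byte order mark
le = text.encode("utf-16-le")   # little-endian, no byte order mark
bom = text.encode("utf-16")     # BOM first, then native byte order

print(be.hex())    # 00414e2d
print(le.hex())    # 41002d4e

# The BOM tells the reader which order follows; which one you get
# depends on the machine, so only check that it is one of the two.
print(bom[:2] in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE))

# Decoding with the plain "utf-16" codec uses the BOM to recover the text.
print(bom.decode("utf-16") == text)
```

The same two characters come out as different byte sequences under each label, which is why the label has to be exact.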
>Is it that, when people say "unicode" without UTF, they mean UTF16LE?
If people just say "unicode", you can't assume any encoding form. If a
Unix guy says "unicode", he's probably thinking UTF-8 or UTF-32. If you
mean a particular encoding, name it; if someone else doesn't name one, ask.
>I am going to design a website with unicode. I don't use UTF-8 because most are CJK text thus UTF-8 html would be too fat. I should use UTF16LE, should I?
UTF-16LE - so labeled, hyphen and all - is a perfectly acceptable encoding,
as would be UTF-16BE. It's probably irrelevant which you use.
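If you want to check the size argument on your own pages rather than guess, something like the following Python sketch would do. The sample string and the helper names (encoded_size, content_type_header) are made up for illustration; the header value just shows the label spelled the way RFC 2781 registers it.

```python
def encoded_size(text: str, codec: str) -> int:
    """Return the number of bytes `text` occupies in the given Python codec."""
    return len(text.encode(codec))


def content_type_header(charset: str) -> str:
    """Build a Content-Type value for an HTML page with an explicit charset."""
    return f"text/html; charset={charset}"


# Hypothetical sample: four CJK characters from the Basic Multilingual Plane.
sample = "\u4e2d\u6587\u7db2\u9801"

print(encoded_size(sample, "utf-8"))      # 12 (three bytes per character)
print(encoded_size(sample, "utf-16-le"))  # 8  (two bytes per character)

print(content_type_header("UTF-16LE"))    # text/html; charset=UTF-16LE
```

Run it over a whole page, tags included, before deciding; ASCII markup takes one byte per character in UTF-8 and two in UTF-16, so the saving on a real page is smaller than on bare text.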