From: Jungshik Shin (jshin@mailaps.org)
Date: Thu Feb 13 2003 - 00:27:34 EST
On Thu, 13 Feb 2003, Zhang Weiwu wrote:
> Very newbie question:
> 1) I noticed when I save a file as "unicode" in Windows 2000, or
> other editor like EditPlus, the file begins with FF FE, which looks
> like UTF16LE. Also it seems to me when ContentType in a html page is
> "unicode", IE tends to understand it as UTF16LE. So it seems UTF16LE is
> (or was) the standard coding for unicode.
What Windows or IE does doesn't make anything more standards-compliant
than it actually is. For Windows and MS IE running on
Intel x86 machines, it may be pretty natural to use UTF-16LE,
but that does not hold for other architecture/OS combinations.
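To see which byte order a given file was saved in, you can look at
its first bytes. Below is a minimal Python sketch, not a complete
encoding detector: the function name sniff_bom is made up for
illustration, and it only recognizes files that actually start with
a BOM.

```python
import codecs

# Longer BOMs are checked first: the UTF-32LE BOM (FF FE 00 00)
# begins with the same two bytes as the UTF-16LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "UTF-32LE"),
    (codecs.BOM_UTF32_BE, "UTF-32BE"),
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_LE, "UTF-16LE"),
    (codecs.BOM_UTF16_BE, "UTF-16BE"),
]

def sniff_bom(data: bytes):
    """Return the encoding implied by a leading BOM, or None if there is none."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None

print(sniff_bom(b"\xff\xfeh\x00i\x00"))  # UTF-16LE (what Windows "unicode" files start with)
print(sniff_bom(b"\xfe\xff\x00h\x00i"))  # UTF-16BE
print(sniff_bom(b"hi"))                  # None
```

A file without a BOM returns None here, which says nothing about
its encoding; that is exactly why explicit labelling matters.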
> 2) But the FAQ on unicode.org says UTF16BE is the preferred
> unicode coding.
>
> Is it that, when people say "unicode" without UTF, they mean UTF16LE?
No, UTF-16LE is just one of several Unicode transformation form(at)s.
Each UTF has its own pros and cons, and you have to choose
whichever is appropriate for your own needs.
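One of those trade-offs is size, and you can measure it on your own
content. The following is a rough Python sketch; the sample strings
and the helper name encoded_sizes are just illustrative, and real
pages will have a different mix of markup and text.

```python
def encoded_sizes(text: str) -> dict:
    """Return the byte length of text in each of three UTFs (no BOM)."""
    return {name: len(text.encode(name))
            for name in ("utf-8", "utf-16-le", "utf-32-le")}

# Three CJK characters from the BMP: 3 bytes each in UTF-8, 2 in UTF-16.
print(encoded_sizes("日本語"))
# {'utf-8': 9, 'utf-16-le': 6, 'utf-32-le': 12}

# ASCII markup goes the other way: 1 byte each in UTF-8, 2 in UTF-16.
print(encoded_sizes("<p></p>"))
# {'utf-8': 7, 'utf-16-le': 14, 'utf-32-le': 28}

# Mixed together, the markup can eat up the savings on the text.
print(encoded_sizes("<p>日本語</p>"))
# {'utf-8': 16, 'utf-16-le': 20, 'utf-32-le': 40}
```

Running something like this over a few of your actual HTML files
will tell you which UTF is smaller for your pages.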
> I am going to design a website with unicode. I don't use UTF-8 because
> most are CJK text thus UTF-8 html would be too fat. I should use UTF16LE,
> should I?
Whatever UTF you decide to use, the only thing you have to take care
of is to label/mark it in a standards-compliant way. If you want to
use UTF-16LE, you should make sure that your web server
emits the correct HTTP header, with Content-Type as follows
(note that a meta tag at the beginning of an HTML file
doesn't work well for UTF-16/UTF-32):
Content-Type: text/html; charset=UTF-16LE
On top of that, you may wish to put a BOM at the very beginning of
your UTF-16LE HTML files, although that's not necessary
with the correct Content-Type HTTP header as above.
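As a rough illustration of the above, here is a Python sketch that
produces the response body and headers for a UTF-16LE page. The
function name build_utf16le_response and its with_bom parameter are
made up for this example; how you actually set headers depends on
your web server.

```python
import codecs

def build_utf16le_response(html: str, with_bom: bool = True):
    """Encode html as UTF-16LE and return (headers, body) for an HTTP response."""
    body = html.encode("utf-16-le")  # this codec does not add a BOM by itself
    if with_bom:
        # Optional: prepend FF FE, as described above.
        body = codecs.BOM_UTF16_LE + body
    headers = [
        ("Content-Type", "text/html; charset=UTF-16LE"),
        ("Content-Length", str(len(body))),
    ]
    return headers, body

headers, body = build_utf16le_response("<p>hi</p>")
print(headers[0])  # ('Content-Type', 'text/html; charset=UTF-16LE')
print(body[:2])    # b'\xff\xfe'
```

The point is only that the charset label in the header and the
bytes actually sent have to agree.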
BTW, you MUST NOT use 'charset=unicode' on the assumption that it'll
be interpreted as 'utf-16le'. See http://www.i18nguy.com/unicode
and http://jshin.net/i18n/utftest
Jungshik