Re: UCS-4, UCS-2, UTF-16, UTF-8

From: Kenneth Whistler (kenw@sybase.com)
Date: Wed Feb 16 2000 - 21:48:06 EST


Ohmson commented:

> One of the big
> debates that we get into is whether we should encode
> the data in the database in the various formats
> shown in the subject.

Well, if you are writing a *new* client/server prototype, you
should immediately scratch UCS-2 from your list. With Unicode 3.0
you can still get by without support for UTF-16, but in the very
near future (perhaps by the end of 2000, or early 2001 at the
latest), the encoding of the next big chunk of characters will be
complete enough to allow early implementers to get started. And
that next big chunk will quite likely contain a significant number
of Chinese and Japanese characters for which there will be strong
pressure to implement immediately (characters required by the
Japanese and HKSAR governments, for example).

The exact encoding of those characters is not nailed down yet,
but their fate is on the agenda of the WG2 meeting next month
in Beijing -- and it is still quite likely that they will end up
on Plane 2 in 10646-2, hence requiring use of surrogate codes in
Unicode.
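
To make the surrogate requirement concrete, here is a minimal
Python sketch of how a code point beyond the BMP splits into a
UTF-16 surrogate pair. The function name to_surrogate_pair is
hypothetical, and U+20000 (the first code point of Plane 2) is
used purely as an illustration, not as a claim about where any
particular character will be encoded.

```python
def to_surrogate_pair(code_point):
    """Split a supplementary-plane code point into UTF-16 surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary-plane code point")
    offset = code_point - 0x10000       # 20-bit value
    high = 0xD800 + (offset >> 10)      # top 10 bits -> high surrogate
    low = 0xDC00 + (offset & 0x3FF)     # bottom 10 bits -> low surrogate
    return high, low

# Illustrative input only: the first code point of Plane 2.
high, low = to_surrogate_pair(0x20000)

# Cross-check against Python's built-in UTF-16 codec (big-endian, no BOM).
utf16_bytes = chr(0x20000).encode("utf-16-be")
```

A UCS-2 implementation has no way to represent such a code point,
since it only handles single 16-bit units.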

So anyone working on new systems now should be aiming at
UTF-16, UTF-8, or UTF-32 (i.e. UCS-4 limited to planes 0..16).
See Unicode Technical Report #19, UTF-32:

http://www.unicode.org/unicode/reports/tr19/

for more information.
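
As a rough aid for the storage side of that choice, the following
Python sketch compares how many bytes one character takes in each
of the three forms. The helper encoded_sizes and the two sample
code points are assumptions chosen for illustration; real sizing
depends on the actual mix of characters in your data.

```python
def encoded_sizes(code_point):
    """Return the encoded length in bytes for each encoding form."""
    ch = chr(code_point)
    codecs = (("UTF-8", "utf-8"),
              ("UTF-16", "utf-16-be"),   # explicit byte order, so no BOM
              ("UTF-32", "utf-32-be"))
    return {name: len(ch.encode(codec)) for name, codec in codecs}

# A BMP ideograph (U+4E2D) versus a Plane 2 code point (U+20000).
bmp_sizes = encoded_sizes(0x4E2D)
plane2_sizes = encoded_sizes(0x20000)

# UTF-32 is limited to planes 0..16, so the highest code point is:
MAX_CODE_POINT = 0x10FFFF
highest_plane = MAX_CODE_POINT >> 16
```

The BMP ideograph takes 3, 2, and 4 bytes in UTF-8, UTF-16, and
UTF-32 respectively, while the Plane 2 code point takes 4 bytes
in all three.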

--Ken


