UCS-4, UCS-2, UTF-16, UTF-8
From: ohmson ohmson (ohmson@netscape.net)
Date: Wed Feb 16 2000 - 19:16:42 EST
- Next message: Paul J. Lewis: "Glyph rendering?"
- Previous message: Kenneth Whistler: "RE: Vulgar fractions (was: 8859-1, 8859-15, 1252 and Euro)"
- Next in thread: Kenneth Whistler: "Re: UCS-4, UCS-2, UTF-16, UTF-8"
- Maybe reply: Kenneth Whistler: "Re: UCS-4, UCS-2, UTF-16, UTF-8"
- Maybe reply: Joerg Knappen: "Re: UCS-4, UCS-2, UTF-16, UTF-8"
- Maybe reply: Mark Davis: "Re: UCS-4, UCS-2, UTF-16, UTF-8"
- Maybe reply: Doug Ewell: "Re: UCS-4, UCS-2, UTF-16, UTF-8"
- Maybe reply: Yung-Fong Tang: "Re: UCS-4, UCS-2, UTF-16, UTF-8"
- Maybe reply: Robert A. Rosenberg: "Re: UCS-4, UCS-2, UTF-16, UTF-8"
- Maybe reply: Kenneth Whistler: "Re: UCS-4, UCS-2, UTF-16, UTF-8"
- Maybe reply: Doug Ewell: "Re: UCS-4, UCS-2, UTF-16, UTF-8"
- Maybe reply: Markus Kuhn: "Re: UCS-4, UCS-2, UTF-16, UTF-8"
- Maybe reply: G. Adam Stanislav: "Re: UCS-4, UCS-2, UTF-16, UTF-8"
- Maybe reply: Dan Oscarsson: "Re: UCS-4, UCS-2, UTF-16, UTF-8"
- Maybe reply: Markus Kuhn: "Re: UCS-4, UCS-2, UTF-16, UTF-8"
- Maybe reply: Doug Ewell: "Re: UCS-4, UCS-2, UTF-16, UTF-8"
- Maybe reply: G. Adam Stanislav: "Re: UCS-4, UCS-2, UTF-16, UTF-8"
Hi Folks,
I have been lurking on the mailing list for a little
while and have learnt a great deal from it. I have visited
the unicode.org site and read most of the material there
(we are waiting for the Unicode 3.0 book to arrive, any day
now). I also followed Markus Kuhn's FAQ on writing Unicode-enabled
applications on UNIX (hence the UTF-8 bias).
Our team is ready to write a client/server
prototype that is going to be I18N. One of the big
debates we get into is which of the encoding forms
named in the subject we should use for the data in the
database. I started by listing some obvious
pros and cons and would very much appreciate hearing what you
folks with the necessary development experience think
of them. To give more perspective, we are using C++
as the programming language.
UCS-4
pros:
- no conversion between Unicode code points and their stored
representation, so easiest for programming
cons:
- major storage wastage: every character takes 4 bytes, yet only
about 1.1 million code points (U+0000 to U+10FFFF) are addressable,
and only the ~65k in the BMP are of significant interest.
UCS-2
pros:
- no conversion between Unicode code points and their stored
representation, so easiest for programming
- native to Win NT
cons:
- cannot represent code points beyond the BMP
UTF-16
pros:
- all code points are encoded
- native to Win2000
- 2 bytes per character for most natural languages (everything in the BMP)
cons:
- needs a conversion algorithm (surrogate pairs for code points beyond the BMP)
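To give an idea of how small the UTF-16 conversion is, here is a minimal C++ sketch of the surrogate-pair calculation. The function name encode_utf16 is made up for illustration, and the code assumes the input is already a valid code point (no lone surrogates, nothing above U+10FFFF).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper (not from any library): returns the UTF-16 code
// units for a single Unicode code point.
std::vector<std::uint16_t> encode_utf16(std::uint32_t cp) {
    // Assumption: the caller passes a valid scalar value.
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    if (cp < 0x10000) {
        // BMP code points are stored as a single 16-bit unit.
        return {static_cast<std::uint16_t>(cp)};
    }
    // Beyond the BMP: subtract 0x10000, leaving a 20-bit value, then
    // split it into a high (lead) and a low (trail) surrogate.
    std::uint32_t v = cp - 0x10000;
    return {static_cast<std::uint16_t>(0xD800 + (v >> 10)),
            static_cast<std::uint16_t>(0xDC00 + (v & 0x3FF))};
}
```

With this sketch, U+4E2D stays a single unit 0x4E2D, while U+1D11E becomes the pair 0xD834 0xDD1E.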
UTF-8
pros:
- all code points are encoded
- native to UNIX
- friendly to sockets programming (byte-oriented, so no byte-order issues)
cons:
- needs a conversion algorithm
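For comparison, the UTF-8 conversion can be sketched in C++ roughly as follows. The name encode_utf8 is invented for this example, the input is assumed to be a valid code point, and the sketch stops at 4-byte sequences, which is enough for everything up to U+10FFFF.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper (not from any library): returns the UTF-8 bytes
// for a single Unicode code point.
std::string encode_utf8(std::uint32_t cp) {
    // Assumption: the caller passes a valid scalar value.
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    std::string out;
    if (cp < 0x80) {
        // ASCII is stored unchanged in one byte.
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx (rest of the BMP, incl. CJK)
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx (beyond the BMP)
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

Because the output is a plain byte string with no embedded zero bytes for non-NUL characters, it can be passed through existing char-based interfaces, which is part of what makes it convenient on UNIX and over sockets.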
I won't go into the storage requirements of UTF-16 vs. UTF-8 because
I think they depend on the language (CJK requires 2 bytes per character
in the former but 3 bytes in the latter).
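The language dependence can be checked with a rough C++ sketch that counts the bytes a run of code points would need in each encoding. The struct EncodedSizes and the function encoded_sizes are hypothetical names for this example, and the input is assumed to hold valid code points only.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical result type: storage in bytes for each encoding.
struct EncodedSizes {
    std::size_t utf8;
    std::size_t utf16;
    std::size_t ucs4;
};

// Hypothetical helper: adds up the bytes needed per code point.
// Assumption: every element of text is a valid Unicode code point.
EncodedSizes encoded_sizes(const std::vector<std::uint32_t>& text) {
    EncodedSizes s{0, 0, 0};
    for (std::uint32_t cp : text) {
        // UTF-8: 1 to 4 bytes depending on the code point range.
        s.utf8 += cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
        // UTF-16: 2 bytes in the BMP, 4 bytes (a surrogate pair) beyond.
        s.utf16 += cp < 0x10000 ? 2 : 4;
        // UCS-4: always 4 bytes.
        s.ucs4 += 4;
    }
    return s;
}
```

For three CJK ideographs such as U+65E5 U+672C U+8A9E this gives 9 bytes in UTF-8, 6 in UTF-16 and 12 in UCS-4; for three ASCII letters it gives 3, 6 and 12, so the better choice really depends on the text being stored.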
Thx much, ohmson
This archive was generated by hypermail 2.1.2
: Tue Jul 10 2001 - 17:20:59 EDT