RE: UTF-16 inside UTF-8

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Tue Dec 02 2003 - 19:58:46 EST

Next message: Peter Constable: "font embedding (was RE: MS Windows and Unicode 4.0 ?)"

Previous message: Philippe Verdy: "RE: MS Windows and Unicode 4.0 ?"
In reply to: Frank Yung-Fong Tang: "Re: UTF-16 inside UTF-8"
Next in thread: Frank Yung-Fong Tang: "RE: UTF-16 inside UTF-8"
Reply: Frank Yung-Fong Tang: "RE: UTF-16 inside UTF-8"
Maybe reply: D. Starner: "RE: UTF-16 inside UTF-8"
Maybe reply: jarkko.hietaniemi@nokia.com: "RE: UTF-16 inside UTF-8"

Frank Yung-Fong Tang writes:
> But how about the UTF-16 vs UCS4 battle?

Forget it: hardly anybody uses UCS-4, except internally for string
processing at the character level. For whole strings, nearly everybody uses
UTF-16, as it performs better at a lower memory cost, and because UCS-4 is
not needed.

Handling surrogates found in strings is quite simple; in fact, surrogates
are even simpler to detect and manage than MBCS-encoded strings in Asian
8-bit applications. Today, MBCS 8-bit processing is performed by first
transforming the text into equivalent internal 16-bit code positions, or
sometimes by transcoding it to Unicode with UTF-16.
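
To illustrate how little work surrogate detection takes, here is a minimal
Python sketch. The function names are hypothetical and chosen only for this
example; the range checks and the pair arithmetic follow the UTF-16
definition. Strings are modelled as lists of 16-bit code units.

```python
def is_high_surrogate(unit):
    # High (lead) surrogates occupy D800..DBFF.
    return 0xD800 <= unit <= 0xDBFF


def is_low_surrogate(unit):
    # Low (trail) surrogates occupy DC00..DFFF.
    return 0xDC00 <= unit <= 0xDFFF


def combine_surrogates(high, low):
    # Each surrogate carries 10 bits of the offset above U+10000.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


def utf16_units(text):
    # Helper for the example: expose a str as its UTF-16 code units.
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little")
            for i in range(0, len(data), 2)]


def code_points(units):
    # Walk the code units, merging each valid surrogate pair.
    # An unpaired surrogate is passed through unchanged here; a real
    # application would have to decide how to report it.
    result = []
    i = 0
    while i < len(units):
        unit = units[i]
        if (is_high_surrogate(unit) and i + 1 < len(units)
                and is_low_surrogate(units[i + 1])):
            result.append(combine_surrogates(unit, units[i + 1]))
            i += 2
        else:
            result.append(unit)
            i += 1
    return result
```

The two ranges do not overlap each other or any other code unit value, so
a single comparison on one code unit is enough to classify it.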

So I do think that applications that could handle East-Asian DBCS 8-bit text
(EUC-*, ISO2022-*, JIS) can very easily be modified to work internally with
UTF-16 (notably because interoperability of Unicode code points with these
DBCS charsets is excellent: the transcoding is unambiguous and bijective,
needs no code reordering, and consists of just a simple mapping table, now
implemented in all OSes localized for Asian markets).
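
As a rough sketch of table-driven transcoding, the following Python example
relies on the standard library's "euc_jp" and "utf-16-le" codecs, which
embed such mapping tables. The function names and the sample string are
assumptions made for this example, and the round trip shown holds for this
sample only; it is not a proof that every code is mapped bijectively.

```python
def dbcs_to_utf16(data, charset="euc_jp"):
    # Decode the DBCS bytes through the codec's mapping table,
    # then serialize the result as UTF-16 (little-endian, no BOM).
    return data.decode(charset).encode("utf-16-le")


def utf16_to_dbcs(data, charset="euc_jp"):
    # Reverse direction: UTF-16 code units back to DBCS bytes.
    return data.decode("utf-16-le").encode(charset)


sample = "日本語".encode("euc_jp")   # three double-byte characters
as_utf16 = dbcs_to_utf16(sample)     # three 16-bit code units
round_trip = utf16_to_dbcs(as_utf16)
```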

East-Asian developers learned long ago how to cope with DBCS-encoded
strings. Now with UTF-16, handling surrogates found in strings is even
simpler, as UTF-16 allows bidirectional and random access to any position
in a string, which means better performance and less tricky algorithms
for text processing...
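
The random and bidirectional access mentioned above can be sketched as
follows, again in Python with strings modelled as lists of 16-bit code
units. The names char_start and iter_backward are hypothetical. The point
is that looking at one neighbouring code unit is enough to find a character
boundary, with no scan from the start of the string.

```python
def char_start(units, index):
    # Return the index of the first code unit of the character that
    # covers position `index`. Only a low surrogate preceded by a
    # high surrogate requires stepping back, and only by one unit.
    if (0 < index < len(units)
            and 0xDC00 <= units[index] <= 0xDFFF
            and 0xD800 <= units[index - 1] <= 0xDBFF):
        return index - 1
    return index


def iter_backward(units):
    # Yield the characters of the string from last to first,
    # each as a list of one or two code units.
    end = len(units)
    while end > 0:
        start = char_start(units, end - 1)
        yield units[start:end]
        end = start


# 'a', U+1D11E as a surrogate pair, 'b'
units = [0x61, 0xD834, 0xDD1E, 0x62]
aligned = char_start(units, 2)        # lands on the high surrogate
reversed_chars = list(iter_backward(units))
```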




This archive was generated by hypermail 2.1.5 : Tue Dec 02 2003 - 20:51:28 EST