From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Tue Dec 02 2003 - 19:58:46 EST
Frank Yung-Fong Tang writes:
> But how about the UTF-16 vs UCS4 battle?
Forget it: almost nobody uses UCS-4, except internally for string
processing at the character level. For whole strings, nearly everybody uses
UTF-16, as it performs better with a lower memory cost, and because UCS-4 is
not needed.
Handling surrogates found in strings is quite simple. In fact, surrogates
are simpler to detect and manage than MBCS-encoded strings in Asian 8-bit
applications, and today MBCS 8-bit processing is performed by first
transforming the text into equivalent internal 16-bit code positions, or
sometimes by transcoding it to Unicode with UTF-16.
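
To illustrate how little is needed to detect and manage surrogates, here is
a minimal sketch in Python. It assumes the string is already available as a
list of 16-bit code units; the function names are mine, purely illustrative,
and not taken from any particular library.

```python
# Sketch only: helper names are hypothetical, chosen for illustration.

def is_high_surrogate(unit: int) -> bool:
    # Lead (high) surrogates occupy U+D800..U+DBFF.
    return 0xD800 <= unit <= 0xDBFF


def is_low_surrogate(unit: int) -> bool:
    # Trail (low) surrogates occupy U+DC00..U+DFFF.
    return 0xDC00 <= unit <= 0xDFFF


def decode_utf16(units):
    """Turn a list of 16-bit code units into a list of code points."""
    code_points = []
    i = 0
    while i < len(units):
        unit = units[i]
        if (is_high_surrogate(unit) and i + 1 < len(units)
                and is_low_surrogate(units[i + 1])):
            # Combine the pair into one supplementary code point.
            high = unit - 0xD800
            low = units[i + 1] - 0xDC00
            code_points.append(0x10000 + (high << 10) + low)
            i += 2
        else:
            # BMP character, or an unpaired surrogate passed through as-is.
            code_points.append(unit)
            i += 1
    return code_points
```

The two range checks are the whole detection logic: a code unit either
stands alone or belongs to exactly one pair.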
So I do think that applications that can handle East-Asian DBCS 8-bit text
(EUC-*, ISO2022-*, JIS) can very easily be modified to work internally with
UTF-16, notably because interoperability of Unicode code points with these
DBCS charsets is excellent: the transcoding is unambiguous and bijective,
needs no code reordering, and consists of just a simple mapping table, now
implemented in all OSes localized for Asian markets.
East-Asian developers learned long ago how to cope with DBCS-encoded
strings. Now with UTF-16, handling surrogates found in strings is even
simpler: each 16-bit code unit identifies itself as a lead surrogate, a
trail surrogate, or a complete character, so UTF-16 allows bidirectional and
random access to any position in a string, which means better performance
and less tricky algorithms for text processing...
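
As a rough sketch of what random and backward access looks like in practice
(Python again, hypothetical function names, string assumed to be a list of
16-bit code units): from any index, looking at most one unit back is enough
to find where the current character starts. In many DBCS encodings a trail
byte can have the same value as a lead byte, so the same question can
require rescanning from an earlier known boundary.

```python
# Sketch only: helper names are hypothetical, chosen for illustration.

def code_point_start(units, index):
    """Return the index where the code point covering units[index] begins."""
    if (0xDC00 <= units[index] <= 0xDFFF and index > 0
            and 0xD800 <= units[index - 1] <= 0xDBFF):
        # We landed on the trail half of a pair: step back one unit.
        return index - 1
    return index


def iterate_backwards(units):
    """Yield code points from the end of the string to the start."""
    i = len(units)
    while i > 0:
        start = code_point_start(units, i - 1)
        if start < i - 1:
            # Surrogate pair: combine lead and trail halves.
            high = units[start] - 0xD800
            low = units[i - 1] - 0xDC00
            yield 0x10000 + (high << 10) + low
        else:
            yield units[i - 1]
        i = start
```

For example, with the units 0041 D835 DD4F 4E2D, asking for the start of
the character at index 2 gives index 1, without touching the rest of the
string.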
This archive was generated by hypermail 2.1.5 : Tue Dec 02 2003 - 20:51:28 EST