conversion performance: UTF-8 BOCU-1 SCSU

From: Markus Scherer (markus.scherer@jtcsv.com)
Date: Thu Apr 04 2002 - 19:50:17 EST


I have numbers for text size and conversion performance of BOCU-1 and SCSU relative to UTF-8.

Quick summary:

For Latin text, UTF-8 is best.
For CJK, BOCU-1 and SCSU produce smaller text, at some cost in conversion speed.
For other scripts, BOCU-1 and SCSU are much better than UTF-8 in both speed and size.
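
As a rough illustration of the UTF-8 side of this summary (not the measurements from the page linked below), the following Python sketch counts UTF-8 bytes per code point for a few sample strings. The sample strings and the helper name utf8_bytes_per_char are assumptions made up for this example. BOCU-1 and SCSU are not part of the Python standard library and are not implemented here.

```python
# Illustrative sample strings (assumptions, not the test data from the post).
SAMPLES = {
    "Latin": "conversion",
    "Greek": "αβγδ",
    "Cyrillic": "Русский",
    "CJK": "日本語",
}


def utf8_bytes_per_char(text: str) -> float:
    """Return the average number of UTF-8 bytes per code point in text."""
    return len(text.encode("utf-8")) / len(text)


if __name__ == "__main__":
    for script, text in SAMPLES.items():
        print(f"{script:9s} {utf8_bytes_per_char(text):.2f} bytes/char")
```

UTF-8 spends one byte on each ASCII letter, two on each Greek or Cyrillic letter, and three on most CJK ideographs. That size gap on non-Latin text is what the size comparison against SCSU and BOCU-1 is about.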

Note that because BOCU-1 preserves control characters and spaces, BOCU-1-encoded text could be used directly in emails, in CVS, etc.

Please see http://oss.software.ibm.com/icu/dropbox/bocuperf.html

Best regards,
markus


