UTF-8S score keeping

From: Tex Texin (texin@progress.com)
Date: Wed Jun 13 2001 - 16:47:29 EDT


Hi,
I am losing track of the discussion, so I decided to create my
own score sheet. So far I have:

Advantage utf-8s
===================
sorts like utf-16, saving 1% CPU

allows binary compare

Only meaningful where queries do not specify an "order by" clause
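
To make the ordering claim concrete, here is a minimal sketch. It assumes utf-8s means "encode each UTF-16 code unit, surrogates included, as its own UTF-8 sequence"; the helper names utf16_units and encode_utf8s are made up for illustration and are not from any proposal text.

```python
import struct

def utf16_units(text):
    """Return the UTF-16 code units of text as a list of integers."""
    data = text.encode("utf-16-be")
    return list(struct.unpack(">%dH" % (len(data) // 2), data))

def encode_utf8s(text):
    """Hypothetical utf-8s encoder (assumption): every UTF-16 code unit,
    surrogates included, is written as its own UTF-8 sequence."""
    out = bytearray()
    for unit in utf16_units(text):
        # surrogatepass lets Python emit the 3-byte form of a lone surrogate
        out += chr(unit).encode("utf-8", "surrogatepass")
    return bytes(out)

bmp = "\uff5e"       # U+FF5E, a BMP character above the surrogate range
supp = "\U00010000"  # U+10000, the first supplementary character

by_utf8 = sorted([bmp, supp], key=lambda s: s.encode("utf-8"))
by_utf16 = sorted([bmp, supp], key=utf16_units)
by_utf8s = sorted([bmp, supp], key=encode_utf8s)

print(by_utf8 == [bmp, supp])    # utf-8 binary order: BMP character first
print(by_utf16 == [supp, bmp])   # utf-16 binary order: supplementary first
print(by_utf8s == by_utf16)      # utf-8s binary order matches utf-16
```

The two orders only disagree when supplementary characters are compared with BMP characters at U+E000 and above.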

Disadvantage utf-8s
=====================
Potentially there would be a utf-16s and utf-32s as well.

utf-8s requires more space for supplementary characters (6 bytes vs 4
in utf-8)
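
The 6 vs 4 figure can be checked with a short sketch. It assumes a supplementary character in utf-8s is stored as its two surrogates, each in 3-byte UTF-8 form; surrogate_pair is a hypothetical helper and U+10400 is just an example code point.

```python
def surrogate_pair(cp):
    """Split a supplementary code point into its UTF-16 surrogates."""
    offset = cp - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)

cp = 0x10400  # example supplementary code point
high, low = surrogate_pair(cp)

utf8 = chr(cp).encode("utf-8")
# Assumed utf-8s form: each surrogate written as a 3-byte UTF-8 sequence
utf8s = (chr(high) + chr(low)).encode("utf-8", "surrogatepass")

print(len(utf8), len(utf8s))
```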

Hardware improves CPU performance by 1% in a week.

utf-8s requires different counting methods for APIs, and hence a new
API:
a) since supplementary characters now count as two code units
b) since the number of bytes per code unit is different
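
A rough illustration of how the counts diverge for one short string. The utf-8s byte count is derived on the assumption that each supplementary character costs 6 bytes instead of 4; the variable names are mine.

```python
text = "a\U00010400"  # one ASCII letter plus one supplementary character

code_points = len(text)
utf16_code_units = len(text.encode("utf-16-be")) // 2
utf8_bytes = len(text.encode("utf-8"))

# Assumption: utf-8s spends 2 extra bytes per supplementary character
supplementary = sum(1 for ch in text if ord(ch) > 0xFFFF)
utf8s_bytes = utf8_bytes + 2 * supplementary

print(code_points, utf16_code_units, utf8_bytes, utf8s_bytes)
```

An API that reports lengths or offsets has to say which of these four numbers it means.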

Table lookups in associated property tables require an additional
step to calculate the table offset (combining surrogates to get the
index value) or an alternative approach to table format,
i.e. utf-8s requires a mix of data lookup approaches (both UTF-8 and
UTF-16 style lookups) instead of just one or the other.
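
The extra step might look like the sketch below: a UTF-8 style decode to get 16-bit units, then a UTF-16 style surrogate combination to get the code point used as the table index. The function name code_points_from_utf8s is hypothetical, and the byte layout is the assumed one (each surrogate as a 3-byte sequence).

```python
def code_points_from_utf8s(data):
    """Decode assumed utf-8s bytes into code points usable as table indexes."""
    # Step 1 (UTF-8 style): decode byte sequences, letting surrogates through
    units = [ord(ch) for ch in data.decode("utf-8", "surrogatepass")]
    result = []
    i = 0
    while i < len(units):
        unit = units[i]
        is_high = 0xD800 <= unit <= 0xDBFF
        has_low = i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF
        if is_high and has_low:
            # Step 2 (UTF-16 style): combine the surrogate pair
            low = units[i + 1]
            result.append(0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00))
            i += 2
        else:
            # BMP code point (or unpaired surrogate) passes through unchanged
            result.append(unit)
            i += 1
    return result

indexes = code_points_from_utf8s(b"A\xed\xa0\x81\xed\xb0\x80")
print([hex(cp) for cp in indexes])
```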

Data is most likely re-sorted linguistically for presentation to
the user anyway

Adds to the number of supported encodings, causing greater support
and deployment problems, additional QA costs, etc.

Adds to the number of conversion algorithms in each product.

Requires a new BOM to be defined.

Detrimental to the UTC to appear to have an unstable definition of
UTF-8.

Detrimental to internet interoperability.

utf-8s validation rules may have different leniency than utf-8
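
As one example of differing leniency, Python's strict UTF-8 decoder rejects encoded surrogates, while its "surrogatepass" error handler lets them through. The bytes below are the assumed utf-8s form of U+10400, and accepts is a hypothetical helper.

```python
utf8s_bytes = b"\xed\xa0\x81\xed\xb0\x80"  # assumed utf-8s form of U+10400

def accepts(data, errors):
    """Report whether Python's UTF-8 decoder accepts data under this handler."""
    try:
        data.decode("utf-8", errors)
        return True
    except UnicodeDecodeError:
        return False

strict_ok = accepts(utf8s_bytes, "strict")
lenient_ok = accepts(utf8s_bytes, "surrogatepass")

print(strict_ok, lenient_ok)
```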

Unclear if software that uses utf-8s is being mislabeled or
misrepresented as utf-8. This might force a new label to be
required for "proper" utf-8 so it can be distinguished from
the misapplied original label. This would add to confusion and the
number of utf's.

Unclear if utf-8s can remain an "internal use only" "standard".
Where are the boundaries of "internal"? Should compilers support
utf-8s? Input methods? Database drivers? Are XML documents in the
database in internal format? Dump and load formats?

-- 
-------------------------------------------------------------------
Tex Texin                      Director, General Product Manager
mailto:Texin@Progress.com      +1-781-280-4271  Fax:+1-781-280-4655
the Progress Company           14 Oak Park, Bedford, MA 01730
-------------------------------------------------------------------
