Re: Subject: Re: 32'nd bit & UTF-8

From: Christopher Fynn (cfynn@gmx.net)
Date: Thu Jan 20 2005 - 07:00:16 CST

Next message: gpw@uniserve.com: "Re: UTF-8 'BOM'"

Previous message: Hans Aberg: "Re: 32'nd bit & UTF-8"
In reply to: Hans Aberg: "Re: Subject: Re: 32'nd bit & UTF-8"
Next in thread: Hans Aberg: "Re: Subject: Re: 32'nd bit & UTF-8"
Reply: Hans Aberg: "Re: Subject: Re: 32'nd bit & UTF-8"

Hans Aberg wrote:
> On 2005/01/20 02:28, Christopher Fynn at cfynn@gmx.net wrote:

>>>Whereas UTF-16 may have been widely used in some quarters up to today, my
>>>impression is that this is more of a legacy thing, and UTF-8 and UTF-32 will
>>>eventually become the only modern formats in use. In the past, one
>>>originally used 16-bit integral types because one thought Unicode would not
>>>exceed 2^16 code points. Now that it is clear that this does not suffice,
>>>there is no point using it in new software, except for legacy. UTF-32 will be
>>>used for speed, and UTF-8 for compatibility with ASCII and solving the endian
>>>issue.

>>If you choose Save as "Unicode" in MS applications, what do you get? The
>>"legacy" of all that data being created today in MS Office etc. on Windows
>>machines is going to be around for a while.

> One can do as the C++ standard did with its .h headers: decide to keep UTF-16
> for now as legacy, but indicate that it may be phased out in a later Unicode
> version. Developers then get X number of years to change. It will be easy
> to make new editors read the old formats but save them in the new formats.
>
> Hans Aberg

Something like 99% of text data uses only BMP characters, for which UTF-16
is pretty efficient. Unless new scripts are adopted for modern languages, we
all start using Egyptian hieroglyphs, or China creates thousands of new
ideographic characters and makes their use mandatory in place of existing
characters, this situation seems unlikely to change.
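
To make the size argument concrete, here is a rough sketch in Python that
compares the encoded size of a few short strings in UTF-8, UTF-16 and UTF-32.
The sample strings and the helper name encoded_sizes are illustrative
assumptions, and byte order marks are left out by picking the little-endian
codecs explicitly.

```python
# Compare the encoded size, in bytes, of sample strings in each encoding form.
# The samples and the helper name are illustrative assumptions.
samples = {
    "ASCII (BMP)": "hello",
    "Greek (BMP)": "αβγδε",
    "CJK (BMP)": "漢字仮名交",
    # Five Gothic letters, U+10330..U+10334, which lie outside the BMP.
    "Gothic (supplementary)": "\U00010330\U00010331\U00010332\U00010333\U00010334",
}


def encoded_sizes(text):
    """Return byte counts for UTF-8, UTF-16 and UTF-32, without a BOM."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),
        "utf-32": len(text.encode("utf-32-le")),
    }


for label, text in samples.items():
    print(f"{label:24} {encoded_sizes(text)}")
```

For BMP text, UTF-16 takes 2 bytes per character where UTF-32 always takes 4,
and UTF-8 takes 1 to 3 depending on the script. Only for the supplementary
characters do all three forms come out at 4 bytes each.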

Didn't MS natively support Unicode (as UCS-2) with the first version of
Windows NT - before UTF-8 came along - and choose a 16-bit form because
that was what Unicode was at the time NT was developed?

Doesn't Mac OS X use UTF-16 for most of its native APIs - except for stuff
that calls BSD system routines?

- Chris


This archive was generated by hypermail 2.1.5 : Thu Jan 20 2005 - 07:01:12 CST