Re: UTF8 vs. Unicode (UTF16) in code

From: Yves Arrouye (yves@realnames.com)
Date: Fri Mar 09 2001 - 23:53:25 EST


> > Since the U in UTF stands for Unicode, UTF-32 cannot
> represent more than
> > what Unicode encodes, which is 1+ million code points.
> Otherwise, you're
> > talking about UCS-4. But I
> > thought that one of the latest revs of ISO 10646
> explicitly specified that
> > UCS-4 will never encode more than what Unicode can encode, and thus
> > definitely not these 4 billion characters you're alluding to.
>
> As far as I know the U in UTF stands for Universal, not Unicode.
> ISO 10646 can encode characters beyond UTF-16, and should retain
> this capability. There is a proposal to restrict UTF-8 to
> only encompass the same values as UTF-16, but UCS-4 still encodes
> the 31-bit code space.

Page 12 of the Unicode Standard 3.0 says:

    "UTF-8 (Unicode Transformation Format-8) [...]"

which is where my understanding of what the U stands for comes from.
But I may be wrong.

Thanks for clarifying my confusion: the proposal is to restrict UTF-8,
not UCS-4. So if ISO never said that they will not encode things beyond
what Unicode can encode, and if UTF-8 is restricted, they may someday
need a UCSTF-8 (or whatever) to encode all of UCS-4, right? And the only
difference between UTF-8 and that UCSTF-8 might be the semantics of what
can be encoded and what is legal after decoding.
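To make the distinction concrete: the original ISO 10646 UTF-8 algorithm
covers the full 31-bit UCS-4 space with sequences of up to 6 bytes, while
a Unicode-restricted UTF-8 simply refuses anything above U+10FFFF (i.e.
never emits 5- or 6-byte sequences). A minimal sketch in Python of the
unrestricted scheme (the function name is my own, not from any spec):

```python
def utf8_encode_31bit(cp: int) -> bytes:
    """Encode a code point using the original ISO 10646 UTF-8 scheme,
    which covers the full 31-bit UCS-4 space with up to 6 bytes.
    A Unicode-restricted encoder would reject cp > 0x10FFFF instead."""
    if cp < 0:
        raise ValueError("negative code point")
    if cp < 0x80:
        return bytes([cp])  # ASCII: single byte, high bit clear
    # (sequence length, exclusive upper limit, lead-byte marker)
    for nbytes, limit, lead in ((2, 0x800, 0xC0), (3, 0x10000, 0xE0),
                                (4, 0x200000, 0xF0), (5, 0x4000000, 0xF8),
                                (6, 0x80000000, 0xFC)):
        if cp < limit:
            out = bytearray(nbytes)
            # Fill continuation bytes (10xxxxxx), 6 bits each, low to high.
            for i in range(nbytes - 1, 0, -1):
                out[i] = 0x80 | (cp & 0x3F)
                cp >>= 6
            out[0] = lead | cp  # remaining high bits go in the lead byte
            return bytes(out)
    raise ValueError("beyond the 31-bit UCS-4 space")
```

Up to U+10FFFF this agrees byte-for-byte with ordinary UTF-8, but it will
also happily produce the 6-byte sequence FD BF BF BF BF BF for the 31-bit
maximum 0x7FFFFFFF, which is exactly what a restricted UTF-8 forbids.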

YA



This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:20 EDT