> > Since the U in UTF stands for Unicode, UTF-32 cannot represent more
> > than what Unicode encodes, which is just over 1 million code points.
> > Otherwise, you're talking about UCS-4. But I thought that one of the
> > latest revisions of ISO 10646 explicitly specified that UCS-4 will
> > never encode more than what Unicode can encode, and thus definitely
> > not these 4 billion characters you're alluding to.
>
> As far as I know, the U in UTF stands for Universal, not Unicode.
> ISO 10646 can encode characters beyond the range UTF-16 can represent,
> and should retain this capability. There is a proposal to restrict
> UTF-8 to encompass only the same values as UTF-16, but UCS-4 still
> encodes the full 31-bit code space.
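To put numbers on the gap between the two code spaces, here is a minimal sketch assuming the standard UTF-16 surrogate-pair arithmetic. The helper and constant names are made up for illustration; the point is that the last high/low surrogate pair tops out at U+10FFFF, while a 31-bit space runs to 0x7FFFFFFF.

```python
# Hypothetical helper: combines a UTF-16 surrogate pair into a code point
# using the standard surrogate arithmetic.
def surrogate_pair_to_code_point(high: int, low: int) -> int:
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid high/low surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


# Highest code point UTF-16 can reach: last high + last low surrogate.
UTF16_MAX = surrogate_pair_to_code_point(0xDBFF, 0xDFFF)

# Highest value in a 31-bit code space (the UCS-4 range described above).
UCS4_MAX = 2**31 - 1

print(hex(UTF16_MAX))   # 0x10ffff
print(UTF16_MAX + 1)    # 1114112, i.e. 17 planes of 65,536
print(hex(UCS4_MAX))    # 0x7fffffff
print(UCS4_MAX + 1)     # 2147483648
```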
Page 12 of the Unicode Standard 3.0 says:
"UTF-8 (Unicode Transformation Format-8) [...]"
which is where my understanding of what the U stands for comes from. But
I may be wrong.
Thanks for clearing up my confusion: the proposal is to restrict UTF-8,
not UCS-4. So if ISO never said that it will not encode anything beyond
what Unicode can encode, and if UTF-8 is restricted, ISO may someday need
a UCSTF-8 (or whatever) to encode UCS-4, right? And the only difference
between UTF-8 and this UCSTF-8 may be the semantics of what can be
encoded and what is legal after decoding.
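One way to see the "only the semantics differ" idea is a single encoder whose legal range is a parameter. This is a minimal sketch assuming the original RFC 2279 form of UTF-8 (up to 6 bytes, covering 31 bits); `RESTRICTED_MAX`, `UCS4_MAX` and `encode()` are hypothetical names, and the restricted variant here only lowers the upper bound (it does not also reject surrogates).

```python
# Minimal sketch of UTF-8 bit packing in the original multi-byte form
# (up to 6 bytes, covering 31 bits). The legal range is a parameter:
# the packing is identical, only the upper bound changes.
# RESTRICTED_MAX, UCS4_MAX and encode() are hypothetical names.

RESTRICTED_MAX = 0x10FFFF   # range reachable by UTF-16
UCS4_MAX = 0x7FFFFFFF       # full 31-bit code space

# (largest code point, lead-byte marker) for 1- to 6-byte sequences
_RANGES = [
    (0x7F, 0x00),
    (0x7FF, 0xC0),
    (0xFFFF, 0xE0),
    (0x1FFFFF, 0xF0),
    (0x3FFFFFF, 0xF8),
    (0x7FFFFFFF, 0xFC),
]


def encode(cp: int, max_cp: int = UCS4_MAX) -> bytes:
    """Encode one code point, rejecting anything above max_cp.

    Only the upper bound is checked; this sketch does not reject
    surrogate code points (U+D800..U+DFFF).
    """
    limit = min(max_cp, UCS4_MAX)
    if not 0 <= cp <= limit:
        raise ValueError(f"{cp:#x} is outside the legal range 0..{limit:#x}")
    # Shortest sequence length whose range covers cp.
    length = next(n for n, (upper, _) in enumerate(_RANGES, start=1) if cp <= upper)
    if length == 1:
        return bytes([cp])  # ASCII passes through unchanged
    marker = _RANGES[length - 1][1]
    tail = []
    for _ in range(length - 1):
        tail.append(0x80 | (cp & 0x3F))  # continuation byte: 10xxxxxx
        cp >>= 6
    # Lead byte holds the remaining high bits; continuation bytes were
    # collected low-order first, so reverse them.
    return bytes([marker | cp] + tail[::-1])


print(encode(0x10FFFF, RESTRICTED_MAX).hex())  # f48fbfbf
print(encode(0x7FFFFFFF, UCS4_MAX).hex())      # fdbfbfbfbfbf
```

With this sketch, `encode(0x110000)` produces the 4-byte sequence F4 90 80 80, while `encode(0x110000, RESTRICTED_MAX)` raises `ValueError`. A matching decoder would differ between the two variants in the same single range check.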
YA
This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:20 EDT