Re: UTF-8 as character set (was: Java and UTF)

From: Pierre Lewis (lew@nortel.ca)
Date: Fri Jul 04 1997 - 09:09:00 EDT


In message "UTF-8 as character set (was: Java and UTF)",
'kenw@sybase.com' writes:

> > Still a bit strange to find UTF-8 (a transform, ie. an algorithm)
> > besides MacThai (an encoding, ie. a table). But, semantic subtleties
> > aside, it's there.
>
> This speaks to a subtle distinction which is not always being
> made.
> ... (explanation thereof deleted)

Thanks for the very useful clarifications (including on terminology).
I knew about the new Java classes that convert between
various encodings (e.g. CP850 <--> Unicode), but, because I had an
algorithmic view of what UTF-8 meant, it never occurred to me to
look there (and since the book doesn't list the
supported encodings, the answer didn't jump out at me either).
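
A minimal sketch of that kind of conversion, treating UTF-8 as just
another named encoding. The class and method names (EncodingConvert,
decode, encode) are made up for illustration, and ISO-8859-1 stands in
for CP850 only because every Java runtime is required to support it;
a CP850 name should work the same way where the runtime ships that
converter.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UnsupportedEncodingException;

public class EncodingConvert {
    // Decode bytes in the named encoding into Unicode text.
    // (Hypothetical helper, not part of the JDK.)
    public static String decode(byte[] bytes, String encoding) throws IOException {
        Reader in = new InputStreamReader(new ByteArrayInputStream(bytes), encoding);
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = in.read()) != -1) {
            sb.append((char) c);
        }
        in.close();
        return sb.toString();
    }

    // Encode Unicode text into bytes in the named encoding.
    public static byte[] encode(String text, String encoding)
            throws UnsupportedEncodingException {
        return text.getBytes(encoding);
    }

    public static void main(String[] args) throws IOException {
        byte[] latin1 = { (byte) 0xE9 };            // e-acute in ISO-8859-1
        String text = decode(latin1, "ISO-8859-1"); // now U+00E9
        byte[] utf8 = encode(text, "UTF-8");        // 0xC3 0xA9
        System.out.println(utf8.length);            // prints 2
    }
}
```

The point is only that "UTF-8" is passed in the same slot as any
table-based encoding name, which is why it shows up in the same list.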

Anyway, now I have more answers than I asked for.

Still, one more question. What exception would InputStreamReader
throw on encountering non-standard (e.g. language-tagged :-) ) UTF-8?
UTFDataFormatException? My book associates this exception only
with the DataInput interface. Another source of my confusion.
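
A sketch of how this question can be probed with the
java.nio.charset API, which was added to Java after this message was
written, so it is an assumption about later runtimes and not about
the JDK discussed here. The class name StrictUtf8 and both helper
methods are hypothetical. In those later runtimes, an
InputStreamReader built from a charset name appears to substitute
U+FFFD for malformed input without throwing, while one built
from a CharsetDecoder set to CodingErrorAction.REPORT throws
MalformedInputException (an IOException), not
UTFDataFormatException.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StrictUtf8 {
    // Drain a Reader into a String.
    private static String drain(Reader in) throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = in.read()) != -1) {
            sb.append((char) c);
        }
        in.close();
        return sb.toString();
    }

    // Default behaviour: malformed bytes are replaced, nothing is thrown.
    public static String readLenient(byte[] bytes) throws IOException {
        return drain(new InputStreamReader(new ByteArrayInputStream(bytes), "UTF-8"));
    }

    // Strict behaviour: the decoder is told to report malformed input,
    // so read() fails with MalformedInputException.
    public static String readStrict(byte[] bytes) throws IOException {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return drain(new InputStreamReader(new ByteArrayInputStream(bytes), decoder));
    }

    public static void main(String[] args) throws IOException {
        byte[] bad = { 0x41, (byte) 0xFF };  // 'A' then a byte never valid in UTF-8
        System.out.println(readLenient(bad).length());
        try {
            readStrict(bad);
        } catch (IOException e) {
            System.out.println(e.getClass().getName());
        }
    }
}
```

UTFDataFormatException stays tied to DataInput.readUTF, which reads
Java's own length-prefixed "modified UTF-8" and is a separate code
path from InputStreamReader.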

Thanks to all,
Pierre


