Re: UTF-8, U+0000 and JDK

From: Markus Kuhn (Markus.Kuhn@cl.cam.ac.uk)
Date: Sun Sep 26 1999 - 16:02:13 EDT

Next message: Karl Pentzlin: "Re: UTF-8, U+0000 and Software Development (was: Re: New UTF-8 decoder stress test file)"
Previous message: Glen Perkins: "Re: New UTF-8 decoder stress test file"
In reply to: Valeriy E. Ushakov: "Re: New UTF-8 decoder stress test file"
Next in thread: Glen Perkins: "Re: New UTF-8 decoder stress test file"

"Valeriy E. Ushakov" wrote on 1999-09-26 17:10 UTC:
> > U+0000 = c0 80
>
> I believe that's exactly what JDK uses to encode U+0000 in utf-8
> encoded NUL terminated C strings to distinguish U+0000 which is part
> of a string from the terminating NUL.
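The problem the quoted text describes can be seen in a small Python sketch. This is an illustration, not JDK code: `ctypes.c_char_p` reads a buffer the way C string functions do, stopping at the first 0x00 byte. The helper name `read_as_c_string` is hypothetical.

```python
import ctypes


def read_as_c_string(buf: bytes) -> bytes:
    """Return what a C routine reading `buf` as a NUL-terminated char* would see."""
    # c_char_p.value stops at the first 0x00 byte, like strlen() or strcpy().
    return ctypes.c_char_p(buf).value


text = "A\u0000B"
standard = text.encode("utf-8")                     # U+0000 becomes a raw 0x00 byte
zero_free = standard.replace(b"\x00", b"\xc0\x80")  # U+0000 becomes c0 80, no 0x00 left

print(read_as_c_string(standard))   # b'A'  (truncated at the embedded NUL)
print(read_as_c_string(zero_free))  # b'A\xc0\x80B'  (survives intact)
```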

It would probably help to avoid confusion if the Java documentation
introduced a new name for this encoding. Good and clear terminology is
never a bad thing.

Suggestion:

UTF-8Z = zero-free UTF-8 encoding, which differs from
UTF-8 in only one character, namely U+0000 = c0 80
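A minimal Python sketch of UTF-8Z as defined above, assuming the one-character definition is the whole story; it does not try to reproduce any other details of the JDK's own converter. `encode_utf8z` and `decode_utf8z` are hypothetical names.

```python
def encode_utf8z(text: str) -> bytes:
    """Encode text as UTF-8Z: ordinary UTF-8, except U+0000 becomes c0 80."""
    # In UTF-8 the byte 0x00 only ever comes from U+0000, so a byte-level
    # replace after standard encoding changes exactly that one character.
    return text.encode("utf-8").replace(b"\x00", b"\xc0\x80")


def decode_utf8z(data: bytes) -> str:
    """Decode UTF-8Z back to text; raw 0x00 bytes are rejected as malformed."""
    if b"\x00" in data:
        raise ValueError("raw 0x00 byte is not valid in UTF-8Z")
    # c0 never occurs in well-formed UTF-8, so every c0 80 pair is the
    # special form of U+0000; map it back, then decode strictly as UTF-8.
    return data.replace(b"\xc0\x80", b"\x00").decode("utf-8")


sample = "x\u0000y\u00e9"
encoded = encode_utf8z(sample)
print(encoded.hex(" "))  # 78 c0 80 79 c3 a9
assert decode_utf8z(encoded) == sample
```

Because 0x00 arises in UTF-8 only from U+0000, a plain byte replacement is enough; no per-character loop is needed. Any other malformed input (for example a stray c0) still fails in the strict UTF-8 decode.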

But then, Java uses UTF-8Z only as an internal encoding, and not in its
UTF-8 I/O functions.

I think it was a curious design decision:

I probably would have selected U+0000 = fe. This is as malformed as
c0 80, but has the big advantage that UTF-8 and UTF-8Z would then always
have the same length in bytes. Note that fe and ff are unused in UTF-8.
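The fe alternative can be sketched the same way in Python. `encode_utf8_fe` and `decode_utf8_fe` are hypothetical names for the proposal above, not anything Java provides. The printed lengths show the point about size: c0 80 adds one byte per U+0000, while fe adds none.

```python
def encode_utf8_fe(text: str) -> bytes:
    """Hypothetical alternative: UTF-8, except U+0000 becomes the single byte fe."""
    # fe never appears in UTF-8, so it cannot collide with a real sequence,
    # and it occupies one byte, exactly like the 0x00 it replaces.
    return text.encode("utf-8").replace(b"\x00", b"\xfe")


def decode_utf8_fe(data: bytes) -> str:
    """Inverse of encode_utf8_fe; raw 0x00 bytes are rejected."""
    if b"\x00" in data:
        raise ValueError("raw 0x00 byte is not valid in this encoding")
    return data.replace(b"\xfe", b"\x00").decode("utf-8")


s = "A\u0000B\u0000"
utf8 = s.encode("utf-8")                    # 41 00 42 00
c0_80 = utf8.replace(b"\x00", b"\xc0\x80")  # 41 c0 80 42 c0 80
fe = encode_utf8_fe(s)                      # 41 fe 42 fe
print(len(utf8), len(c0_80), len(fe))       # 4 6 4
assert decode_utf8_fe(fe) == s
```

One practical consequence of equal lengths: a buffer sized for the ordinary UTF-8 form always fits the fe form, with no extra scan for U+0000 characters.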

Markus

-- 
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org,  WWW: <http://www.cl.cam.ac.uk/~mgk25/>

This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:53 EDT