Re: Java's version of UTF-8

From: david.batchelor@symbian.com
Date: Wed Nov 18 1998 - 04:29:25 EST

Next message: Michael Everson: "Re: OFFTOPIC: What is "francais hexagonal"?"
Previous message: Christopher JS Vance: "Re: Java's version of UTF-8"
Maybe in reply to: Doug Ewell: "Java's version of UTF-8"
Next in thread: stephen_holmes@lionbridge.com: "RE: Java's version of UTF-8"

As I understand it, Java's UTF-8 also differs from standard UTF-8 in how it
handles surrogate pairs: rather than encoding a pair as a single 4-byte
sequence, it encodes each half separately as a 3-byte sequence, giving 6
bytes in total. In other words, Java's UTF-8 treats each of the two
elements of a surrogate pair just as it treats any other character whose
code is greater than U+07FF.
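The 6-byte behavior can be observed on a current JDK with a small sketch (not part of the original message). It assumes Java 8 or later and compares the bytes from DataOutputStream.writeUTF, whose documented output format is Java's modified UTF-8, with those from String.getBytes(StandardCharsets.UTF_8). The class and helper names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class SurrogateDemo {
    // Formats bytes[from..] as space-separated uppercase hex, e.g. "ED A0 80".
    static String hex(byte[] bytes, int from) {
        StringBuilder sb = new StringBuilder();
        for (int i = from; i < bytes.length; i++) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", bytes[i] & 0xFF));
        }
        return sb.toString();
    }

    // DataOutputStream.writeUTF writes a 2-byte length prefix followed by
    // modified UTF-8; the prefix is skipped so only the payload is shown.
    static String modifiedUtf8(String s) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return hex(buf.toByteArray(), 2);
    }

    // Standard UTF-8 as produced by the platform's UTF-8 encoder.
    static String standardUtf8(String s) {
        return hex(s.getBytes(StandardCharsets.UTF_8), 0);
    }

    public static void main(String[] args) {
        // U+10000 is stored in a Java String as the surrogate pair D800 DC00.
        String s = new String(Character.toChars(0x10000));
        System.out.println("modified: " + modifiedUtf8(s)); // ED A0 80 ED B0 80
        System.out.println("standard: " + standardUtf8(s)); // F0 90 80 80
    }
}
```

Each surrogate is encoded on its own as 1110xxxx 10xxxxxx 10xxxxxx, so D800 becomes ED A0 80 and DC00 becomes ED B0 80, while the standard encoder combines the pair into the single code point U+10000 and emits F0 90 80 80.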

David Batchelor

______________________________ Reply Separator _________________________________
Subject: Java's version of UTF-8
Author: <unicode@unicode.org> at symb-internet
Date: 17/11/98 22:52

I would like to know if any Java experts on the list can

(1) confirm for me that Java's version of UTF-8 differs only in
encoding U+0000 as { C0 80 } rather than { 00 }, and

(2) explain why it was necessary for Java to break the standard
to ensure that every character, EVEN THE NULL CHARACTER, be
encoded without the use of the null character.

Thanks in advance,

-Doug
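Point (1) of the question above can be checked in the same way. This is a minimal sketch, assuming Java 8 or later, with hypothetical class and helper names. writeUTF encodes U+0000 as the two-byte sequence C0 80, so its output contains no zero byte, while the standard encoder emits a single 00.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class NullDemo {
    // Returns the modified UTF-8 payload from writeUTF, minus its 2-byte length prefix.
    static byte[] modifiedUtf8(String s) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        byte[] all = buf.toByteArray();
        return Arrays.copyOfRange(all, 2, all.length);
    }

    // Formats bytes as space-separated uppercase hex.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // True if any byte equals 0x00, i.e. a C-style string would end early.
    static boolean containsZeroByte(byte[] bytes) {
        for (byte b : bytes) {
            if (b == 0) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        String s = "a\0b"; // 'a', U+0000, 'b'
        byte[] modified = modifiedUtf8(s);
        byte[] standard = s.getBytes(StandardCharsets.UTF_8);
        // modified: 61 C0 80 62 zero byte? false
        System.out.println("modified: " + hex(modified) + " zero byte? " + containsZeroByte(modified));
        // standard: 61 00 62 zero byte? true
        System.out.println("standard: " + hex(standard) + " zero byte? " + containsZeroByte(standard));
    }
}
```

A commonly cited reason for this choice is that strings in modified UTF-8 (used in class files and by JNI functions such as GetStringUTFChars) can then be handled as NUL-terminated C strings without truncation.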



This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:43 EDT