Re: Unicode forms for internal storage - BOCU-1 speed

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Thu Jan 22 2004 - 16:48:50 EST

    From: <jcowan@reutershealth.com>
    To: "Philippe Verdy" <verdy_p@wanadoo.fr>
    Cc: "Markus Scherer" <markus.scherer@jtcsv.com>; <unicode@unicode.org>
    Sent: Thursday, January 22, 2004 10:26 PM
    Subject: Re: Unicode forms for internal storage - BOCU-1 speed

    > Philippe Verdy scripsit:
    >
    > > Is the other competing UTF-9 from Jerome Abela this one:
    >
    > No. Abela's version preserves all of 00-7F and A0-FF, packing all the rest
    > of Unicode into sequences beginning with any of 80-9F.

    Thanks for pointing this out.

    By the way, I don't think there is any official reference that attributes
    the acronym "UTF-9" to any of these encoding forms. I think that if "UTF-9"
    is used at all, it should first be agreed by Unicode as designating a single
    official representation. The other forms need another encoding label, one
    not starting with "UTF", which should be reserved for encoding forms
    approved by Unicode and ISO/IEC 10646.

    We have already suffered in the past from the confusion caused by the
    various interpretations of "UTF-8" (until CESU-8 was documented, and the
    acronym "UTF-8" removed from the JNI documentation for Java) and by
    confusion between UTF-16/UTF-16BE/UTF-16LE/UCS-2... I therefore think that
    "UTF-9" is a bad acronym for a specific unapproved (non-standard) encoding
    form, and its use on this mailing list just adds more confusion, because
    there is no such "UTF-9" standard until it is documented by an IETF RFC,
    by ISO/IEC 10646, or by Unicode.
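
    To make the "UTF-8" confusion mentioned above concrete, here is a minimal
    Python sketch of how the same supplementary character comes out under
    standard UTF-8 and under CESU-8. The function names utf8_bytes and
    cesu8_bytes are made up for this illustration, and the CESU-8 routine is
    only a sketch built on the definition "encode each UTF-16 code unit
    separately"; it is not taken from any library.

```python
def utf8_bytes(text: str) -> bytes:
    """Standard UTF-8: one sequence of up to 4 bytes per code point."""
    return text.encode("utf-8")


def cesu8_bytes(text: str) -> bytes:
    """CESU-8 sketch (hypothetical helper): encode each UTF-16 code unit
    on its own, so a supplementary character becomes two 3-byte sequences."""
    out = bytearray()
    utf16 = text.encode("utf-16-be")
    for i in range(0, len(utf16), 2):
        unit = int.from_bytes(utf16[i:i + 2], "big")
        # "surrogatepass" lets Python emit the 3-byte form of a lone surrogate.
        out += chr(unit).encode("utf-8", "surrogatepass")
    return bytes(out)


if __name__ == "__main__":
    ch = "\U00010000"  # first code point outside the BMP
    print(utf8_bytes(ch).hex(" "))
    print(cesu8_bytes(ch).hex(" "))
```

    For characters in the BMP the two functions give identical bytes, which is
    why the difference is easy to overlook. The modified UTF-8 used by Java
    additionally encodes U+0000 as two bytes, which this sketch does not model.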


