From: kenw@sybase.com (Kenneth Whistler)
Reply-To: unicore@unicode.org
To: Multiple Recipients of Unicore
Cc: kenw@sybase.com, ietf-charsets@iana.org
Date: Fri, 24 Jul 1998 14:05:14 -0700 (PDT)
Subject: Re: Charset reviewer appointed

With regard to Harald Alvestrand's summary of the open issues with
respect to the UTF-16 registration, the only way I see forward, given
the nature of the "charset" definition, is to split this request into
two registrations:

    UTF-16      big-endian UTF-16
    UTF-16BS    little-endian (byte-swapped) UTF-16

This would finesse the whole, irritating business of the position of
and requirement for the BOM in string-handling protocols. The emitter
of data in one or the other of the two "charsets" would have to
guarantee the byte order of the data it purports to emit. And the BOM
would revert again to what it is supposed to be: a handy signature
which *may* be included in text for those instances in which an
interpreter *may* be getting data of either polarity in a mixed
platform environment.

I don't like the garden path people have been starting down of
requiring that a BOM *must* be attached to every piece of little-endian
UTF-16 text, no matter what. That is, in my opinion, trying to turn the
BOM into the functional equivalent of an escape sequence for
identifying a character set in the context of ISO 2022 -- it just
becomes a metacharacter for identifying the "charset". Why not just
bite the bullet and identify the "charset" unambiguously from the
start?

--Ken Whistler

----- Begin Included Message -----

> > Date: Sun, 21 Jun 1998 07:07:12 +0200
> > From: Harald Tveit Alvestrand
> > To: ietf-charsets@iana.org
> >
> > WRT outstanding registrations, my opinion at the moment is:
> > ...
> >
> > - UTF-16 is controversial because of the BOM and byte-order issues.
> >   I think consensus has not been achieved; the significant objections
> >   are:
> >
> >   - While there is consensus that big-endian is preferred, there is
> >     not consensus if little-endian is acceptable.
> >   - While there is consensus that little-endian, if allowed, MUST
> >     include the BOM, there is no consensus on where, if ever, a BOM
> >     must be inserted in big-endian encoded text.
> >   - There is no consensus that it is possible to write sensible rules
> >     about using the BOM in protocols that carry multiple independent
> >     pieces of text.
> >
> > This registration will wait a bit yet.
>
----- End Included Message -----
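
A minimal Python sketch of the split proposed in the message, offered
only as an illustration. The labels "UTF-16" and "UTF-16BS" are taken
from the proposal; mapping them onto Python's "utf-16-be" and
"utf-16-le" codecs, and the helper names decode_labeled and
sniff_byte_order, are assumptions made here. It shows that a receiver
given an unambiguous label needs no BOM, while a receiver of unlabeled
data can only fall back on a BOM if one happens to be present.

```python
import codecs

# Labels from the proposal; the mapping to Python codec names is an
# assumption made for this illustration.
CHARSET_TO_CODEC = {
    "UTF-16": "utf-16-be",    # big-endian UTF-16
    "UTF-16BS": "utf-16-le",  # little-endian (byte-swapped) UTF-16
}


def decode_labeled(data: bytes, charset: str) -> str:
    """Decode using only the declared charset; no BOM is consulted."""
    return data.decode(CHARSET_TO_CODEC[charset])


def sniff_byte_order(data: bytes):
    """Return the codec implied by a leading BOM, or None if there is none."""
    if data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be"
    if data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le"
    return None


text = "charset"
big = text.encode("utf-16-be")     # no BOM is written by this codec
little = text.encode("utf-16-le")  # no BOM is written by this codec

# With an unambiguous label, both byte orders decode without any BOM.
print(decode_labeled(big, "UTF-16"))
print(decode_labeled(little, "UTF-16BS"))

# Unlabeled data: the BOM helps only when the emitter chose to include it.
print(sniff_byte_order(little))
print(sniff_byte_order(codecs.BOM_UTF16_LE + little))
```

Under this arrangement the BOM stays an optional signature: the label
carries the byte order, and sniffing is needed only where the label is
missing.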