Re: Stateful?

From: Marcin ‘Qrczak’ Kowalczyk (qrczak@knm.org.pl)
Date: Wed May 28 2008 - 05:11:19 CDT

Next message: Behnam: "Re: Arabic Lamalef missing Unicode Ligatures with Tashkeel and/or Shadda on Lam"
Previous message: Bob_Hallissy@sil.org: "Re: Arabic Lamalef missing Unicode Ligatures with Tashkeel and/or Shadda on Lam"
In reply to: Kenneth Whistler: "RE: Stateful?"
Next in thread: Doug Ewell: "Re: Stateful?"
Reply: Doug Ewell: "Re: Stateful?"

2008/5/28 Kenneth Whistler <kenw@sybase.com>:

>> UTF-16, after all, is stateful: if you lose the BOM,
>> things can look very different.
>
> That is true of the UTF-16 encoding *scheme*. (See TUS 5.0,
> D98, p. 106.) That is because in the UTF-16 encoding scheme,
> an initial BOM is itself a stateful switch for byte order.
> UTF-16BE and UTF-16LE, on the other hand are not stateful.
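The state involved is small but real. Here is a minimal Python sketch of a decoder for the UTF-16 encoding scheme (the function name decode_utf16_scheme is hypothetical). An initial BOM acts as a one-time switch that fixes the byte order for everything after it. When there is no BOM, the sketch falls back to big-endian, which the standard prescribes in the absence of a higher-level protocol. The UTF-16BE and UTF-16LE codecs used inside it have no such switch: they read every code unit the same way wherever it appears.

```python
import codecs


def decode_utf16_scheme(data: bytes) -> str:
    """Hypothetical decoder for the UTF-16 encoding scheme.

    An initial BOM selects the byte order for the rest of the data and is
    not part of the text. Without a BOM, assume big-endian (the default
    absent a higher-level protocol).
    """
    if data.startswith(codecs.BOM_UTF16_BE):  # FE FF
        return data[2:].decode("utf-16-be")
    if data.startswith(codecs.BOM_UTF16_LE):  # FF FE
        return data[2:].decode("utf-16-le")
    return data.decode("utf-16-be")
```

Losing the BOM changes how every following code unit is read. With the BOM, codecs.BOM_UTF16_LE + "Hi".encode("utf-16-le") decodes to "Hi". Without it, the same bytes decode to "\u4800\u6900". In the stateless UTF-16LE scheme, by contrast, FF FE is simply the character U+FEFF.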

It is a pity that it is somewhat ambiguous whether UTF-8 is stateful.
UTF-8 with a BOM is stateful. The decoder must remember whether it is
still at the beginning of the data, where a BOM may appear, or already
past it. The encoder must remember whether it is at the beginning, so
that it knows to emit U+FEFF twice when the data itself begins with
U+FEFF. UTF-8 without any special treatment of an initial U+FEFF is
stateless. Both variants of UTF-8 are in use. It would be better to
distinguish them explicitly, as UTF-16 is distinguished from UTF-16BE
and UTF-16LE.
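To show exactly which state each side carries, here is a minimal Python sketch of the BOM-aware UTF-8 variant described above. The class names Utf8BomDecoder and Utf8BomEncoder are hypothetical. The sketch also assumes an encoder that writes a BOM only when the text begins with U+FEFF, which is one possible policy, not the only one. Beyond ordinary UTF-8, the one extra bit of state on each side is "am I still at the start?"

```python
import codecs

BOM = codecs.BOM_UTF8  # b"\xef\xbb\xbf", the UTF-8 encoding of U+FEFF


class Utf8BomDecoder:
    """Hypothetical streaming decoder for BOM-aware UTF-8.

    A leading EF BB BF is consumed as a signature; any later occurrence
    decodes to an ordinary U+FEFF character.
    """

    def __init__(self) -> None:
        self._at_start = True
        self._pending = b""  # bytes held back while a BOM is still possible
        self._utf8 = codecs.getincrementaldecoder("utf-8")()

    def feed(self, chunk: bytes, final: bool = False) -> str:
        if self._at_start:
            buf = self._pending + chunk
            # Too few bytes to decide yet: wait for more unless input ended.
            if len(buf) < len(BOM) and BOM.startswith(buf) and not final:
                self._pending = buf
                return ""
            self._at_start = False
            self._pending = b""
            chunk = buf[len(BOM):] if buf.startswith(BOM) else buf
        return self._utf8.decode(chunk, final)


class Utf8BomEncoder:
    """Hypothetical streaming encoder for the same variant.

    It writes no BOM by default, but text that begins with U+FEFF gets an
    extra one, so the decoder's BOM stripping leaves the real character.
    """

    def __init__(self) -> None:
        self._at_start = True

    def feed(self, text: str) -> bytes:
        prefix = b""
        if self._at_start and text:
            self._at_start = False
            if text[0] == "\ufeff":
                prefix = BOM
        return prefix + text.encode("utf-8")


# Round trip, feeding the decoder one byte at a time.
enc = Utf8BomEncoder()
data = enc.feed("\ufeffA") + enc.feed("\ufeffB")
dec = Utf8BomDecoder()
text = "".join(dec.feed(data[i:i + 1]) for i in range(len(data)))
text += dec.feed(b"", final=True)
```

Here data is EF BB BF EF BB BF 41 EF BB BF 42, and text comes back as "\ufeffA\ufeffB". A stateless UTF-8 decoder (plain data.decode("utf-8")) keeps every U+FEFF, so it yields "\ufeff\ufeffA\ufeffB" instead. For comparison, Python's built-in "utf-8-sig" codec also skips a leading BOM when decoding, but its encoder always writes one.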

-- 
Marcin Kowalczyk
qrczak@knm.org.pl
http://qrnik.knm.org.pl/~qrczak/

This archive was generated by hypermail 2.1.5 : Wed May 28 2008 - 05:14:02 CDT