Re: UTF-8 Error Handling (was: Re: Unicode 4.0 BETA available for review)

From: Tex Texin (tex@i18nguy.com)
Date: Thu Feb 27 2003 - 17:19:25 EST

Next message: Kenneth Whistler: "Re: UTF-8 Error Handling (was: Re: Unicode 4.0 BETA available for review)"

Previous message: Yung-Fong Tang: "quoted-string in for MIME Content-Type charset parameter"
In reply to: Kenneth Whistler: "UTF-8 Error Handling (was: Re: Unicode 4.0 BETA available for review)"
Next in thread: Kenneth Whistler: "Re: UTF-8 Error Handling (was: Re: Unicode 4.0 BETA available for review)"
Maybe reply: Kenneth Whistler: "Re: UTF-8 Error Handling (was: Re: Unicode 4.0 BETA available for review)"

Ken,
Hmm, is that true? Is it ok then, if I detect an unpaired surrogate, mutter
"oops I have an error" and then drop that surrogate and continue processing
the file, resulting in a valid utf-8 file?

I thought for some reason this was prohibited, but if the standard does not
prescribe error handling, then this seems legit.

tex

Kenneth Whistler wrote:
> Absolutely. Error handling is a matter of software design, and not
> something mandated in detail by the Unicode Standard.
>
> If you write software which handles a GIF image, and there is
> a corrupted byte in the middle of a 118K GIF file, you don't go
> to the GIF specification itself, e.g.,
> http://www.w3.org/Graphics/GIF/spec-gif87.txt
> to tell your software what to do after it has processed the first
> 59K bytes (or whatever). The GIF specification just tells you
> what a well-formed GIF image is.
>
> Likewise, the Unicode Standard tells you what a well-formed
> UTF-8 byte sequence is. But it is the software designer who has
> to be smart about determining what his/her software will do when
> it encounters an error condition and finds itself dealing
> with a sequence which is ill-formed according to the specification
> of UTF-8 in the Unicode Standard.
>
> --Ken
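
[Editor's illustration, not part of the original exchange. A minimal
sketch of the point above: the same ill-formed input can be handled under
different policies, and the choice belongs to the software. It assumes
Python's built-in UTF-8 codec and its standard error handlers ("strict",
"ignore", "replace"); the helper name decode_with_policy is hypothetical.]

```python
# An unpaired high surrogate (U+D800) written with the UTF-8 bit pattern.
# The bytes ED A0 80 are ill-formed UTF-8, since surrogate code points
# are excluded from the encoding.
data = b"ab" + b"\xed\xa0\x80" + b"cd"


def decode_with_policy(raw: bytes, policy: str) -> str:
    """Decode raw as UTF-8 using one of Python's codec error handlers."""
    return raw.decode("utf-8", errors=policy)


# Policy 1: treat the ill-formed sequence as a fatal error.
try:
    decode_with_policy(data, "strict")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# Policy 2: note the error, drop the offending bytes, and carry on.
# This is the "drop that surrogate and continue" approach from the question.
dropped = decode_with_policy(data, "ignore")

# Policy 3: substitute U+FFFD REPLACEMENT CHARACTER for the bad bytes.
replaced = decode_with_policy(data, "replace")

# Either recovered string can be re-encoded as well-formed UTF-8.
clean_bytes = dropped.encode("utf-8")
```

[Silently dropping bytes can hide data corruption from the reader, so
substituting U+FFFD is often preferred where the loss should remain
visible. That, too, is a design decision rather than a requirement.]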

-- 
-------------------------------------------------------------
Tex Texin   cell: +1 781 789 1898   mailto:Tex@XenCraft.com
Xen Master                          http://www.i18nGuy.com
                         
XenCraft                            http://www.XenCraft.com
Making e-Business Work Around the World
-------------------------------------------------------------


This archive was generated by hypermail 2.1.5 : Thu Feb 27 2003 - 18:00:29 EST