From: Tex Texin (tex@i18nguy.com)
Date: Thu Feb 27 2003 - 17:19:25 EST
Ken,
Hmm, is that true? Is it ok then, if I detect an unpaired surrogate, mutter
"oops I have an error" and then drop that surrogate and continue processing
the file, resulting in a valid UTF-8 file?
I thought for some reason this was prohibited, but if the standard does not
prescribe error handling, then this seems legit.
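To make the question concrete, here is a rough sketch of the "drop it and carry on" policy I have in mind. It assumes the input is already a list of UTF-16 code units, and the function name drop_unpaired_surrogates is made up for illustration. Dropping is only one possible policy; substituting U+FFFD or rejecting the file outright would be equally defensible design choices.

```python
def drop_unpaired_surrogates(units):
    """Return (text, dropped) for a sequence of UTF-16 code units.

    Hypothetical helper: well-formed surrogate pairs are combined into
    one supplementary code point, unpaired surrogates are discarded and
    counted so the caller can report the error.
    """
    out = []
    dropped = 0
    i = 0
    n = len(units)
    while i < n:
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:
            # High surrogate: valid only if a low surrogate follows.
            if i + 1 < n and 0xDC00 <= units[i + 1] <= 0xDFFF:
                cp = 0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00)
                out.append(chr(cp))
                i += 2
                continue
            dropped += 1  # unpaired high surrogate
        elif 0xDC00 <= u <= 0xDFFF:
            dropped += 1  # low surrogate with no preceding high surrogate
        else:
            out.append(chr(u))  # ordinary BMP code unit
        i += 1
    return "".join(out), dropped


# 'A', lone high, 'B', a valid pair (U+1F600), lone low
text, dropped = drop_unpaired_surrogates(
    [0x0041, 0xD800, 0x0042, 0xD83D, 0xDE00, 0xDC00]
)
utf8 = text.encode("utf-8")  # succeeds: no surrogates remain in text
```

Because every surrogate left in the input is either combined into a pair or discarded, the final encode step cannot fail, and the returned count lets the caller do the "oops I have an error" muttering.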
tex
Kenneth Whistler wrote:
> Absolutely. Error handling is a matter of software design, and not
> something mandated in detail by the Unicode Standard.
>
> If you write software which handles a GIF image, and there is
> a corrupted byte in the middle of a 118K GIF file, you don't go
> to the GIF specification itself, e.g.,
> http://www.w3.org/Graphics/GIF/spec-gif87.txt
> to tell your software what to do after it has processed the first
> 59K bytes (or whatever). The GIF specification just tells you
> what a well-formed GIF image is.
>
> Likewise, the Unicode Standard tells you what a well-formed
> UTF-8 byte sequence is. But it is the software designer who has
> to be smart about determining what his/her software will do when
> it encounters an error condition and finds itself dealing
> with a sequence which is ill-formed according to the specification
> of UTF-8 in the Unicode Standard.
>
> --Ken
--
-------------------------------------------------------------
Tex Texin   cell: +1 781 789 1898   mailto:Tex@XenCraft.com
Xen Master                          http://www.i18nGuy.com
XenCraft                            http://www.XenCraft.com
Making e-Business Work Around the World
-------------------------------------------------------------