From: Kenneth Whistler (kenw@sybase.com)
Date: Thu Feb 27 2003 - 15:53:59 EST
Frank Tang responded to Kent Karlsson's response:
> The problem I need to deal with is not GENERATING that UTF-8, but how to
> handle this DATA when my code receives it. For example, when I receive a
> 10K UTF-8 file which has 1000 lines of text, and one UTF-8
> sequence in line 990 is ill-formed, should I fire the "error" for
> 1. the whole file (10K, 1000 lines),
> 2. all the lines after line 899,
> 3. line 990 itself,
> etc. etc.
>
> There are other ways you can scope the ERROR; I could probably go on
> and on and tell you 10-20 other ways to scope it if I spent 20 more
> minutes.
>
> I do believe the error handling should be application-specific.
Absolutely. Error handling is a matter of software design, and not
something mandated in detail by the Unicode Standard.
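To illustrate with the scenario Frank describes: once there is a single test for well-formedness, choosing the scope of an error is just choosing which unit to apply that test to. The following is a minimal Python sketch, not anything the Unicode Standard prescribes. It assumes Python's built-in strict UTF-8 decoder as the well-formedness check, and the function names are hypothetical.

```python
from typing import List


def whole_file_ok(data: bytes) -> bool:
    """Policy 1: reject the whole file if any sequence is ill-formed."""
    try:
        data.decode("utf-8")  # strict decoding raises on any ill-formed sequence
        return True
    except UnicodeDecodeError:
        return False


def bad_lines(data: bytes) -> List[int]:
    """Policy 3: scope the error to individual lines.

    Returns the 1-based numbers of lines that are not well-formed UTF-8.
    Splitting on b"\\n" is safe because 0x0A never occurs inside a
    multi-byte UTF-8 sequence (those use only bytes 0x80..0xFF).
    """
    bad = []
    for number, raw in enumerate(data.split(b"\n"), start=1):
        try:
            raw.decode("utf-8")
        except UnicodeDecodeError:
            bad.append(number)
    return bad


def decode_with_replacement(data: bytes) -> str:
    """Another policy: keep going, substituting U+FFFD for ill-formed parts."""
    return data.decode("utf-8", errors="replace")
```

For Frank's 1000-line file with a bad sequence on line 990, `whole_file_ok` would reject everything, while `bad_lines` would return `[990]`; which answer is right depends entirely on what the application is for.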
If you write software which handles a GIF image, and there is
a corrupted byte in the middle of a 118K GIF file, you don't go
to the GIF specification itself, e.g.,
http://www.w3.org/Graphics/GIF/spec-gif87.txt
to tell your software what to do after it has processed the first
59K bytes (or whatever). The GIF specification just tells you
what a well-formed GIF image is.
Likewise, the Unicode Standard tells you what a well-formed
UTF-8 byte sequence is. But it is the software designer who has
to be smart about determining what his/her software will do when
it encounters an error condition and finds itself dealing
with a sequence which is ill-formed according to the specification
of UTF-8 in the Unicode Standard.
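For the part the standard does define, here is a sketch of a well-formedness check written directly against the well-formed UTF-8 byte sequence table in the Unicode Standard, which excludes overlong forms, surrogate code points, and values above U+10FFFF. It only locates the first ill-formed sequence; what to do about it is deliberately left to the caller. The function name is illustrative, and the table-driven layout is one choice among many.

```python
from typing import Optional

# Each rule: (lead byte low, lead byte high, second byte low, second byte high,
# number of continuation bytes), following the well-formed UTF-8 byte sequence
# table in the Unicode Standard. Bytes 00..7F are handled separately.
_LEAD_RULES = [
    (0xC2, 0xDF, 0x80, 0xBF, 1),  # U+0080..U+07FF (C0, C1 would be overlong)
    (0xE0, 0xE0, 0xA0, 0xBF, 2),  # U+0800..U+0FFF (excludes overlongs)
    (0xE1, 0xEC, 0x80, 0xBF, 2),  # U+1000..U+CFFF
    (0xED, 0xED, 0x80, 0x9F, 2),  # U+D000..U+D7FF (excludes surrogates)
    (0xEE, 0xEF, 0x80, 0xBF, 2),  # U+E000..U+FFFF
    (0xF0, 0xF0, 0x90, 0xBF, 3),  # U+10000..U+3FFFF (excludes overlongs)
    (0xF1, 0xF3, 0x80, 0xBF, 3),  # U+40000..U+FFFFF
    (0xF4, 0xF4, 0x80, 0x8F, 3),  # U+100000..U+10FFFF (nothing above)
]


def first_ill_formed_offset(data: bytes) -> Optional[int]:
    """Return the offset of the first ill-formed sequence, or None if all well-formed."""
    i, n = 0, len(data)
    while i < n:
        lead = data[i]
        if lead <= 0x7F:  # ASCII is always a complete one-byte sequence
            i += 1
            continue
        for lo, hi, lo2, hi2, trail in _LEAD_RULES:
            if lo <= lead <= hi:
                break
        else:
            # 80..BF (stray continuation), C0, C1, or F5..FF: never a valid lead byte.
            return i
        if i + trail >= n:  # sequence truncated by the end of the data
            return i
        if not lo2 <= data[i + 1] <= hi2:  # second byte has the tightest range
            return i
        for j in range(i + 2, i + 1 + trail):  # remaining bytes must be 80..BF
            if not 0x80 <= data[j] <= 0xBF:
                return i
        i += 1 + trail
    return None
```

For example, `first_ill_formed_offset(b"ok \xc0\xaf")` returns 3, the position of the overlong sequence C0 AF.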
--Ken