From: Kenneth Whistler
Date: Thu Feb 27 2003 - 15:53:59 EST
Frank Tang responded to Kent Karlsson's response:
> The problem I need to deal with is not how to GENERATE that UTF-8, but how to
> handle this DATA when my code receives it. For example, when I receive a
> 10K UTF-8 file which has 1000 lines of text, if one UTF-8
> sequence in line 990 is ill-formed, should I fire the "error" for
> 1. the whole file (10K, 1000 lines),
> 2. all the lines after line 899,
> 3. line 990 itself,
> etc., etc.
> If there are other ways you can scope the ERROR, I could probably go on
> and on and tell you 10-20 other ways to scope it if I spent 20 more
> minutes.
> I do believe the error handling should be application-specific.
Absolutely. Error handling is a matter of software design, and not
something mandated in detail by the Unicode Standard.
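As an illustration of that split, here is a minimal sketch, assuming Python 3, of how one application might implement the kinds of scoping choices Frank lists. The policy names "file", "from_first_error", and "line", and the functions `find_bad_lines` and `rejected_lines`, are hypothetical. The only part that follows from the standard is the well-formedness check itself, delegated here to Python's strict `utf-8` decoder; everything about scope is an application decision. Splitting into lines before decoding is safe because the bytes 0x0A and 0x0D never occur inside a multi-byte UTF-8 sequence.

```python
from typing import List, Set


def find_bad_lines(data: bytes) -> List[int]:
    """Return the 1-based numbers of lines that contain ill-formed UTF-8."""
    bad = []
    # bytes.splitlines() breaks only at \n, \r and \r\n, none of which can
    # appear inside a multi-byte UTF-8 sequence.
    for number, raw in enumerate(data.splitlines(), start=1):
        try:
            raw.decode("utf-8")  # strict mode raises on any ill-formed sequence
        except UnicodeDecodeError:
            bad.append(number)
    return bad


def rejected_lines(data: bytes, scope: str) -> Set[int]:
    """Return the lines rejected under a hypothetical, app-chosen scope."""
    total = len(data.splitlines())
    bad = find_bad_lines(data)
    if not bad:
        return set()
    if scope == "file":              # reject every line in the file
        return set(range(1, total + 1))
    if scope == "from_first_error":  # reject from the first bad line onward
        return set(range(bad[0], total + 1))
    if scope == "line":              # reject only the lines that are bad
        return set(bad)
    raise ValueError(f"unknown scope: {scope!r}")


# Line 4 contains C0 AF, a non-shortest form that is not well-formed UTF-8.
sample = b"ok\nok\nok\nbad \xc0\xaf here\nfine\n"
for scope in ("file", "from_first_error", "line"):
    print(scope, sorted(rejected_lines(sample, scope)))
```

For this sample, the "line" policy rejects only line 4, "from_first_error" rejects lines 4 and 5, and "file" rejects all five lines. None of these is more correct than the others; the choice belongs to the application.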
If you write software which handles a GIF image, and there is
a corrupted byte in the middle of a 118K GIF file, you don't go
to the GIF specification itself, e.g.,
to tell your software what to do after it has processed the first
59K bytes (or whatever). The GIF specification just tells you
what a well-formed GIF image is.
Likewise, the Unicode Standard tells you what a well-formed
UTF-8 byte sequence is. But it is the software designer who has
to be smart about determining what his/her software will do when
it encounters an error condition and finds itself dealing
with a sequence which is ill-formed according to the specification
of UTF-8 in the Unicode Standard.
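To show what "well-formed" means at the byte level, the following Python sketch hand-codes the table of well-formed UTF-8 byte sequences as given in current versions of the Unicode Standard. In particular, it rejects non-shortest forms, encoded surrogates, and anything above U+10FFFF. The function name `first_ill_formed` is hypothetical. It only reports where the first ill-formed sequence starts and deliberately leaves the question of what to do next to the caller. In practice, Python's strict `utf-8` codec already enforces the same rules.

```python
from typing import Optional


def first_ill_formed(data: bytes) -> Optional[int]:
    """Return the offset where the first ill-formed sequence starts, or None."""
    i, n = 0, len(data)
    while i < n:
        b = data[i]
        if b <= 0x7F:  # U+0000..U+007F: a single byte
            i += 1
            continue
        # Each lead byte fixes the sequence length and the allowed range of
        # the second byte; later bytes must always be 80..BF.
        if 0xC2 <= b <= 0xDF:
            lo, hi, length = 0x80, 0xBF, 2
        elif b == 0xE0:                      # excludes overlong 3-byte forms
            lo, hi, length = 0xA0, 0xBF, 3
        elif 0xE1 <= b <= 0xEC or 0xEE <= b <= 0xEF:
            lo, hi, length = 0x80, 0xBF, 3
        elif b == 0xED:                      # excludes surrogates D800..DFFF
            lo, hi, length = 0x80, 0x9F, 3
        elif b == 0xF0:                      # excludes overlong 4-byte forms
            lo, hi, length = 0x90, 0xBF, 4
        elif 0xF1 <= b <= 0xF3:
            lo, hi, length = 0x80, 0xBF, 4
        elif b == 0xF4:                      # excludes values above U+10FFFF
            lo, hi, length = 0x80, 0x8F, 4
        else:
            return i  # C0, C1, F5..FF, or a stray continuation byte
        if i + length > n:
            return i  # sequence truncated by the end of the data
        if not lo <= data[i + 1] <= hi:
            return i
        for k in range(2, length):
            if not 0x80 <= data[i + k] <= 0xBF:
                return i
        i += length
    return None
```

For example, `first_ill_formed(b"ab\xed\xa0\x80")` returns 2, because ED A0 80 would encode the surrogate U+D800.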
This archive was generated by hypermail 2.1.5 : Thu Feb 27 2003 - 16:40:03 EST