Re: UTF-8 Error Handling (was: Re: Unicode 4.0 BETA available for review)

From: Asmus Freytag (asmusf@ix.netcom.com)
Date: Mon Mar 03 2003 - 14:21:58 EST

Next message: Asmus Freytag: "Re: UTF-8 Error Handling (was: Re: Unicode 4.0 BETA available for review)"

Previous message: Mark Davis: "Re: UTF-8 Error Handling (was: Re: Unicode 4.0 BETA available for review)"
In reply to: Mark Davis: "Re: UTF-8 Error Handling (was: Re: Unicode 4.0 BETA available for review)"
Next in thread: Mark Davis: "Re: UTF-8 Error Handling (was: Re: Unicode 4.0 BETA available for review)"
Reply: Mark Davis: "Re: UTF-8 Error Handling (was: Re: Unicode 4.0 BETA available for review)"

But, formally speaking, is it conformant for an API to not stop, and merely
raise an error flag (that the caller may or may not look at)?

I argue that it is.

A./
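
To make the design in question concrete, here is a minimal sketch in Python of a converter that never stops on ill-formed input and merely reports an error flag alongside its output. The function name decode_utf8_flagged is hypothetical, and the use of U+FFFD as the stand-in for bad sequences is an assumption made for illustration; nothing in this thread prescribes either.

```python
def decode_utf8_flagged(data: bytes):
    """Decode UTF-8 without stopping; return (text, had_error).

    Hypothetical illustration: ill-formed sequences are never interpreted
    as characters. They are replaced by U+FFFD and the flag is set, so the
    caller can tell that the output is not a faithful conversion.
    """
    try:
        # Fast path: the whole input is well-formed.
        return data.decode("utf-8"), False
    except UnicodeDecodeError:
        # Keep going, but signal the error condition to the caller.
        return data.decode("utf-8", errors="replace"), True


text, had_error = decode_utf8_flagged(b"a\xffb")
if had_error:
    print("input contained ill-formed UTF-8; output is not reliable")
```

Whether the caller actually checks had_error is exactly the open point; the sketch only shows that the error condition is raised without halting the conversion.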

At 09:09 AM 3/3/03 -0800, Mark Davis wrote:
>Asmus has good points about the restartability, both that it gives the API
>user the maximal flexibility, and that many times the users don't want to
>futz with such options, and just want the text converted.
>
>To provide maximal flexibility, an API will give the choice for illegal
>sequences of (1) deleting, (2) substituting (character, escape (e.g.
>"&#xABC;"), or other options), or (3) stopping with information: the reason
>for the error, the end position of the last successfully converted sequence,
>and the end position of the bad sequence. And users may want to distinguish
>between illegal sequences and missing characters in applying these options;
>that is, they may want to silently delete illegal sequences, but substitute
>a replacement character for missing characters.
>
>Mark
>________
>mark.davis@jtcsv.com
>IBM, MS 50-2/B11, 5600 Cottle Rd, SJ CA 95193
>(408) 256-3148
>fax: (408) 256-0799
>
>----- Original Message -----
>From: "Asmus Freytag" <asmusf@ix.netcom.com>
>To: "Mark Davis" <mark.davis@jtcsv.com>; "Kent Karlsson"
><kentk@md.chalmers.se>; "'Michael (michka) Kaplan'" <michka@trigeminal.com>
>Cc: "'Yung-Fong Tang'" <ftang@netscape.com>; <unicode@unicode.org>
>Sent: Sunday, March 02, 2003 21:10
>Subject: Re: UTF-8 Error Handling (was: Re: Unicode 4.0 BETA available for
>review)
>
>
> > At 07:21 AM 3/2/03 -0800, Mark Davis wrote:
> > > > "C12a When a process interprets a code unit sequence which
> > > > purports to be in a Unicode character encoding form, it
> > > > shall treat ill-formed code unit sequences as an error
> > > > condition, and shall not interpret such sequences as
> > > > characters."
> >
> > Can we agree or disagree on whether an API that returns an error code, but
> > also an output buffer that contains a simplistic conversion of the
> > erroneous sequence is or is not conformant?
> >
> > To me it seems that by setting an error flag in the return code, the API
> > has signalled that the user should not treat the output as containing
> > correct Unicode.
> >
> > Such an API design (on a low enough level) might strike the right balance
> > between usability in many different environments and satisfying the
> > formal requirement.
> >
> > The ideal case is one where the converter stops in a restartable
> > configuration, allowing the client to implement (or ask for) a variety of
> > error-recovery options. However, such an interface requires a lot of
> > thought and may be difficult to implement for some
> > language/platform/library environments. Further, it may be unnecessarily
> > difficult to use for at least some conceivable clients.
> >
> > A./
> >
> >
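
The options discussed in the quoted messages (delete, substitute, or stop in a restartable state with the reason and the two end positions) could be sketched roughly as follows in Python. All names here (DecodeResult, decode_utf8, on_error, good_end, bad_end) are hypothetical and chosen only for illustration; the sketch leans on the start, end and reason attributes of Python's built-in UnicodeDecodeError, and it does not cover the separate case of characters missing from a target charset.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DecodeResult:
    text: str                     # everything converted so far
    reason: Optional[str] = None  # None means no error was hit
    good_end: int = 0             # end of last successfully converted sequence
    bad_end: int = 0              # end of the ill-formed sequence


def decode_utf8(data: bytes, start: int = 0, on_error: str = "stop",
                substitute: str = "\ufffd") -> DecodeResult:
    """Decode from `start`, handling ill-formed sequences per `on_error`."""
    if on_error not in ("delete", "substitute", "stop"):
        raise ValueError("unknown on_error policy: " + on_error)
    out = []
    pos = start
    while pos < len(data):
        try:
            out.append(data[pos:].decode("utf-8"))
            pos = len(data)
        except UnicodeDecodeError as err:
            # err.start / err.end are offsets relative to the slice.
            good_end = pos + err.start
            bad_end = pos + err.end
            out.append(data[pos:good_end].decode("utf-8"))
            if on_error == "stop":
                # Restartable: the caller may resume at bad_end.
                return DecodeResult("".join(out), err.reason, good_end, bad_end)
            if on_error == "substitute":
                out.append(substitute)
            pos = bad_end  # "delete" simply skips the bad bytes
    return DecodeResult("".join(out), None, pos, pos)


data = b"ab\xffcd"
first = decode_utf8(data)                        # stops at the bad byte
rest = decode_utf8(data, start=first.bad_end)    # client chooses to resume
```

A client that does not want to deal with restarts can pass on_error="delete" or on_error="substitute" and get the whole text back in one call.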


This archive was generated by hypermail 2.1.5 : Mon Mar 03 2003 - 14:57:51 EST