From: Markus Scherer (markus.scherer@jtcsv.com)
Date: Wed Oct 30 2002 - 12:39:23 EST
Dominikus Scherkl wrote:
> My other suggestion (and the main reason to call the proposed
> character "source failure indicator symbol" (SFIS)) was intended
> especially for malformed UTF-8 input that has overlong encodings.
>
> In this special case a converter knows exactly which char is
> intended, but needs to signal an error to avoid ambiguities.
> Currently it MUST replace the overlong char with U+FFFD
> (or even cancel the conversion!).
> But I think SFIS + intended-char is a far better approach,
> because it
> 1) warns the reader AND keeps the text readable
> 2) distinguishes overlong encodings from illegal char sequences.
This is a special, custom form of error handling - why assign a character for it?
You could just use an existing character or non-character for this, e.g., U+303E or U+FFFF or U+FDEF
or similar.
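
For illustration only, here is a minimal sketch of what such custom handling might look like in a converter. It is not taken from any existing library: the function name, the MARKER constant, and the choice of U+FFFF as the marker are all assumptions made for this example. Overlong sequences become marker + intended character; every other ill-formed sequence becomes U+FFFD. The marked output is meant for internal use only, since noncharacters are not intended for open interchange.

```python
# Hypothetical sketch: the names and the marker choice are assumptions.
MARKER = "\uFFFF"       # a noncharacter, used here as the "overlong" flag
REPLACEMENT = "\uFFFD"  # standard replacement for other ill-formed input

# Smallest code point that legitimately needs a sequence of each length.
MIN_FOR_LENGTH = {1: 0x00, 2: 0x80, 3: 0x800, 4: 0x10000}


def decode_marking_overlong(data: bytes) -> str:
    """Decode UTF-8, flagging overlong forms instead of hiding them."""
    out = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:
            length, cp = 1, lead
        elif 0xC0 <= lead <= 0xDF:
            length, cp = 2, lead & 0x1F
        elif 0xE0 <= lead <= 0xEF:
            length, cp = 3, lead & 0x0F
        elif 0xF0 <= lead <= 0xF7:
            length, cp = 4, lead & 0x07
        else:
            # Stray continuation byte or invalid lead byte.
            out.append(REPLACEMENT)
            i += 1
            continue

        trail = data[i + 1:i + length]
        if len(trail) != length - 1 or any(b & 0xC0 != 0x80 for b in trail):
            # Truncated sequence or a bad trail byte: skip the lead byte only.
            out.append(REPLACEMENT)
            i += 1
            continue

        for b in trail:
            cp = (cp << 6) | (b & 0x3F)

        if cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
            out.append(REPLACEMENT)        # out of range or surrogate
        elif cp < MIN_FOR_LENGTH[length]:
            out.append(MARKER + chr(cp))   # overlong: flag it, keep the char
        else:
            out.append(chr(cp))
        i += length
    return "".join(out)
```

With this sketch, the two-byte overlong form C0 AF (an encoding of "/") decodes to U+FFFF followed by "/", while a stray 0x80 byte still yields U+FFFD, so the two kinds of error stay distinguishable.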
markus
-- Opinions expressed here may not reflect my company's positions unless otherwise noted.