RE: Best practices for replacing UTF-8 overlongs

From: Shawn Steele <Shawn.Steele_at_microsoft.com>
Date: Tue, 20 Dec 2016 05:31:43 +0000

Yes, I just don't see how the number of emitted replacement characters changes the flowchart on what to do when it's bad :)

-----Original Message-----
From: Unicode [mailto:unicode-bounces_at_unicode.org] On Behalf Of Martin J. Dürst
Sent: Monday, December 19, 2016 7:20 PM
To: 'Unicode Mailing List' <unicode_at_unicode.org>
Subject: Re: Best practices for replacing UTF-8 overlongs

On 2016/12/20 11:35, Tex Texin wrote:
> Shawn,
>
> OK, but that raises the question of what to do...
> "All bets are off" is not instructive.

Well, it may be instructive in that it's difficult to get software to decide what happened. A human may be in a better position to analyze the error and its cause(s), and to fix them.

> How software behaves in the face of invalid bytes, what it does with them, what it does about them, and how it continues (or not) still needs to be determined.

Yes, but that will depend on circumstances. In a safety-critical application, you'll want to do something different than if you are sending the text to a printer just to have a look at it.

Regards, Martin.
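The point that handling depends on circumstances can be illustrated with a minimal sketch. It assumes Python 3's built-in UTF-8 codec. The `decode_utf8` helper and its `context` values ("safety-critical", "display") are hypothetical names chosen for illustration and do not come from the thread. A strict policy refuses invalid input outright. A display policy substitutes U+FFFD so a human can still look at the text.

```python
# Hypothetical sketch: pick a policy for invalid UTF-8 (e.g. overlong
# sequences) based on the application's context. The helper name and the
# context strings are illustrative assumptions, not an established API.

def decode_utf8(data: bytes, context: str) -> str:
    """Decode UTF-8 bytes, handling invalid input according to context."""
    if context == "safety-critical":
        # errors="strict" raises UnicodeDecodeError on overlongs and any
        # other ill-formed bytes, so nothing is guessed or silently altered.
        return data.decode("utf-8", errors="strict")
    if context == "display":
        # errors="replace" substitutes U+FFFD REPLACEMENT CHARACTER for
        # ill-formed bytes, so the rest of the text stays readable.
        return data.decode("utf-8", errors="replace")
    raise ValueError(f"unknown context: {context!r}")


# 0xC0 0xAF is an overlong (and therefore invalid) encoding of "/" (U+002F).
overlong = b"a\xc0\xafb"

print(repr(decode_utf8(overlong, "display")))

try:
    decode_utf8(overlong, "safety-critical")
except UnicodeDecodeError as exc:
    print("rejected:", exc.reason)
```

For this input, CPython emits one U+FFFD per offending byte (two in total). That matches Shawn's observation: the number of replacement characters is a detail of the "replace" branch, and it does not change whether to reject or replace in the first place.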
Received on Mon Dec 19 2016 - 23:32:08 CST
