Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8
Henri Sivonen via Unicode
unicode at unicode.org
Tue May 16 02:01:03 CDT 2017
On Tue, May 16, 2017 at 6:23 AM, Karl Williamson
<public at khwilliamson.com> wrote:
> On 05/15/2017 04:21 AM, Henri Sivonen via Unicode wrote:
>> In reference to:
>> I think Unicode should not adopt the proposed change.
>> The proposal is to make ICU's spec violation conforming. I think there
>> is both a technical and a political reason why the proposal is a bad
> Henri's claim that "The proposal is to make ICU's spec violation conforming"
> is a false statement, and hence all further commentary based on this false
> premise is irrelevant.
> I believe that ICU is actually currently conforming to TUS.
Do you mean that ICU's behavior differs from what the PDF claims (I
didn't test and took the assertion in the PDF about behavior at face
value) or do you mean that despite deviating from the
currently-recommended best practice the behavior is conforming,
because the relevant part of the spec is mere best practice and not a
> TUS has certain requirements for UTF-8 handling, and it has certain other
> "Best Practices" as detailed in 3.9. The proposal involves changing those
> recommendations. It does not involve changing any requirements.
Even so, I think that changing even a "best practice" recommendation
needs a much better rationale than "feels right" or "ICU already does it"
when a) major browsers (which operate in the most prominent
environment of broken and hostile UTF-8) agree with the
currently-recommended best practice and b) the currently-recommended
best practice makes more sense for implementations where "UTF-8
decoding" is actually mere "UTF-8 validation".
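
To illustrate point b), here is a minimal sketch in Python of the
currently-recommended practice: one U+FFFD per maximal subpart of an
ill-formed sequence. It is not taken from ICU or from any browser; the
names decode_maximal_subpart and _lead_info are hypothetical, and the
byte ranges are my reading of the well-formed UTF-8 byte sequence table
in TUS chapter 3. The loop is essentially a validator: it stops at the
first byte that cannot continue the current sequence, and that stopping
point is exactly where a single U+FFFD is emitted.

```python
REPLACEMENT = "\ufffd"


def _lead_info(lead):
    """Return (sequence length, allowed range of the 2nd byte) or None.

    Ranges are assumed from the well-formed UTF-8 table in the standard.
    """
    if 0xC2 <= lead <= 0xDF:
        return 2, 0x80, 0xBF
    if lead == 0xE0:
        return 3, 0xA0, 0xBF  # excludes overlong 3-byte forms
    if 0xE1 <= lead <= 0xEC or 0xEE <= lead <= 0xEF:
        return 3, 0x80, 0xBF
    if lead == 0xED:
        return 3, 0x80, 0x9F  # excludes surrogates
    if lead == 0xF0:
        return 4, 0x90, 0xBF  # excludes overlong 4-byte forms
    if 0xF1 <= lead <= 0xF3:
        return 4, 0x80, 0xBF
    if lead == 0xF4:
        return 4, 0x80, 0x8F  # excludes values above U+10FFFF
    return None  # 80..C1 and F5..FF can never start a sequence


def decode_maximal_subpart(data: bytes) -> str:
    """Decode UTF-8, emitting one U+FFFD per maximal ill-formed subpart."""
    out = []
    i, n = 0, len(data)
    while i < n:
        lead = data[i]
        if lead < 0x80:  # ASCII
            out.append(chr(lead))
            i += 1
            continue
        info = _lead_info(lead)
        if info is None:  # stray continuation byte or invalid lead
            out.append(REPLACEMENT)
            i += 1
            continue
        length, lo, hi = info
        j = i + 1
        # Validate trailing bytes; stop at the first one that does not fit.
        while j < i + length and j < n:
            low, high = (lo, hi) if j == i + 1 else (0x80, 0xBF)
            if not (low <= data[j] <= high):
                break
            j += 1
        if j == i + length:
            out.append(data[i:j].decode("utf-8"))  # well-formed sequence
        else:
            out.append(REPLACEMENT)  # one U+FFFD for the valid prefix
        i = j  # the offending byte, if any, is re-examined as a new lead
    return "".join(out)
```

Under these assumptions, a truncated sequence such as F1 80 80 followed
by E1 yields a single U+FFFD for the three-byte prefix, whereas F0 80 80
yields three, because 80 can never follow F0 and so no prefix longer
than one byte is valid.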
hsivonen at hsivonen.fi