> it’s more meaningful for whoever sees the output to see a single U+FFFD representing
> the illegally encoded NUL than it is to see two U+FFFDs, one for an invalid lead byte and
> then another for an “unexpected” trailing byte.
I disagree. For some applications it may be more meaningful to get a single U+FFFD for an illegally encoded 2-byte NUL than to get two U+FFFDs. But then you can't tell whether the input was an illegally encoded 2-byte NUL, an illegally encoded 3-byte NUL, or something else, so information that other applications may care about is lost.
Personally, I prefer the "emit a U+FFFD if the sequence is invalid, drop the byte, and try again" approach.
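
As a rough illustration of that approach, here is a minimal Python sketch. The function name decode_drop_one is hypothetical, and leaning on Python's strict UTF-8 codec to validate each candidate sequence is an assumption made for brevity, not something taken from any particular decoder. It guesses the sequence length from the lead byte, tries to decode exactly that many bytes, and on any failure emits one U+FFFD, drops a single byte, and retries from the next byte.

```python
REPLACEMENT = "\ufffd"


def decode_drop_one(data: bytes) -> str:
    """Decode UTF-8; on an invalid sequence emit U+FFFD, drop one byte, retry."""
    out = []
    i = 0
    while i < len(data):
        lead = data[i]
        # Expected sequence length implied by the lead byte.
        if lead < 0x80:
            length = 1
        elif 0xC0 <= lead <= 0xDF:
            length = 2
        elif 0xE0 <= lead <= 0xEF:
            length = 3
        elif 0xF0 <= lead <= 0xF7:
            length = 4
        else:
            length = 0  # stray continuation byte (0x80-0xBF) or 0xF8-0xFF

        chunk = data[i:i + length]
        try:
            if length == 0 or len(chunk) < length:
                raise ValueError("not a complete sequence")
            # Strict decoding rejects overlong forms such as C0 80,
            # surrogates, and code points above U+10FFFF.
            out.append(chunk.decode("utf-8", errors="strict"))
            i += length
        except ValueError:  # UnicodeDecodeError is a subclass of ValueError
            out.append(REPLACEMENT)
            i += 1  # drop only the offending byte, then try again
    return "".join(out)
```

With this sketch the overlong 2-byte NUL (C0 80) comes out as two U+FFFDs and the overlong 3-byte NUL (E0 80 80) as three, so the number of replacement characters still reflects how many bytes were bad.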
-Shawn
Received on Wed May 31 2017 - 14:28:22 CDT