Karl Williamson wrote:
> It seems counterintuitive to me that the two byte sequence C0 80
> should be replaced by 2 replacement characters under best practices,
> or that E0 80 80 should also be replaced by 2. Each sequence was legal
> in early Unicode versions,
This is overstated at best. Decoders weren't required to detect overlong
sequences until 2000, but it was never legal to generate them. This was
stated explicitly in RFC 2279 and in Unicode 1.1, Appendix F. Correct
use of the instructions and table in RFC 2044 also precluded the
creation of overlong sequences.
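
As a rough illustration of that last point, here is a minimal Python sketch
of an encoder that picks the sequence length from a range table. It is
simplified to the modern four-byte limit rather than the six-byte table in
RFC 2044, and the name encode_utf8 is made up for this example. Because
each range maps to exactly one length, the encoder has no path that could
emit C0 80 or E0 80 80.

```python
def encode_utf8(cp: int) -> bytes:
    """Encode one code point, choosing the length from the range table.

    Hypothetical helper for illustration; limited to U+0000..U+10FFFF.
    """
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not a Unicode scalar value")
    if cp <= 0x7F:          # 0xxxxxxx
        return bytes([cp])
    if cp <= 0x7FF:         # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp <= 0xFFFF:        # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


def strict_decoder_rejects(data: bytes) -> bool:
    """Return True if Python's strict UTF-8 decoder refuses the bytes."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return True
    return False
```

With this sketch, encode_utf8(0) yields the single byte 00, never C0 80,
and strict_decoder_rejects(b"\xc0\x80") is true in current Python.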
-- Doug Ewell | Thornton, CO, US | ewellic.org

Received on Mon Dec 19 2016 - 17:53:44 CST