From: Doug Ewell (doug@ewellic.org)
Date: Fri Dec 25 2009 - 13:29:24 CST
On Mon Dec 14 2009 14:26:16 CST, Julian Bradfield <jcb plus unicode at
inf dot ed dot ac dot uk> wrote:
> I'm sure someone can come up with an example of two utf-8 canonically
> equivalent strings that both make (different) sense in some other
> encoding.
For perhaps the wrong reason, this reminded me of:
NESTLÉ®
my canonical example of a plausible Latin-1 string that could be
interpreted (wrongly, of course) as UTF-8. The last two characters are
U+00C9 U+00AE, and the corresponding Latin-1 byte values 0xC9 0xAE form
the valid two-byte UTF-8 sequence for ɮ U+026E LATIN SMALL LETTER LEZH.
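
The misreading described above can be checked with a short Python sketch. The variable names are illustrative only, and it assumes Python 3 with the standard "latin-1" and "utf-8" codecs; it simply encodes the string as Latin-1 and then decodes the same bytes as UTF-8.

```python
import unicodedata

# The example string, ending in U+00C9 (É) and U+00AE (®).
latin1_text = "NESTL\u00c9\u00ae"

# Encode as Latin-1: each character becomes exactly one byte.
raw = latin1_text.encode("latin-1")

# Wrongly interpret the same bytes as UTF-8. The trailing 0xC9 0xAE
# pair is a well-formed two-byte sequence, so no error is raised.
misread = raw.decode("utf-8")

# Decode the two-byte sequence by hand: 110xxxxx 10yyyyyy -> xxxxxyyyyyy
lead, trail = raw[-2], raw[-1]
code_point = ((lead & 0x1F) << 6) | (trail & 0x3F)

print(raw[-2:].hex())                    # the final two bytes, in hex
print(misread)                           # the string as misread
print(f"U+{code_point:04X}", unicodedata.name(chr(code_point)))
```

Because the ASCII letters are identical in both encodings, only the final two bytes change meaning, which is what makes the string a plausible false positive for a "does this decode as UTF-8?" heuristic.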
I probably need a new canonical example, because this one isn't wholly
realistic; Nestlé doesn't appear to be a registered trademark (the legal
name appears to be Nestlé S.A.) and the name is not generally spelled
with all-caps.
--
Doug Ewell | Thornton, Colorado, USA | http://www.ewellic.org
RFC 5645, 4645, UTN #14 | ietf-languages @ http://is.gd/2kf0s