From: Andrew West (andrewcwest@gmail.com)
Date: Tue Jun 07 2005 - 07:07:24 CDT
On 06/06/05, Doug Ewell <dewell@adelphia.net> wrote:
>
> It is still possible to come up with a plausible example of text that is
> both valid UTF-8 and plausible Latin-1, and I need to find one -- not
> only because my current example is Windows-specific, but also because
> Nestlé is not even a trademark (™) but a registered trademark (®).
>
Something like 2×½=1 (2 times one half equals 1), which is <32 D7 BD 3D
31> in ISO-8859-1, is both plausible Latin-1 and valid UTF-8 = <0032
05FD 003D 0031>. Although the resultant UTF-8 text in this example is
meaningless as U+05FD is not (yet) an assigned character, most editors
do not check for character validity (if they did they would not be
forward compatible with future versions of Unicode), and so will
happily assume that this example is UTF-8 rather than any other
character set -- certainly Notepad automatically opens a file
containing this example as UTF-8.
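
For anyone who wants to check the byte arithmetic, here is a minimal Python sketch. It only assumes the five bytes quoted above; the helper name code_points is made up for illustration, and the sketch says nothing about how Notepad or any other editor actually detects encodings.

```python
# The five bytes from the example: <32 D7 BD 3D 31>
data = bytes([0x32, 0xD7, 0xBD, 0x3D, 0x31])

# Latin-1 maps every byte to exactly one character, so this never fails.
as_latin1 = data.decode("latin-1")

# A strict UTF-8 decode also succeeds: D7 BD is a well-formed two-byte
# sequence (110xxxxx 10xxxxxx), even though the resulting code point is
# unassigned.
as_utf8 = data.decode("utf-8")


def code_points(text):
    """Hypothetical helper: list the code points of text as 4-digit hex."""
    return ["%04X" % ord(ch) for ch in text]


print(code_points(as_latin1))  # ['0032', '00D7', '00BD', '003D', '0031']
print(code_points(as_utf8))    # ['0032', '05FD', '003D', '0031']
```

The strict UTF-8 decode raises no error because well-formedness is a property of the byte sequence alone; whether the decoded code point is assigned is a separate question that the decoder does not ask.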
Andrew