From: Philippe VERDY (verdy_p@wanadoo.fr)
Date: Mon Jun 06 2005 - 14:07:23 CDT
> From: "Samuel Thibault" <samuel.thibault@ens-lyon.org>
> Doug Ewell wrote on Mon 06 Jun 2005 07:08:15 -0700:
> > It is still possible to come up with a plausible example of text that is
> > both valid UTF-8 and plausible Latin-1, and I need to find one -- not
> > only because my current example is Windows-specific, but also because
> > Nestlé is not even a trademark (™) but a registered trademark (®).
>
> Just find two registered marks that only differ by an ending Â for
> instance:
> FOO®
Instead of focusing on the trademark or registered-trademark symbols, just consider the non-breaking space (U+00A0), which may follow many of the ISO 8859-1 letters in the range U+00C0..U+00DF (nearly all uppercase). In ISO 8859-1 this yields byte sequences from (0xC2,0xA0) to (0xDF,0xA0) that are also valid UTF-8 sequences; (0xC0,0xA0) and (0xC1,0xA0) are excluded because 0xC0 and 0xC1 can only start overlong forms, which UTF-8 forbids. This case is probably less rare than the contrived example, notably when the non-breaking space is used in the middle of a compound-name trademark that should remain unbreakable, or when these sequences appear in the data of a wide HTML table whose cells should preferably remain unbreakable (yes, HTML offers other ways to avoid breaks: the nowrap attribute of table cells, CSS, or the <nobr> container element).
But for plain-text files, these cases are extremely rare...
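
For anyone who wants to check the byte pairs, here is a minimal Python sketch. It is only an illustration, not part of any tool mentioned in this thread; the function name ambiguous_pairs is made up for the example. It tries every lead byte 0xC0..0xDF followed by 0xA0 and keeps the pairs that Python's strict UTF-8 decoder accepts, next to their ISO 8859-1 reading. A strict decoder rejects the lead bytes 0xC0 and 0xC1, so only 0xC2..0xDF survive.

```python
def ambiguous_pairs():
    """Collect (lead_byte, latin1_text, utf8_text) for every two-byte
    sequence (lead, 0xA0), lead in 0xC0..0xDF, that is also valid UTF-8."""
    results = []
    for lead in range(0xC0, 0xE0):
        raw = bytes([lead, 0xA0])
        # Every byte pair is decodable as ISO 8859-1: letter + NBSP.
        latin1 = raw.decode("latin-1")
        try:
            # Strict decoding fails on overlong lead bytes 0xC0 and 0xC1.
            utf8 = raw.decode("utf-8")
        except UnicodeDecodeError:
            continue
        results.append((lead, latin1, utf8))
    return results


pairs = ambiguous_pairs()
for lead, latin1, utf8 in pairs:
    print(f"0x{lead:02X} 0xA0  Latin-1: {latin1!r}  UTF-8: U+{ord(utf8):04X}")
```

Each surviving pair reads as one letter plus a non-breaking space in ISO 8859-1, and as a single code point in the range U+00A0..U+07E0 (in steps of 0x40) in UTF-8.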