From: Theodore H. Smith (delete@elfdata.com)
Date: Fri Dec 10 2004 - 15:07:16 CST
> Philippe,
>
>> Also a broken opening tag for HTML/XML documents
>
> In addition to not having endian problems UTF-8 is also useful when
> tracing intersystem communications data because XML and other tags are
> usually in the ASCII subset of UTF-8 and stand out making it easier to
> find the specific data you are looking for.
That was the whole point of my original thread.
What you say is simply not true. You can process UTF-8 as bytes. By
your reasoning, even UTF-16 needs multiple code points to be treated as
one character, because of decomposed characters.
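A small illustration of the decomposed-character point, as a sketch in Python (the variable names are only for illustration). It shows one user-visible character that takes two code points once decomposed, so a fixed-width view of "one unit = one character" does not hold in UTF-16 either:

```python
import unicodedata

# "é" as a single precomposed code point (U+00E9).
composed = "\u00e9"

# The same character in decomposed form (NFD): "e" followed by
# U+0301 COMBINING ACUTE ACCENT.
decomposed = unicodedata.normalize("NFD", composed)

# Count UTF-16 code units (2 bytes each) and UTF-8 bytes for the
# decomposed form.
utf16_units = len(decomposed.encode("utf-16-le")) // 2
utf8_bytes = len(decomposed.encode("utf-8"))

print(len(composed), len(decomposed), utf16_units, utf8_bytes)
```

Both forms display as the same character, yet the decomposed one is two code points in any encoding.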
For most tasks (though not all), you can treat Unicode text as bytes
by using UTF-8.
I've done this extensively, and it works just fine.
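A minimal sketch of what byte-level processing can look like, in Python. The helper name extract_between and the sample document are made up for illustration and are not from any particular library. It relies on a property of UTF-8: every byte of a multi-byte sequence is 0x80 or above, so an ASCII tag can never match in the middle of a non-ASCII character.

```python
def extract_between(data: bytes, open_tag: bytes, close_tag: bytes) -> bytes:
    """Return the bytes between the first open_tag and the next close_tag.

    Works purely on bytes; the payload is never decoded here.
    Raises ValueError if either tag is missing.
    """
    start = data.index(open_tag) + len(open_tag)
    end = data.index(close_tag, start)
    return data[start:end]


# Sample UTF-8 data with non-ASCII text between ASCII tags
# ("Zoë" followed by three CJK characters, written as escapes).
doc = "<name>Zo\u00eb \u65e5\u672c\u8a9e</name>".encode("utf-8")

payload = extract_between(doc, b"<name>", b"</name>")

# Every byte of a multi-byte UTF-8 sequence has its high bit set,
# so none of them can be mistaken for "<", ">" or any other ASCII byte.
high_bytes_only = all(b >= 0x80 for b in "\u00eb".encode("utf-8"))

# The extracted bytes are still valid UTF-8 and decode cleanly.
text = payload.decode("utf-8")
```

One of the tasks where this does not hold is cutting at an arbitrary byte offset, which can split a multi-byte character in two; searching, splitting on ASCII delimiters, and concatenation are safe.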
I repeat this because even people like me (who are able to understand)
could be confused if they receive the wrong information and none of
the right information.
If someone who was able to understand UTF-8 got both the right and
wrong information, they'd be able to make up their own mind. But if
they got only the wrong information, they could be misled, as I was.
That is why I'm repeating that you can treat UTF-8 as bytes most of
the time, and it works perfectly well.
--
Theodore H. Smith - Software Developer - www.elfdata.com/plugin/
Industrial strength string processing code, made easy.
(If you believe that's an oxymoron, see for yourself.)