From: Carl W. Brown (cbrown@xnetinc.com)
Date: Fri Dec 10 2004 - 10:58:11 CST
Philippe,
> Also a broken opening tag for HTML/XML documents
In addition to not having endian problems, UTF-8 is also useful when tracing
intersystem communications data. XML and other tags are usually in the ASCII
subset of UTF-8, so they stand out in a trace and make it easier to find the
specific data you are looking for.
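To illustrate, here is a rough Python sketch. The sample payload and the
helper name printable_trace are made up for this example and do not come
from any real tracing tool. It dumps the same XML fragment as UTF-8 and as
UTF-16 the way a simple trace viewer might, showing printable ASCII as-is
and everything else as a dot.

```python
# Hypothetical sample payload: an XML fragment with some non-ASCII text.
payload = "<name>Zo\u00eb \u65e5\u672c</name>"

utf8 = payload.encode("utf-8")          # one byte order only
utf16_le = payload.encode("utf-16-le")  # little-endian code units
utf16_be = payload.encode("utf-16-be")  # big-endian code units


def printable_trace(data: bytes) -> str:
    """Render a raw buffer like a simple trace dump:
    printable ASCII bytes as-is, every other byte as '.'."""
    return "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in data)


# In UTF-8 every byte of a multi-byte sequence is 0x80 or above, so the
# ASCII tags survive intact in the dump.
print(printable_trace(utf8))

# In UTF-16 each ASCII character carries a zero byte, and the byte order
# differs between the LE and BE forms.
print(printable_trace(utf16_le))
print(printable_trace(utf16_be))
```

In the UTF-8 dump the opening and closing tags can be searched for as
plain ASCII text. In the UTF-16 dumps they are interleaved with dots.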
However, within the program itself UTF-8 presents a problem when looking for
specific data in memory buffers. It is nasty, time-consuming and error-prone,
because characters vary from one to four bytes in length. Mapping UTF-16 to
code points is a snap as long as you do not have a lot of surrogates. If you
do, then UTF-32 should probably be considered.
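A minimal Python sketch of the difference follows. The function names are
invented for this example, and it assumes well-formed input apart from the
unpaired surrogates noted in the comments. The UTF-16 mapping is a
one-to-one copy except where a surrogate pair has to be combined, while
finding the n-th character in a UTF-8 buffer means scanning from the start
and skipping continuation bytes.

```python
def utf16_to_code_points(units):
    """Map a sequence of 16-bit code units to Unicode code points."""
    code_points = []
    i = 0
    while i < len(units):
        u = units[i]
        if (0xD800 <= u <= 0xDBFF and i + 1 < len(units)
                and 0xDC00 <= units[i + 1] <= 0xDFFF):
            # High surrogate followed by low surrogate: combine the pair.
            low = units[i + 1]
            code_points.append(0x10000 + ((u - 0xD800) << 10) + (low - 0xDC00))
            i += 2
        else:
            # BMP character. An unpaired surrogate is passed through
            # unchanged here, which is a simplification of this sketch.
            code_points.append(u)
            i += 1
    return code_points


def utf8_offset_of_code_point(buf, n):
    """Return the byte offset of the n-th (0-based) code point in a
    well-formed UTF-8 buffer, by scanning from the start."""
    for offset, b in enumerate(buf):
        if (b & 0xC0) != 0x80:  # not a continuation byte, so a lead byte
            if n == 0:
                return offset
            n -= 1
    raise IndexError("buffer has fewer code points than requested")


# U+0041, U+1D11E (encoded as a surrogate pair), U+65E5
units = [0x0041, 0xD834, 0xDD1E, 0x65E5]
print([hex(cp) for cp in utf16_to_code_points(units)])

buf = "Zo\u00eb \u65e5".encode("utf-8")
print(utf8_offset_of_code_point(buf, 4))
```

With UTF-32 neither step is needed, since every code point occupies
exactly one 32-bit unit.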
From a support-cost standpoint, there are valid reasons to use a mix of UTF
formats.
Carl