<DougEwell2@cs.com>
I hope that the claim of "multiple UTF-8 representations" does indeed
refer to glyphs, in the sense that Unicode contains both precomposed
characters and separable elements, halfwidth and fullwidth ASCII variants,
etc. I hope it does *not* refer to the nonconformant practice of
representing Unicode characters with "non-shortest" UTF-8 sequences.
Instances of that are not the fault of UTF-8.
</DougEwell2@cs.com>
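To make the distinction in the quoted text concrete, here is a small Python sketch. It is an illustration only: the sample characters (e-acute, fullwidth A) and the helper name is_wellformed_utf8 are my own choices, not anything from the quoted message. It shows that precomposed and decomposed forms are different, equally legal code point sequences, each with exactly one UTF-8 encoding, whereas a "non-shortest" byte sequence is simply rejected by a conformant decoder.

```python
import unicodedata

# Two legal Unicode spellings of the same glyph (sample data chosen for illustration).
precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # LATIN SMALL LETTER E + COMBINING ACUTE ACCENT

# They are different code point sequences, so their UTF-8 bytes differ too,
# but each sequence has exactly one well-formed UTF-8 encoding.
precomposed_bytes = precomposed.encode("utf-8")   # b"\xc3\xa9"
decomposed_bytes = decomposed.encode("utf-8")     # b"e\xcc\x81"

# Fullwidth variants are likewise separate characters; a compatibility
# normalization form (NFKC) folds them to the ordinary ASCII letter.
fullwidth_a = "\uff21"   # FULLWIDTH LATIN CAPITAL LETTER A
folded_a = unicodedata.normalize("NFKC", fullwidth_a)


def is_wellformed_utf8(data: bytes) -> bool:
    """Hypothetical helper: True if a strict UTF-8 decoder accepts the bytes."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# b"\xc0\xaf" is a non-shortest (overlong) encoding of "/" and is not valid UTF-8.
overlong_slash = b"\xc0\xaf"
```

Here is_wellformed_utf8(overlong_slash) is False while is_wellformed_utf8(b"/") is True, which is the sense in which non-shortest sequences are a decoder conformance issue rather than a second representation.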
Is there an existing set of recommendations for dealing with this
problem (multiple legal compositions) in search and search-like
applications? Specifically, if there are multiple legal ways to represent a
character, how should the character be stored, should search text be
preprocessed, etc.? Pointers, anyone?
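For concreteness, the kind of preprocessing I have in mind might look like the Python sketch below. It rests on assumptions: that the Unicode normalization forms (NFC, NFKC) are the right tool here, that storing NFC and comparing on an NFKC plus case-folded key is acceptable for the application, and the function names (to_storage_form, to_search_key, find_matches) are made up for this example. It is not a recommendation from any standard.

```python
import unicodedata


def to_storage_form(text: str) -> str:
    """Assumed storage policy: keep text in NFC (canonical composition)."""
    return unicodedata.normalize("NFC", text)


def to_search_key(text: str) -> str:
    """Assumed matching policy: fold compatibility variants (NFKC) and case.

    This is deliberately lossy, so it is used only for comparison,
    never written back over the stored text.
    """
    return unicodedata.normalize("NFKC", text).casefold()


def find_matches(query: str, documents: list[str]) -> list[str]:
    """Return the documents whose search key contains the query's search key."""
    key = to_search_key(query)
    return [doc for doc in documents if key in to_search_key(doc)]


# Sample data: a precomposed e-acute, and fullwidth "ABC".
docs = ["Caf\u00e9 menu", "\uff21\uff22\uff23 store", "tea"]

# A query typed with a combining accent still finds the precomposed text.
accent_hits = find_matches("cafe\u0301", docs)

# A plain ASCII query finds the fullwidth text.
width_hits = find_matches("abc", docs)
```

The split between a lossless storage form and a lossy search key is a design choice: it keeps the original distinctions (fullwidth versus ASCII, for instance) recoverable while letting queries match across them. Whether folding that much is appropriate depends on the application.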
TiA,
/|/|ike