<DougEwell2@cs.com>
I hope that the claim of "multiple UTF-8 representations" does indeed refer
to glyphs, in the sense that Unicode contains both precomposed characters
and separable elements, halfwidth and fullwidth ASCII variants, etc.  I hope
it does *not* refer to the nonconformant practice of representing Unicode
characters with "non-shortest" UTF-8 sequences.  Instances of that are not
the fault of UTF-8.
</DougEwell2@cs.com>
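To make the distinction in the quoted text concrete, here is a small Python sketch (not from the original message; the helper names utf8_hex and is_valid_utf8 are made up for illustration). It shows two conformant spellings of the same visible character, each with its own UTF-8 byte sequence, next to a "non-shortest" sequence that a strict decoder such as Python's rejects.

```python
# Two conformant spellings of "e with acute":
precomposed = "\u00e9"     # single precomposed character U+00E9
decomposed = "e\u0301"     # base letter "e" + combining acute accent U+0301

# A two-byte "non-shortest" spelling of "/" (U+002F); the shortest form is b"/".
OVERLONG_SLASH = b"\xc0\xaf"


def utf8_hex(text: str) -> str:
    """Return the UTF-8 encoding of text as a hex string (illustrative helper)."""
    return text.encode("utf-8").hex()


def is_valid_utf8(data: bytes) -> bool:
    """Return True if Python's strict UTF-8 decoder accepts data."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    print(utf8_hex(precomposed))          # bytes of the precomposed form
    print(utf8_hex(decomposed))           # bytes of the decomposed form
    print(is_valid_utf8(OVERLONG_SLASH))  # whether the overlong form is accepted
```

Both spellings of the accented letter are legal and encode to different, equally valid byte sequences; that is a property of the character repertoire, not of UTF-8. The overlong slash, by contrast, is simply malformed input.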
        Is there an existing set of recommendations for dealing with this
problem (multiple legal compositions) in search and search-like
applications?  Specifically, if there are multiple legal ways to represent a
character, how should the character be stored, should search text be
preprocessed, etc.?  Pointers, anyone?
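For illustration only, and not as an answer from the list: one commonly used approach is to apply the same Unicode normalization form (the forms are defined in Unicode Standard Annex #15) to both the stored text and the query before comparing. The Python sketch below assumes NFKC plus case folding is acceptable for the application; search_key and contains are hypothetical helper names, and whether compatibility folding (the "K" forms) is appropriate depends on the data.

```python
import unicodedata


def search_key(text: str) -> str:
    """Hypothetical helper: reduce text to a single form for matching.

    Assumption: NFKC is acceptable here. It composes canonically equivalent
    sequences and also folds compatibility variants such as fullwidth forms.
    casefold() is added so that matching ignores case.
    """
    return unicodedata.normalize("NFKC", text).casefold()


def contains(haystack: str, needle: str) -> bool:
    """Substring search after applying the same key function to both sides."""
    return search_key(needle) in search_key(haystack)


if __name__ == "__main__":
    # Stored text uses "e" + combining acute; the query uses precomposed U+00E9.
    print(contains("cafe\u0301 menu", "caf\u00e9"))
    # Stored text uses fullwidth letters; the query uses plain ASCII.
    print(contains("\uff21\uff22\uff23", "abc"))
```

The design choice in this sketch is to normalize at comparison time and leave the stored text untouched; an application could just as well normalize once when the text is stored, provided queries go through the same function.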
        TiA,
/|/|ike