You can break searching down into the following steps:
1. Handling Unicode text itself, at least at the level of binary comparison
of characters, as opposed to raw byte streams.
2. Handling Unicode canonical equivalence, e.g. identifying ä with a¨, the
base letter followed by a combining diaeresis (see Ch 3, TR15).
3. Handling character (e.g. grapheme) and word boundaries, so that you don't
match across them (see Ch 5).
4. Handling locale conventions, e.g. "ae" ~ "ä" (see Ch 5, TR10)
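
As a rough illustration of steps 2 and 3 in the list above, here is a minimal
sketch in Python using only the standard unicodedata module. The names nfd and
find_canonical are made up for this example. It assumes that normalizing both
strings to NFD is enough for canonical equivalence, and it approximates the
grapheme boundary check by rejecting any match that is immediately followed by
a combining mark. That is much cruder than real boundary analysis, and it does
nothing about locale conventions (step 4).

```python
import unicodedata


def nfd(s):
    """Return s in Normalization Form D (canonical decomposition)."""
    return unicodedata.normalize("NFD", s)


def find_canonical(text, pattern):
    """Return offsets of pattern in the NFD form of text.

    Hypothetical helper for illustration. Offsets index into the
    normalized text, not into the original string.
    """
    t, p = nfd(text), nfd(pattern)
    hits = []
    if not p:
        return hits
    start = t.find(p)
    while start != -1:
        end = start + len(p)
        # Assumption: a match followed by a combining mark would split a
        # base letter from its accent, so treat it as a partial character.
        splits_character = end < len(t) and unicodedata.combining(t[end]) != 0
        if not splits_character:
            hits.append(start)
        start = t.find(p, start + 1)
    return hits
```

With this sketch, searching "B\u00e4r" (precomposed ä) for "a\u0308" (a plus
combining diaeresis) finds a match, and so does the reverse, while searching
the same text for a plain "a" finds nothing, because that match would stop in
the middle of the ä.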
Depending on the OS, many of these steps should be handled for you. Laura
Werner presented a nice paper on this at the last Unicode conference; you
should look at that as well.
Mark
Randy Hughes wrote:
> I have written a Searching application for Windows. I am interested in
> adding Unicode searching capability to it. Can someone give me a brief
> list of issues to consider, or point me to a good starting point for adding
> this capability. If you need to see the product it can be downloaded from
> my website listed below. It will currently handle only single-byte, and I
> am trying to figure out how to get it to Double-Byte.
>
> Thanks
> Randy Hughes
> Jr Computing
> http://www.jrcomputing.com
-- business: mark.davis@us.ibm.com, mark@unicode.org personal: mark@macchiato.com, http://www.macchiato.com --
This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:45 EDT