Unicode Text Searching in Java
Presented by Helena Shih - IBM Corporation
Intended Audience: Software Engineer
Session Level: Advanced
Java uses Unicode as its text storage model, which allows the simultaneous
manipulation of text from many different languages and scripts. However,
searching through Unicode text can be difficult, because most of the well-known,
efficient search algorithms do not map easily onto Unicode. Combining
characters, accents, and multi-character collation elements in many languages
make matters worse. This talk examines each of these problems and
shows how to solve them using the features available in the Java class
libraries. Using successively more complete examples, we will work up to a
complete solution that performs efficient Boyer-Moore searches on multilingual
text.
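
As a small illustration of the combining-character problem described above, the sketch below shows a plain code-unit search missing a match that a reader would consider identical. It is not taken from the talk: the class and method names are hypothetical, and it assumes java.text.Normalizer (available since Java 6) as a simple stand-in for the collation-based approach the talk builds toward.

```java
import java.text.Normalizer;

// Hypothetical illustration class; not part of the talk's material.
class UnicodeSearchSketch {

    // Naive search: String.indexOf compares UTF-16 code units one by one,
    // so a precomposed "é" never matches "e" followed by a combining accent.
    static int naiveFind(String text, String pattern) {
        return text.indexOf(pattern);
    }

    // Assumption: canonical decomposition (NFD) of both strings before
    // searching. The returned index refers to the normalized text, which
    // may differ from the index in the original text.
    static int normalizedFind(String text, String pattern) {
        String t = Normalizer.normalize(text, Normalizer.Form.NFD);
        String p = Normalizer.normalize(pattern, Normalizer.Form.NFD);
        return t.indexOf(p);
    }

    public static void main(String[] args) {
        String text = "a re\u0301sume\u0301";    // "e" + U+0301 combining acute
        String pattern = "r\u00e9sum\u00e9";     // precomposed U+00E9

        System.out.println(naiveFind(text, pattern));      // prints -1
        System.out.println(normalizedFind(text, pattern)); // prints 2
    }
}
```

Normalization only addresses canonically equivalent spellings. Ignoring accents or case, or handling multi-character collation elements, would need language-sensitive comparison, for example through java.text.Collator.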