Finite State Automata for Unicode
Thomas Emerson - Basis Technology Corporation
Intended Audience: Software Engineers
Session Level: Advanced
The literature on finite state automata generally assumes a relatively
small alphabet, often 128 or fewer characters. Small alphabets allow
one to implement an FSA efficiently and in a straightforward manner.
Large alphabets (of which Unicode is a prime example) can make the
efficient and compact implementation of automata difficult. This talk
presents the problems encountered when handling large alphabets in an
FSA implementation and describes some methods to handle them.
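
To make the size problem concrete: a dense transition table needs one
entry per state per symbol, so with all 1,114,112 Unicode code points
and 4-byte entries each state would cost roughly 4.25 MiB. The sketch
below shows one common way around this, storing each state's
transitions as sorted code point ranges and finding the right one by
binary search. It is only an illustration and is not necessarily one of
the methods covered in the talk; the class `RangeDFA` and its methods
are hypothetical names invented for this example.

```python
from bisect import bisect_right

# Number of Unicode code points (U+0000..U+10FFFF); a dense table would
# need this many entries for every state.
UNICODE_SIZE = 0x110000


class RangeDFA:
    """Illustrative DFA storing transitions as sorted code point ranges."""

    def __init__(self, start, accepting):
        self.start = start
        self.accepting = set(accepting)
        self.starts = {}  # state -> sorted list of range start code points
        self.edges = {}   # state -> list of (range end, target), parallel to starts

    def add_range(self, state, lo, hi, target):
        """Add a transition on every character in lo..hi (inclusive).

        Assumes the ranges added for one state never overlap.
        """
        starts = self.starts.setdefault(state, [])
        edges = self.edges.setdefault(state, [])
        i = bisect_right(starts, ord(lo))
        starts.insert(i, ord(lo))
        edges.insert(i, (ord(hi), target))

    def step(self, state, ch):
        """Return the next state, or None if no range covers ch."""
        starts = self.starts.get(state, [])
        i = bisect_right(starts, ord(ch)) - 1  # last range starting at or before ch
        if i >= 0:
            hi, target = self.edges[state][i]
            if ord(ch) <= hi:
                return target
        return None

    def accepts(self, text):
        state = self.start
        for ch in text:
            state = self.step(state, ch)
            if state is None:
                return False
        return state in self.accepting


# Example: one or more characters drawn from a-z or the CJK Unified
# Ideographs block (U+4E00..U+9FFF). Two ranges per state replace
# thousands of individual table entries.
dfa = RangeDFA(start=0, accepting={1})
for s in (0, 1):
    dfa.add_range(s, "a", "z", 1)
    dfa.add_range(s, "\u4e00", "\u9fff", 1)

print(dfa.accepts("abc"), dfa.accepts("日本語"), dfa.accepts("a1"))
```

The trade-off in this sketch is that each step costs a binary search
over the state's ranges rather than a single array lookup, in exchange
for storage proportional to the number of ranges instead of the size of
the alphabet.
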
The presentation presumes some knowledge of automata, though it begins
with a brief overview of the necessary theory.