From: Tex Texin (tex@i18nguy.com)
Date: Wed Apr 20 2005 - 03:05:38 CST
Thanks for the replies to my question on Unicode-enabled lexers. Here is
my compiled list.
The advice is:
1) Patrick Andries: JavaCC can handle Unicode and has an integrated
lexer, but it also includes a syntax parser.
https://javacc.dev.java.net/doc/features.html
2) Hans Aberg posted to the Flex list
List-Archive: <http://lists.gnu.org/pipermail/help-flex>
Haskell code that lets one generate Flex-like regular expressions
from Unicode character number classes, in such a way that the
generated lexer parses your choice of UTF-8 or UTF-32 (big- or
little-endian). So you might be able to use Flex or a similar lexer
generator by entering those regular expressions by hand into the
lexer source file.
3) Gregg Reynolds:
http://jflex.de/
https://javacc.dev.java.net/
4) Frank Tang:
XFST is already Unicode-enabled
http://www.stanford.edu/~laurik/fsmbook/home.html
5) I also found a thread on this list from January 2005 that claimed:
There are many lexer/scanner projects available on SourceForge.net.
Many of them support Unicode. See for example the results page when
searching for "lexer" in the SourceForge "software/group" category:
See also the various references those projects contain for other
similar open projects or commercial products.
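
To illustrate the approach described in item 2, here is a minimal sketch
in Python. It is not Hans Aberg's Haskell code, which is not reproduced
in this message. It takes a range of Unicode code points and produces a
Flex-style regular expression over UTF-8 bytes. All function names are
hypothetical, only UTF-8 is covered (not UTF-32), and the output assumes
a lexer generator that accepts \xHH escapes and byte-valued character
classes.

```python
# Sketch only: convert a code point range into a byte-level regular
# expression for UTF-8 input. Names are illustrative, not from any tool.

SURROGATE_LO, SURROGATE_HI = 0xD800, 0xDFFF
# Largest code point for each UTF-8 encoded length (1 to 4 bytes).
LENGTH_LIMITS = [0x7F, 0x7FF, 0xFFFF, 0x10FFFF]


def split_by_length(lo, hi):
    """Yield sub-ranges of [lo, hi] that skip the surrogates and whose
    endpoints encode to the same number of UTF-8 bytes."""
    pieces = [(lo, min(hi, SURROGATE_LO - 1)),
              (max(lo, SURROGATE_HI + 1), hi)]
    for a, b in pieces:
        for limit in LENGTH_LIMITS:
            if a > b:
                break
            if a <= limit:
                yield a, min(b, limit)
                a = limit + 1


def byte_ranges(lo, hi):
    """Return a list of alternatives, each a list of (min, max) byte
    pairs, covering every byte string between lo and hi inclusive.
    lo and hi are equal-length UTF-8 byte strings with lo <= hi."""
    n = len(lo)
    if n == 1:
        return [[(lo[0], hi[0])]]
    if lo[0] == hi[0]:
        return [[(lo[0], lo[0])] + rest
                for rest in byte_ranges(lo[1:], hi[1:])]
    tail_min = bytes([0x80] * (n - 1))
    tail_max = bytes([0xBF] * (n - 1))
    first, last = lo[0], hi[0]
    low_part, high_part = [], []
    if lo[1:] != tail_min:   # first lead byte is only partly covered
        low_part = [[(first, first)] + rest
                    for rest in byte_ranges(lo[1:], tail_max)]
        first += 1
    if hi[1:] != tail_max:   # last lead byte is only partly covered
        high_part = [[(last, last)] + rest
                     for rest in byte_ranges(tail_min, hi[1:])]
        last -= 1
    middle = []
    if first <= last:        # lead bytes whose tails are fully covered
        middle = [[(first, last)] + [(0x80, 0xBF)] * (n - 1)]
    return low_part + middle + high_part


def fmt(pair):
    a, b = pair
    return f"\\x{a:02X}" if a == b else f"[\\x{a:02X}-\\x{b:02X}]"


def utf8_regex(lo, hi):
    """Build the alternation for code points lo..hi (inclusive)."""
    alts = []
    for a, b in split_by_length(lo, hi):
        lo_b, hi_b = chr(a).encode("utf-8"), chr(b).encode("utf-8")
        for seq in byte_ranges(lo_b, hi_b):
            alts.append("".join(fmt(p) for p in seq))
    return "|".join(alts)


if __name__ == "__main__":
    print(utf8_regex(0x0391, 0x03A9))  # \xCE[\x91-\xA9]
    print(utf8_regex(0x80, 0x7FF))     # [\xC2-\xDF][\x80-\xBF]
```

The printed expressions could then be pasted by hand into a lexer
source file as named definitions. A full character class such as
"all letters" would be the alternation of one such expression per
code point range in the class.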
--
-------------------------------------------------------------
Tex Texin   cell: +1 781 789 1898   mailto:Tex@XenCraft.com
Xen Master                          http://www.i18nGuy.com
XenCraft                            http://www.XenCraft.com
Making e-Business Work Around the World
-------------------------------------------------------------
This archive was generated by hypermail 2.1.5 : Wed Apr 20 2005 - 03:07:38 CST