From: Frank Yung-Fong Tang (franktang@gmail.com)
Date: Wed Apr 20 2005 - 08:23:57 CST
I think the first question we need to answer is: how do you define a
"Unicode Enabled Lexer"?
I don't have a good answer, but I think it should at least include the
following:
1. Have the ability to scan UTF-8 (and/or UTF-16) input files
2. Have the ability to return tokens in one or more Unicode transformation
formats
3. Have the ability to handle some set of Unicode regular expression
features
4. Have the ability to support programming-language-specific Unicode
'escape' sequences ( \uHHHH, &#ddddd; &#xxxxx; \HHHHH , etc). The lexer may
not support them directly, but it should let the caller of the lexer
define a way to deal with them.
5. Use some Unicode-based string data type as the primitive data type for
returning the result in the token.[?]
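
To make the five points above a bit more concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions, not a proposal for any particular lexer generator: the token classes, the names lex and decode_u_escape, and the choice of \uHHHH as the only escape form are all hypothetical. It decodes UTF-8 or UTF-16 input (point 1), returns tokens as Unicode strings (points 2 and 5), relies on Unicode-aware regular expression classes (point 3), and lets the caller plug in its own escape handling (point 4).

```python
import re
from typing import Callable, Iterator, Tuple

# Hypothetical token classes, chosen only for illustration.
# In Python 3, \w, \d and \s are Unicode-aware for str patterns.
TOKEN_RE = re.compile(
    r"(?P<ESCAPE>\\u[0-9A-Fa-f]{4})"   # a \uHHHH escape sequence
    r"|(?P<IDENT>[^\W\d]\w*)"          # a letter or underscore, then word characters
    r"|(?P<NUMBER>\d+)"                # a run of decimal digits
    r"|(?P<SPACE>\s+)"                 # whitespace, skipped below
    r"|(?P<OTHER>.)",                  # any other single character
    re.DOTALL,
)


def decode_u_escape(text: str) -> str:
    """Default handler (an assumption): turn \\uHHHH into the character it names."""
    return chr(int(text[2:], 16))


def lex(
    data: bytes,
    encoding: str = "utf-8",
    escape_handler: Callable[[str], str] = decode_u_escape,
) -> Iterator[Tuple[str, str]]:
    """Yield (kind, value) pairs; every value is a Unicode str."""
    text = data.decode(encoding)           # point 1: UTF-8 or UTF-16 input
    for match in TOKEN_RE.finditer(text):
        kind = match.lastgroup
        value = match.group()
        if kind == "SPACE":
            continue
        if kind == "ESCAPE":
            value = escape_handler(value)  # point 4: the caller decides
        yield kind, value


if __name__ == "__main__":
    for token in lex("na\u00efve = \\u00e9 42".encode("utf-8")):
        print(token)
```

A caller that wants a different escape policy passes its own escape_handler, and one that has UTF-16 input passes encoding="utf-16". A real lexer would of course need more than this (other escape forms, characters outside the BMP, normalization), so please read it as a sketch only.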
-- Frank Yung-Fong Tang 譚永鋒 Šýšţém Årçĥîţéçţ