Re: Unicode lexer

From: Frank Yung-Fong Tang (franktang@gmail.com)
Date: Wed Apr 20 2005 - 08:23:57 CST

Next message: Peter Constable: "RE: Unicode Bloopers"

Previous message: Patrick Andries: "Re: String name and Character Name"
Maybe in reply to: Tex Texin: "Unicode lexer"
Next in thread: Hans Aberg: "Re: Unicode lexer"
Reply: Hans Aberg: "Re: Unicode lexer"
Reply: Tex Texin: "Re: Unicode lexer"

I think the first question we need to answer is: how do you define a

Unicode-enabled lexer?

I don't have a good answer, but I think the definition should include at
least the following:

1. Be able to scan UTF-8 (and/or UTF-16) input files.
2. Be able to return tokens in one or more Unicode transformation formats.
3. Be able to handle some set of Unicode regular expression features.
4. Be able to support programming-language-specific Unicode 'escape'
sequences (\uHHHH, &#ddddd;, &#xxxxx;, \HHHHH, etc.). The lexer may not
support them directly, but it should let its caller define a way to deal
with them.
5. Use some Unicode-based string data type as the primitive data type for
returning the result in the token.[?]
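
To make points 1, 4 and 5 concrete, here is a minimal sketch in Python. It is
only an illustration under assumptions of my own, not a proposal for a
real lexer: the names Token, tokenize and decode_u_escapes are hypothetical,
the token classes are arbitrary, and the escape handler is applied to the
whole input before tokenizing (one possible policy among several). The
caller supplies the escape handler, so the lexer itself does not need to
know any particular escape syntax.

```python
import re
from typing import Callable, Iterator, NamedTuple, Optional


class Token(NamedTuple):
    kind: str
    text: str  # Python's str is a sequence of Unicode code points (point 5)


# Hypothetical caller-supplied handler for \uHHHH escapes (point 4).
# Surrogate pairs written as two escapes are NOT combined in this sketch.
_U_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})")


def decode_u_escapes(text: str) -> str:
    return _U_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


# \w and \d are Unicode-aware for str patterns in Python 3.
_TOKEN = re.compile(
    r"(?P<IDENT>[^\W\d]\w*)"   # a word character that is not a digit, then word characters
    r"|(?P<NUMBER>\d+)"
    r"|(?P<SPACE>\s+)"
    r"|(?P<OTHER>.)",          # any other single code point
    re.DOTALL,
)


def tokenize(
    data: bytes,
    unescape: Optional[Callable[[str], str]] = None,
) -> Iterator[Token]:
    text = data.decode("utf-8")      # point 1: scan UTF-8 input
    if unescape is not None:
        text = unescape(text)        # point 4: the caller decides how escapes are handled
    for match in _TOKEN.finditer(text):
        kind = match.lastgroup
        if kind != "SPACE":          # whitespace is skipped, not returned
            yield Token(kind, match.group())
```

For example, list(tokenize("名前 = caf\\u00e9 42".encode("utf-8"),
decode_u_escapes)) yields an IDENT, an OTHER, an IDENT and a NUMBER token,
with the escape already resolved inside the second identifier. Point 2
would be a matter of encoding each token's text on the way out, and point 3
depends on how much of the Unicode regular expression feature set the
underlying regex engine supports.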

-- 
Frank Yung-Fong Tang
譚永鋒
Šýšţém Årçĥîţéçţ


This archive was generated by hypermail 2.1.5 : Wed Apr 20 2005 - 08:26:16 CST