From: Tom Emerson (tree@basistech.com)
Date: Wed Apr 20 2005 - 05:34:21 CST
Tex Texin writes:
> I would be interested in pointers to any papers, case studies etc. on
> migrating programming languages to be Unicode-enabled. (No sense
> repeating the sins of the past.)
I would take a look at Python and the various specifications that were
written around its Unicode implementation. The guys who implemented it
did a fantastic job. Indeed, the implementation is pretty easy to read
as well, so you may just want to look at the code.
There are, of course, a couple of levels of "Unicode-enablement"
within a programming language. Many moons ago I worked on the
Unicode-enablement of Gwydion Dylan, though life
intervened and I had to stop. If "all" you need to do is provide
support for a Unicode string type, with appropriate transcoders, then
the task is considerably easier than if you are enabling the entire
language to allow Unicode identifiers, a la Java. Since you are asking
for a Unicode-enabled lexer, I assume the latter.
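
To make the first, easier level concrete, here is a minimal sketch in
Python of a Unicode string type with transcoders at the edges. The
helper names decode_source and encode_output are made up for
illustration; the only real calls are bytes.decode and str.encode.

```python
# Sketch: the "string type plus transcoders" level of Unicode support.
# Text lives in memory as Unicode (Python's str); byte encodings appear
# only at the boundaries, via a decoder on input and an encoder on output.

def decode_source(raw: bytes, encoding: str = "utf-8") -> str:
    """Transcode external bytes into the internal Unicode string type."""
    return raw.decode(encoding)


def encode_output(text: str, encoding: str = "utf-8") -> bytes:
    """Transcode the internal Unicode string back to an external encoding."""
    return text.encode(encoding)


latin1_bytes = b"caf\xe9"                      # "café" in ISO-8859-1
text = decode_source(latin1_bytes, "latin-1")  # now a Unicode string
utf8_bytes = encode_output(text, "utf-8")      # re-encoded as UTF-8
```

Nothing in the language's own syntax has to change at this level,
which is why it is the easier of the two.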
I thought that Flex had been modified to deal with Unicode... I guess
that isn't the case.
You don't mention the implementation language: whether it's C, C++,
Java, or something else entirely. That will certainly constrain your
choices.
It may end up being easier to develop your own lexer from scratch
rather than using Flex or another lexer generator. But again, without
knowing more about the problem, it's hard to say. FWIW I've taken this
approach in one project, and it worked well, especially given UAX #31
(the Unicode annex on identifier syntax) as a starting point.
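
As a rough illustration of what a hand-written lexer along these lines
can look like, here is a sketch in Python. It is not the code from that
project. It assumes the default UAX #31 identifier shape (one start
character followed by continue characters) and leans on Python's
str.isidentifier(), which is based on the XID_Start and XID_Continue
properties and also accepts "_" as a start character. The names
is_id_start, is_id_continue and tokenize are hypothetical, and
normalizing identifiers to NFKC is my assumption (it is what Python does
for its own identifiers; another language might reasonably pick NFC).

```python
import unicodedata


def is_id_start(ch: str) -> bool:
    # A one-character string is an identifier only if it can start one.
    return ch.isidentifier()


def is_id_continue(ch: str) -> bool:
    # A character can continue an identifier if it is accepted after a
    # known-good start character.
    return ("a" + ch).isidentifier()


def tokenize(source: str):
    """Yield (kind, text) pairs, where kind is 'ident', 'number' or 'punct'."""
    i, n = 0, len(source)
    while i < n:
        ch = source[i]
        if ch.isspace():
            i += 1
        elif is_id_start(ch):
            j = i + 1
            while j < n and is_id_continue(source[j]):
                j += 1
            # Normalize so that equivalent spellings of a name compare equal.
            yield ("ident", unicodedata.normalize("NFKC", source[i:j]))
            i = j
        elif ch in "0123456789":
            j = i + 1
            while j < n and source[j] in "0123456789":
                j += 1
            yield ("number", source[i:j])
            i = j
        else:
            # Anything else is treated as one-character punctuation.
            yield ("punct", ch)
            i += 1


tokens = list(tokenize("\u5909\u6570 = cafe\u0301_1 + 42"))
```

The point is that the Unicode-specific part is confined to the two
character-class predicates; the rest is an ordinary scanner loop.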
-tree
-- 
Tom Emerson                                 Basis Technology Corp.
Software Architect                          http://www.basistech.com
"Beware the lollipop of mediocrity: lick it once and you suck forever"