From: William J Poser (wjposer@ldc.upenn.edu)
Date: Wed Sep 20 2006 - 17:17:01 CDT
I'm confused as to the sense in which C and C++
"don't support the Unicode character model". It is
very easy to manipulate objects of type wchar_t,
arrays thereof, linked lists thereof, and so forth.
I've done a fair amount of work using Unicode in C
and have not found it a problem. There are some nice libraries
for handling Unicode in C, such as Ville Laurikari's
TRE regular expression library.
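To make the point about wchar_t arrays concrete, here is a minimal sketch that filters one character out of a wide string. It uses only the standard <wchar.h> and <stdlib.h> routines; the helper name wcs_remove is made up for this illustration and does not come from any library.

```c
#include <assert.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper (not a library function): return a newly
   allocated copy of src with every occurrence of drop removed.
   The caller frees the result. Returns NULL if allocation fails. */
wchar_t *wcs_remove(const wchar_t *src, wchar_t drop)
{
    assert(src != NULL);

    size_t len = wcslen(src);
    /* The filtered copy can never be longer than the original. */
    wchar_t *dst = malloc((len + 1) * sizeof *dst);
    if (dst == NULL)
        return NULL;

    size_t j = 0;
    for (size_t i = 0; i < len; i++) {
        if (src[i] != drop)
            dst[j++] = src[i];
    }
    dst[j] = L'\0';
    return dst;
}
```

The loop is the same one you would write for a char array; only the element type and the length function change.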
It is true that having to do your own storage allocation
can be a pain, but this is independent of Unicode:
you face exactly the same problem with plain
ASCII.
The main theoretical difficulty that I see with Unicode
processing in C is that you can't be sure that a wchar_t
is at least 21 bits wide, which is what it takes to hold
every code point up to U+10FFFF. This is of course a general
defect of the C standard, which guarantees only minimum
ranges for its types and leaves the width of wchar_t
implementation-defined. In practice, however, I haven't myself
encountered problems with this or heard of them.
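One way to guard against a too-narrow wchar_t is to test WCHAR_MAX, which <wchar.h> and <stdint.h> define. The following is only a sketch: the names UNICODE_MAX, wchar_holds_all_unicode, fits_in_wchar and to_wchar are invented for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <wchar.h>

/* Highest Unicode code point, U+10FFFF (needs 21 bits). */
#define UNICODE_MAX 0x10FFFFUL

/* Returns 1 if this implementation's wchar_t can represent
   every Unicode code point, 0 otherwise. */
int wchar_holds_all_unicode(void)
{
    return (unsigned long)WCHAR_MAX >= UNICODE_MAX;
}

/* Returns 1 if cp is a value in the Unicode range that also
   fits in a wchar_t on this implementation. */
int fits_in_wchar(unsigned long cp)
{
    return cp <= UNICODE_MAX && cp <= (unsigned long)WCHAR_MAX;
}

/* Convert a code point to wchar_t, asserting that it fits. */
wchar_t to_wchar(unsigned long cp)
{
    assert(fits_in_wchar(cp));
    return (wchar_t)cp;
}
```

A program that depends on the full range could call wchar_holds_all_unicode() once at startup and refuse to run, or fall back to a fixed-width type such as uint32_t, if it returns 0.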
For the present, at least, there is also a good reason
to use C IN PREFERENCE to high-level languages for
processing Unicode, for some applications. The
high-level languages that I know of all limit
Unicode support to the BMP (the Basic Multilingual
Plane, U+0000 through U+FFFF). That is true of Python
and Tcl, for example. In C, by contrast, there
is no such limitation. Which high-level languages
currently handle the full Unicode range?