From: Hans Aberg (haberg@math.su.se)
Date: Tue Jan 18 2005 - 18:09:33 CST
On 2005/01/18 21:25, Jon Hanna at jon@hackcraft.net wrote:
>> Under C/C++, one will use a wchar_t which is always of exactly 32-bit,
>> regardless what internal word structure the CPU is using in
>> its memory bus.
>
> wchar_t can be 7bits in size or more than 128bits.
Whatever the standard permits it to be, modern platforms such as GNU have
decided that wchar_t will be 32 bits wide. See
<http://www.cl.cam.ac.uk/~mgk25/unicode.html>.
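Since the width of wchar_t is implementation-defined, a program that relies on 32 bits can check what its platform actually provides. The following is only a minimal sketch, assuming a hosted C99 compiler; the function names wchar_bits and wchar_holds_any_code_point are hypothetical and chosen for illustration.

```c
#include <limits.h>   /* CHAR_BIT */
#include <stddef.h>   /* wchar_t */
#include <wchar.h>    /* WCHAR_MAX */

/* Hypothetical helper: width of wchar_t in bits on this implementation. */
int wchar_bits(void)
{
    return (int)(sizeof(wchar_t) * CHAR_BIT);
}

/* Hypothetical helper: nonzero if a single wchar_t can represent every
   code point up to U+10FFFF, zero otherwise (for example when wchar_t
   is only 16 bits wide). */
int wchar_holds_any_code_point(void)
{
    return WCHAR_MAX >= 0x10FFFF;
}
```

A program could call wchar_holds_any_code_point() at startup and fall back to another representation when it returns zero, rather than assuming the 32-bit case.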
>>> Not sure if I understand you correctly. What about 00 vs. C0.80, E0.80.80,
>>> FE.80.80.80.80.80.80 etc.?
>>
>> I have added functions that admit creating regular expressions also for the
>> overloaded UTF-BSS ("UTF-8") multibytes. This way, a lexer can provide
>
> They aren't "overloaded", they are invalid.
You probably mean that the overloaded UTF-BSS (or whatever the correct name
is) multibytes are illegal under UTF-8.
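To make the distinction concrete: the sequences C0.80, E0.80.80 and FE.80.80.80.80.80.80 all carry the value 0 in the generalized lead-byte/continuation-byte scheme, but only the single byte 00 is the shortest form. The sketch below, which is an illustration and not the functions mentioned in the quoted message, decodes one such sequence and reports whether it is the shortest form for its value. The names seq_kind and classify_sequence are hypothetical, and the check deliberately ignores the other UTF-8 restrictions (surrogates, the U+10FFFF limit, the four-byte maximum).

```c
#include <stddef.h>
#include <stdint.h>

/* Classification of one multibyte sequence (hypothetical names). */
enum seq_kind { SEQ_MALFORMED, SEQ_SHORTEST, SEQ_OVERLONG };

/* Decode one sequence of the generalized scheme: a lead byte with n
   leading 1 bits followed by n-1 bytes of the form 10xxxxxx. The
   sequence must occupy exactly len bytes. On SEQ_SHORTEST or
   SEQ_OVERLONG, *value receives the decoded number. */
enum seq_kind classify_sequence(const unsigned char *s, size_t len,
                                uint64_t *value)
{
    /* Smallest value that needs n bytes, indexed by n (1..7). */
    static const uint64_t min_for_len[8] = {
        0, 0x0, 0x80, 0x800, 0x10000, 0x200000, 0x4000000, 0x80000000
    };
    unsigned char lead;
    size_t n = 0, i;
    uint64_t v;

    if (len == 0)
        return SEQ_MALFORMED;
    lead = s[0];

    /* Count the leading 1 bits of the lead byte. */
    while (n < 8 && (lead & (0x80u >> n)))
        n++;

    if (n == 1 || n == 8)       /* stray continuation byte, or FF */
        return SEQ_MALFORMED;
    if (n == 0)                 /* single byte 0xxxxxxx */
        n = 1;
    if (n != len)
        return SEQ_MALFORMED;

    /* Payload bits of the lead byte, then 6 bits per continuation byte. */
    v = (n == 1) ? lead : (uint64_t)(lead & (0xFFu >> (n + 1)));
    for (i = 1; i < n; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return SEQ_MALFORMED;
        v = (v << 6) | (uint64_t)(s[i] & 0x3F);
    }

    *value = v;
    return v < min_for_len[n] ? SEQ_OVERLONG : SEQ_SHORTEST;
}
```

Under this sketch a lexer that wants strict UTF-8 would reject every SEQ_OVERLONG result, while one that wants to accept the longer forms could map them to the decoded value.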
Hans Aberg