From: Hans Aberg (haberg@math.su.se)
Date: Mon Feb 04 2008 - 12:22:12 CST
On 4 Feb 2008, at 18:47, Markus Scherer wrote:
> Most Unicode software and libraries use UTF-16 internally, which is
> easy to use.
That may then be a legacy from the days when one thought two bytes would
be enough. It is common in computing to keep outdated forms just for
backwards compatibility, even long after they have fallen out of use.
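To illustrate why two bytes turned out not to be enough: code points above U+FFFF do not fit in a single 16-bit unit, so UTF-16 splits them into a surrogate pair. The following is only a minimal Python sketch (the function name `utf16_units` is made up for illustration), cross-checked against Python's built-in `utf-16-be` codec:

```python
def utf16_units(cp: int) -> list[int]:
    """Return the UTF-16 code units for one Unicode scalar value."""
    if not (0 <= cp <= 0x10FFFF) or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not a Unicode scalar value")
    if cp < 0x10000:
        return [cp]                 # BMP: fits in one 16-bit unit
    v = cp - 0x10000                # 20 bits remain
    high = 0xD800 + (v >> 10)       # high (lead) surrogate: top 10 bits
    low = 0xDC00 + (v & 0x3FF)      # low (trail) surrogate: bottom 10 bits
    return [high, low]

# Compare with Python's own big-endian UTF-16 encoder (no BOM).
for ch in ["A", "\u00C5", "\U0001F600"]:
    data = ch.encode("utf-16-be")
    builtin = [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]
    assert utf16_units(ord(ch)) == builtin
    print(f"U+{ord(ch):04X}", [f"{u:04X}" for u in utf16_units(ord(ch))])
```

The last line printed is `U+1F600 ['D83D', 'DE00']`: one character, two 16-bit units.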
> Some use UTF-8 even internally, if they see a large majority of
> high-volume text in ASCII.
Sure, for programs that essentially process bytes. I made a way to
process regular expressions over UTF-8, so that lexers like Flex need
not be rewritten; they essentially just process byte patterns anyway.
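The idea that a byte-oriented lexer can match non-ASCII text once each character is spelled as its UTF-8 byte sequence can be sketched as below. This is not the tool described above, just a hypothetical illustration (the names `utf8_bytes` and `byte_pattern` are invented here); it handles only literal strings and assumes Flex-style `\xhh` hex escapes in patterns. Translating character classes and code point ranges into byte-range alternations is more involved and is not shown.

```python
def utf8_bytes(cp: int) -> bytes:
    """Encode one code point as UTF-8 by hand (assumes a valid scalar value)."""
    if cp < 0x80:
        return bytes([cp])                                   # 1 byte: ASCII
    if cp < 0x800:
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])  # 2 bytes
    if cp < 0x10000:
        return bytes([0xE0 | (cp >> 12),                     # 3 bytes
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18),                         # 4 bytes
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])

def byte_pattern(text: str) -> str:
    """Spell a literal string as a sequence of \\xhh byte escapes."""
    return "".join(f"\\x{b:02x}" for ch in text for b in utf8_bytes(ord(ch)))

# Sanity check against Python's UTF-8 codec.
assert all(utf8_bytes(ord(c)) == c.encode("utf-8") for c in "a\u00C5\u20AC\U0001F600")
print(byte_pattern("\u00C5"))  # \xc3\x85
```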
> UTF-32 as a string encoding is rare. (Some people call single-code
> point integers "in UTF-32".)
This would be for libraries that cannot handle variable-size
characters. C++, maybe(?).
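A small Python illustration of the fixed- versus variable-width trade-off: the same three-character string needs a different number of code units in each encoding, and only in UTF-32 does the n-th character sit at the fixed byte offset 4n. The helper `code_units` is a hypothetical name used just for this sketch; the `-le` codecs are used so that no byte order mark is added.

```python
def code_units(text: str, encoding: str, unit_size: int) -> int:
    """Number of code units needed to store text in the given encoding."""
    return len(text.encode(encoding)) // unit_size

s = "A\u00C5\U0001F600"  # 'A', 'Å', and an emoji outside the BMP

print(code_units(s, "utf-8", 1))      # 7  (1 + 2 + 4 bytes)
print(code_units(s, "utf-16-le", 2))  # 4  (1 + 1 + 2 units)
print(code_units(s, "utf-32-le", 4))  # 3  (one unit per code point)

# Fixed width: the character at index 2 starts at byte offset 4 * 2.
units32 = s.encode("utf-32-le")
third = int.from_bytes(units32[8:12], "little")
assert chr(third) == s[2]
```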
Hans Åberg
This archive was generated by hypermail 2.1.5 : Mon Feb 04 2008 - 12:26:00 CST