Re: Nicest UTF

From: Marcin 'Qrczak' Kowalczyk (qrczak@knm.org.pl)
Date: Wed Dec 01 2004 - 17:35:38 CST

Next message: Peter R. Mueller-Roemer: "current version of unicode-font"

Previous message: Theodore H. Smith: "Nicest UTF"
In reply to: Theodore H. Smith: "Nicest UTF"
Next in thread: Philippe Verdy: "Re: Nicest UTF"
Reply: Philippe Verdy: "Re: Nicest UTF"

"Theodore H. Smith" <delete@elfdata.com> writes:

> Assuming you had no legacy code. And no "handy" libraries either,
[...]
> What would be the nicest UTF to use?

For the internals of my language Kogut I've chosen a mixture of
ISO-8859-1 and UTF-32. The representation is normalized, i.e. a string
whose characters all fit in the narrow form is always stored in the
narrow form.

I've chosen representations with fixed size code points because
nothing beats the simplicity of accessing characters by index, and the
most natural thing to index by is a code point.

Strings are immutable, so there is no need to upgrade or downgrade a
string in place, and having two representations doesn't hurt that much.
Since the majority of strings are ASCII, using UTF-32 for everything
would be wasteful.

Mutable and resizable character arrays use UTF-32 only.
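
To make the immutable string scheme concrete, here is a minimal sketch
in Python. It is not Kogut's implementation; the class DualString and
its methods are hypothetical names invented for illustration. It
assumes the narrow form is one byte per code point (ISO-8859-1) and the
wide form is an array of unsigned ints (4 bytes each on common
platforms). Every constructor call re-checks the contents, so a
substring of a wide string drops back to the narrow form when it can.

```python
from array import array


class DualString:
    """Hypothetical immutable string: Latin-1 bytes or UTF-32 code points."""

    __slots__ = ("_data",)

    def __init__(self, code_points):
        cps = list(code_points)
        if all(cp <= 0xFF for cp in cps):
            # Narrow form: every code point fits in one ISO-8859-1 byte.
            self._data = bytes(cps)
        else:
            # Wide form: one unsigned int per code point
            # (typecode "I" is 4 bytes on common platforms).
            self._data = array("I", cps)

    @classmethod
    def from_str(cls, text):
        return cls(ord(ch) for ch in text)

    @property
    def is_narrow(self):
        return isinstance(self._data, bytes)

    def __len__(self):
        # Length in code points, whichever form is in use.
        return len(self._data)

    def __getitem__(self, index):
        # Constant-time access to the code point at an integer index.
        return self._data[index]

    def substring(self, start, stop):
        # Going through the constructor keeps the result normalized.
        return DualString(self._data[start:stop])

    def __str__(self):
        return "".join(map(chr, self._data))
```

With this sketch, DualString.from_str("abc") is stored narrow, while a
string containing U+0142 is stored wide, and both are indexed by code
point in the same way.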

-- 
   __("<         Marcin Kowalczyk
   \__/       qrczak@knm.org.pl
    ^^     http://qrnik.knm.org.pl/~qrczak/

This archive was generated by hypermail 2.1.5 : Wed Dec 01 2004 - 17:36:45 CST