RE: Handling UTF-8

From: Marco Cimarosti (marco.cimarosti@essetre.it)
Date: Thu Mar 01 2001 - 05:08:34 EST


Trond Trosterud wrote:
> Apropos UTF-8:
> [...]
> Now, I have a distinct feeling that there is a mathematical
> formula for doing this, e.g. on my hex calculator. [...]

Some good news for you: you can do it even WITHOUT your calculator!

Just print and cut out the magic encoder in the old email attached below. It
was meant as a joke, but it actually works.
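If you would rather use the hex calculator after all, the same octets come out of repeated division and remainder by 64 (hex 40). This is simply the standard UTF-8 bit layout written as arithmetic, offered here as a sketch rather than as the card's method. For a scalar value c, use the first line whose bound c does not exceed; b_1 to b_4 are the octets in order:

```latex
\begin{aligned}
c \le \mathrm{7F}_{16}:&\quad b_1 = c\\
c \le \mathrm{7FF}_{16}:&\quad b_1 = \mathrm{C0}_{16} + \lfloor c/64 \rfloor,\;
  b_2 = \mathrm{80}_{16} + (c \bmod 64)\\
c \le \mathrm{FFFF}_{16}:&\quad b_1 = \mathrm{E0}_{16} + \lfloor c/64^2 \rfloor,\;
  b_2 = \mathrm{80}_{16} + (\lfloor c/64 \rfloor \bmod 64),\;
  b_3 = \mathrm{80}_{16} + (c \bmod 64)\\
c \le \mathrm{10FFFF}_{16}:&\quad b_1 = \mathrm{F0}_{16} + \lfloor c/64^3 \rfloor,\;
  b_2 = \mathrm{80}_{16} + (\lfloor c/64^2 \rfloor \bmod 64),\\
  &\quad b_3 = \mathrm{80}_{16} + (\lfloor c/64 \rfloor \bmod 64),\;
  b_4 = \mathrm{80}_{16} + (c \bmod 64)\\[4pt]
\text{e.g. } c = \mathrm{E9}_{16} = 233:&\quad
  b_1 = \mathrm{C0}_{16} + 3 = \mathrm{C3}_{16},\;
  b_2 = \mathrm{80}_{16} + 41 = \mathrm{A9}_{16}
\end{aligned}
```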

This is an old version: I made several improvements to it later. As far as I
remember, the latest version (3.0, I think) could both ENcode and DEcode UTF-8
to/from UTF-32, and its calculation algorithm was much simpler.
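Since version 3.0 is lost, the following is NOT Marco's decoding method; it is a conventional bit-masking decoder, offered only as a minimal sketch of the DEcode direction (UTF-8 octets to UTF-32 scalar values). The name `decode_utf8` is hypothetical, and the sketch checks only lead and continuation patterns: it does not reject overlong forms, surrogates, or values above U+10FFFF.

```python
def decode_utf8(data: bytes) -> list[int]:
    """Decode UTF-8 octets into scalar values (one int per character)."""
    out = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:               # 0xxxxxxx: single octet
            n, cp = 0, lead
        elif lead >> 5 == 0b110:      # 110xxxxx: lead of a 2-octet sequence
            n, cp = 1, lead & 0x1F
        elif lead >> 4 == 0b1110:     # 1110xxxx: lead of a 3-octet sequence
            n, cp = 2, lead & 0x0F
        elif lead >> 3 == 0b11110:    # 11110xxx: lead of a 4-octet sequence
            n, cp = 3, lead & 0x07
        else:
            raise ValueError(f"invalid lead octet {lead:02X} at offset {i}")
        if i + n >= len(data):
            raise ValueError(f"truncated sequence at offset {i}")
        for k in range(1, n + 1):
            cont = data[i + k]
            if cont >> 6 != 0b10:     # continuation octets are 10xxxxxx
                raise ValueError(
                    f"invalid continuation octet {cont:02X} at offset {i + k}")
            cp = (cp << 6) | (cont & 0x3F)  # append 6 payload bits
        out.append(cp)
        i += n + 1
    return out


if __name__ == "__main__":
    for cp in decode_utf8(bytes.fromhex("41 C3 A9 E2 82 AC")):
        print(f"U+{cp:04X}")
```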

Unfortunately, I lost all the relevant files when I moved to my new job...
However, I clearly remember that a member of this mailing list published the
latest version on his web site. But I also lost the bookmark to that site...

Have fun.
_ Marco

-----Original Message-----
From: Marco Cimarosti
Sent: Thu Mar 16, 2000 5:02pm
To: Unicode List
Subject: Re: C.U.M.P.E. vers. 1.0 (was: Unicode to UTF-8)

Who said that Unicode is high-tech?

Here is a device to generate UTF-8 that employs traditional tools such as
ASCII art, paper, scissors, glue, and a brain.

        :-)

Side 1 (print and cut out):

+------------+-------+-----------------------+------+
| U+0000     | yy zz | Cima's UTF-8 Magic    | Hex= |
| U+007F     |  ! !  | Pocket Encoder        | B-4  |
| YZ         |  . .  |                       |      |
+------------+-------+-------+ Vers. 1.0     | 0=00 |
| U+0080     | 3x xy | 2y zz | 16 March 2000 | 1=01 |
| U+07FF     | 3. .. | 2. !  |               | 2=02 |
| XYZ        |  . .  |  . .  | M.C.          | 3=03 |
+------------+-------+-------+-------+       | 4=10 |
| U+0800     | 32 ww | 2x xy | 2y zz |       | 5=11 |
| U+FFFF     |  ! !  | 2. .. | 2. !  |       | 6=12 |
| WXYZ       |  E .  |  . .  |  . .  |       | 7=13 |
+------------+-------+-------+-------+-------+ 8=20 |
| U-00010000 | 33 0v | 2v ww | 2x xy | 2y zz | 9=21 |
| U-000FFFFF |  ! 0. | 2. !  | 2. .. | 2. !  | A=22 |
| VWXYZ      |  F .  |  . .  |  . .  |  . .  | B=23 |
+------------+-------+-------+-------+-------+ C=30 |
| U-00100000 | 33 1v | 2v ww | 2x xy | 2y zz | D=31 |
| U-0010FFFF |  ! 1. | 2. !  | 2. .. | 2. !  | E=32 |
| VWXYZ      |  F .  |  . .  |  . .  |  . .  | F=33 |
+------------+-------+-------+-------+-------+------+
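The card can be emulated in a few lines of code. This is a minimal Python sketch, assuming the row templates are read off the table above: digits 0 to 3 are hard-coded base-4 digits, and each letter stands for a base-4 digit of the hex digit marked with the same capital letter, consumed high digit first as you move left to right across the octets. `ROWS` and `pocket_encode` are hypothetical names, and, like the card, the sketch does not treat the surrogate range U+D800..U+DFFF specially.

```python
# One entry per row of the card: (min, max, marked hex digits, octet templates).
# In a template, '0'..'3' are literal base-4 digits; letters refer to the
# base-4 digits of the marked hex digit with the same (upper-case) letter.
ROWS = [
    (0x000000, 0x00007F, "YZ", ["yyzz"]),
    (0x000080, 0x0007FF, "XYZ", ["3xxy", "2yzz"]),
    (0x000800, 0x00FFFF, "WXYZ", ["32ww", "2xxy", "2yzz"]),
    (0x010000, 0x0FFFFF, "VWXYZ", ["330v", "2vww", "2xxy", "2yzz"]),
    (0x100000, 0x10FFFF, "VWXYZ", ["331v", "2vww", "2xxy", "2yzz"]),
]


def pocket_encode(cp: int) -> str:
    """Return the UTF-8 octets of scalar value cp as hex, e.g. 'C3 A9'."""
    # Left column: pick the row whose range contains cp.
    for lo, hi, letters, templates in ROWS:
        if lo <= cp <= hi:
            break
    else:
        raise ValueError(f"{cp:#X} is outside U+0000..U+10FFFF")
    # The marked hex digits are the last len(letters) digits of cp.
    digits = f"{cp:X}".rjust(len(letters), "0")[-len(letters):]
    # Right column: each marked hex digit becomes two base-4 digits.
    pending = {}
    for letter, h in zip(letters.lower(), digits):
        n = int(h, 16)
        pending[letter] = [n // 4, n % 4]  # high base-4 digit first
    octets = []
    for template in templates:
        # Write the base-4 digits on the dots under the letters.
        b4 = [int(ch) if ch in "0123" else pending[ch].pop(0)
              for ch in template]
        # Each 2-digit base-4 number is one hex digit of the octet.
        octets.append(f"{b4[0] * 4 + b4[1]:X}{b4[2] * 4 + b4[3]:X}")
    return " ".join(octets)


if __name__ == "__main__":
    for cp in (0x41, 0xE9, 0x20AC, 0x1F600, 0x10FFFF):
        print(f"U+{cp:04X} -> {pocket_encode(cp)}")
    # U+0041 -> 41
    # U+00E9 -> C3 A9
    # U+20AC -> E2 82 AC
    # U+1F600 -> F0 9F 98 80
    # U+10FFFF -> F4 8F BF BF
```

Keeping the templates as data mirrors the card: the exclamation-mark shortcuts fall out automatically, since a hard-coded pair like "33" always yields F and a pair that is a whole marked digit, like "ww", always yields that digit unchanged.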

Side 2 (print, cut out, and glue on back of side 1):

+---------------------------------------------------+
| Cima's UTF-8 Magic Pocket Encoder - User's Manual |
| (vers. 1.0, 16 March 2000, by Marco Cimarosti)    |
|                                                   |
| - Left column: min and max Unicode scalar values: |
|   pick the row that applies to the code point you |
|   want to convert to UTF-8. Letters V..Z mark the |
|   hexadecimal digits that have to be processed.   |
| - Right column: hexadecimal to base-4 table.      |
| - Central columns: work area to compute each of   |
|   the 1 to 4 octets that make up the UTF-8 octet  |
|   sequence. Convert each digit marked by V..Z     |
|   from hex. to b.-4 and write the b.-4 digits on  |
|   the dots placed under letters v..z (two b.-4    |
|   digits per hex. digit). Convert each 2-digit    |
|   base-4 number to a hex. digit and write it on   |
|   the dot in the line below. That is your UTF-8   |
|   sequence in hex.! Exclamation marks show steps  |
|   that may be skipped, either because the digit   |
|   is hard-coded or because it may be copied       |
|   directly from the scalar value.                 |
+---------------------------------------------------+
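The two conversions the manual asks for (hex digit to two base-4 digits, and each pair of base-4 digits back to one hex digit) can be sketched as follows. `HEX_TO_B4` and `octet_to_hex` are hypothetical names; the worked example assumes U+00E9, which falls in the XYZ row with X=0, Y=E, Z=9.

```python
# Right column of the card: each hex digit written as two base-4 digits.
HEX_TO_B4 = {f"{h:X}": f"{h // 4}{h % 4}" for h in range(16)}


def octet_to_hex(b4_digits: str) -> str:
    """Turn the four base-4 digits written on one octet's dots into two hex
    digits: each pair of base-4 digits is one hex digit."""
    if len(b4_digits) != 4 or any(ch not in "0123" for ch in b4_digits):
        raise ValueError(f"expected four base-4 digits, got {b4_digits!r}")
    high = int(b4_digits[0]) * 4 + int(b4_digits[1])
    low = int(b4_digits[2]) * 4 + int(b4_digits[3])
    return f"{high:X}{low:X}"


if __name__ == "__main__":
    # Reproduce the card's right column: 0=00 1=01 ... F=33.
    print(" ".join(f"{h}={b4}" for h, b4 in HEX_TO_B4.items()))
    # U+00E9: X=0 -> 00, Y=E -> 32, Z=9 -> 21.
    # Octet "3x xy" gets 3 0 0 3; octet "2y zz" gets 2 2 2 1.
    print(octet_to_hex("3003"), octet_to_hex("2221"))  # C3 A9
```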

Enjoy!
Marco



This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:19 EDT