From: Christoph Päper (christoph.paeper@crissov.de)
Date: Sun Feb 20 2011 - 13:48:56 CST
Thomas Cropley:
> <UTF-c.htm>
It’s a fair idea to be backwards compatible with (most of) ISO 8859-1 by encoding U+00C0–00FF as the single bytes C0h (11000000b) through FFh (11111111b), and to reuse the bytes 80h (10000000b) through BFh (10111111b), rather than only through 9Fh, for encoding higher code points. I will not consider codepage switching with quasi-BOMs at all, because it seems like a bad idea; U+00A0–00BF are missing anyhow. I don’t think it’s a good idea to also use 11......b in multibyte code sequences, though.
UTF-8: ASCII and 3–5bit/2bit prefixes
0....... isolation prefix,
110..... initial prefix,
1110.... initial prefix,
11110... initial prefix,
11111... illegal prefix;
10...... medial and final prefix.
7 0xxxxxxx
11 110yyyxx 10xxxxxx
16 1110yyyy 10yyyyxx 10xxxxxx
21 11110zzz 10zzyyyy 10yyyyxx 10xxxxxx
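For reference, the UTF-8 layouts above can be packed as in the following minimal Python sketch. The function name `utf8_encode` is only illustrative; the result can be compared against Python's built-in codec.

```python
def utf8_encode(cp: int) -> bytes:
    """Pack one code point into the UTF-8 byte layouts listed above."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not a Unicode scalar value")
    if cp < 0x80:        # 7 bits:  0xxxxxxx
        return bytes([cp])
    if cp < 0x800:       # 11 bits: 110yyyxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6),
                      0x80 | (cp & 0x3F)])
    if cp < 0x10000:     # 16 bits: 1110yyyy 10yyyyxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 21 bits: 11110zzz 10zzyyyy 10yyyyxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


print(utf8_encode(0x20AC) == "€".encode("utf-8"))  # True
```

Every byte announces its own role through its prefix, so a decoder can resynchronise from any position without context.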
UTF-c: ASCII and 2bit prefixes
0....... isolation prefix,
10...... initial and final prefix,
11...... medial and isolation prefix.
7 0xxxxxxx
6 11xxxxxx
12 10yyyyxx 10xxxxxx
18 10zzyyyy 11yyyyxx 10xxxxxx
21 10°°°zzz 11zzyyyy 11yyyyxx 10xxxxxx
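To illustrate what the 2-bit prefixes imply for a parser, here is a rough Python sketch that splits a byte stream into units according to the UTF-c layout as summarised above. The name `utfc_split` and the error handling are assumptions made for illustration; they are not taken from the UTF-c proposal itself.

```python
def utfc_split(data: bytes) -> list[bytes]:
    """Split bytes into UTF-c units using only the 2-bit prefixes."""
    units = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80 or b >= 0xC0:
            # 0....... or 11...... outside a sequence: isolated byte
            units.append(data[i:i + 1])
            i += 1
            continue
        # 10......: initial byte; skip 11...... medial bytes
        j = i + 1
        while j < len(data) and data[j] >= 0xC0:
            j += 1
        # the sequence must end with a 10...... final byte
        if j >= len(data) or data[j] >> 6 != 0b10:
            raise ValueError(f"unterminated sequence at offset {i}")
        if j - i + 1 > 4:
            raise ValueError(f"sequence too long at offset {i}")
        units.append(data[i:j + 1])
        i = j + 1
    return units


stream = bytes([0x41, 0xE9, 0x81, 0xC5, 0x82, 0xE9])
print(utfc_split(stream))
# [b'A', b'\xe9', b'\x81\xc5\x82', b'\xe9']
```

The byte C5h occurs here as a medial byte, yet a parser that starts reading at that offset would take it for an isolated character. The meaning of a 11...... byte thus depends on what came before it, which may be one reason to keep such bytes out of multibyte sequences.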
Type 1: ASCII and 4bit prefix
0....... isolation prefix,
11...... isolation prefix,
10##.... initial prefixes with following-bytes count,
1000.... medial and final prefix.
7 0xxxxxxx
6 11xxxxxx
8 1001xxxx 1000xxxx
12 1010yyyy 1000xxxx 1000xxxx
16 1011yyyy 1000yyyy 1000xxxx 1000xxxx
=> incomplete coverage: at most 16 payload bits, while Unicode (up to U+10FFFF) needs 21.
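The coverage verdict can be checked mechanically by counting the payload placeholders in each layout row. The following is a small Python sketch; `payload_bits` and the list name `TYPE1` are hypothetical helpers, and the rows are copied from the Type 1 table above.

```python
def payload_bits(layout: str) -> int:
    """Count payload placeholders (x, y, z) in a layout string."""
    return sum(layout.count(c) for c in "xyz")


# Rows of the Type 1 table
TYPE1 = [
    "0xxxxxxx",
    "11xxxxxx",
    "1001xxxx 1000xxxx",
    "1010yyyy 1000xxxx 1000xxxx",
    "1011yyyy 1000yyyy 1000xxxx 1000xxxx",
]

widths = [payload_bits(row) for row in TYPE1]
UNICODE_BITS = (0x10FFFF).bit_length()  # bits needed for the highest code point
covers = max(widths) >= UNICODE_BITS

print(widths)        # [7, 6, 8, 12, 16]
print(UNICODE_BITS)  # 21
print(covers)        # False
```

With the two count bits spent on one to three following bytes, the longest form carries only 16 bits, which falls short of the 21 bits needed.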
Type 2: ASCII and 5bit/3bit prefix
0....... isolation prefix,
11...... isolation prefix,
101##... initial prefixes with following-bytes count (+1),
100..... medial and final prefix.
7 0xxxxxxx
6 11xxxxxx
8 10100xxx 100xxxxx
13 10101yyy 100yyxxx 100xxxxx
18 10110zzy 100yyyyy 100yyxxx 100xxxxx
21 10111°°z 100zzzzy 100yyyyy 100yyxxx 100xxxxx
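Reading the table, the two ## bits of a Type 2 initial byte hold the number of following bytes minus one. A small Python sketch of that interpretation follows; both function names are made up for illustration.

```python
def type2_following(lead: int) -> int:
    """Number of bytes that follow a Type 2 initial byte (101##...)."""
    if lead >> 5 != 0b101:
        raise ValueError("not a Type 2 initial byte")
    return ((lead >> 3) & 0b11) + 1


def type2_capacity(following: int) -> int:
    """Bits available: 3 in the initial byte plus 5 per following byte."""
    return 3 + 5 * following


print([type2_capacity(n) for n in range(1, 5)])  # [8, 13, 18, 23]
```

The longest form offers 23 bits, so 21-bit code points fit with two bits to spare.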
Type 3.1: ASCII and 3bit prefix
0....... isolation prefix,
11...... isolation prefix,
101..... initial and medial byte prefix,
100..... final byte prefix.
7 0xxxxxxx
6 11xxxxxx
10 101yyxxx 100xxxxx
15 101yyyyy 101yyxxx 100xxxxx
20 101zzzzy 101yyyyy 101yyxxx 100xxxxx
21 101°°°°z 101zzzzy 101yyyyy 101yyxxx 100xxxxx
Type 3.2: ASCII and 3bit prefix
0....... isolation prefix,
11...... isolation prefix,
101..... initial and final prefix,
100..... medial prefix.
7 0xxxxxxx
6 11xxxxxx
10 101yyxxx 101xxxxx
15 101yyyyy 100yyxxx 101xxxxx
20 101zzzzy 100yyyyy 100yyxxx 101xxxxx
21 101°°°°z 100zzzzy 100yyyyy 100yyxxx 101xxxxx
Type 3.3: ASCII and 3bit prefix
0....... isolation prefix,
11...... isolation prefix,
101..... initial prefix,
100..... medial and final prefix.
7 0xxxxxxx
6 11xxxxxx
10 101yyxxx 100xxxxx
15 101yyyyy 100yyxxx 100xxxxx
20 101zzzzy 100yyyyy 100yyxxx 100xxxxx
21 101°°°°z 100zzzzy 100yyyyy 100yyxxx 100xxxxx
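A possible encoder and decoder for Type 3.3, as a Python sketch. Several things are assumed here rather than stated in the table: the payload is the plain code point value, right-aligned; U+0000–007F and U+00C0–00FF are stored as single bytes; everything else uses the shortest multibyte form. The names `t33_encode` and `t33_decode` are illustrative only.

```python
def t33_encode(cp: int) -> bytes:
    """Encode one code point in the Type 3.3 layout (assumptions above)."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("code point out of range")
    if cp < 0x80 or 0xC0 <= cp <= 0xFF:
        return bytes([cp])             # 0xxxxxxx or 11xxxxxx
    n = 2
    while cp >> (5 * n):               # grow until cp fits into 5*n bits
        n += 1
    # 5-bit groups, most significant first, each with prefix 100
    out = [0x80 | ((cp >> (5 * k)) & 0x1F) for k in reversed(range(n))]
    out[0] |= 0x20                     # initial byte: 100 becomes 101
    return bytes(out)


def t33_decode(data: bytes) -> list[int]:
    """Decode Type 3.3 bytes into a list of code points."""
    cps = []
    i = 0
    while i < len(data):
        b = data[i]
        i += 1
        if b < 0x80 or b >= 0xC0:      # isolated byte
            cps.append(b)
            continue
        if b >> 5 != 0b101:
            raise ValueError("stray medial/final byte")
        cp = b & 0x1F
        start = i
        while i < len(data) and data[i] >> 5 == 0b100:
            cp = (cp << 5) | (data[i] & 0x1F)
            i += 1
        if not 1 <= i - start <= 4:
            raise ValueError("bad sequence length")
        cps.append(cp)
    return cps


text = "Aé€\u00a0"
encoded = b"".join(t33_encode(ord(ch)) for ch in text)
print(t33_decode(encoded) == [ord(ch) for ch in text])  # True
```

Because only the initial byte carries 101, a decoder can find the next character boundary from any offset by skipping 100..... bytes.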
Type 4: Latin1 and 4bit prefix
0....... isolation prefix,
101..... isolation prefix,
11...... isolation prefix,
1001.... initial prefix,
1000.... medial and final prefix.
7 0xxxxxxx
6 11xxxxxx
5 101xxxxx
8 1001xxxx 1000xxxx
12 1001yyyy 1000xxxx 1000xxxx
16 1001yyyy 1000yyyy 1000xxxx 1000xxxx
20 1001zzzz 1000yyyy 1000yyyy 1000xxxx 1000xxxx
21 1001°°°z 1000zzzz 1000yyyy 1000yyyy 1000xxxx 1000xxxx
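A sketch of a Type 4 encoder in Python shows the cost of spending four prefix bits per byte. It assumes that bytes 00h–7Fh and A0h–FFh stand for the code points of the same value and that all other code points are written as right-aligned 4-bit groups in the shortest possible form. The name `type4_encode` is hypothetical.

```python
def type4_encode(cp: int) -> bytes:
    """Encode one code point in the Type 4 layout (assumptions above)."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("code point out of range")
    if cp < 0x80 or 0xA0 <= cp <= 0xFF:
        return bytes([cp])             # 0xxxxxxx, 101xxxxx or 11xxxxxx
    n = 2
    while cp >> (4 * n):               # grow until cp fits into 4*n bits
        n += 1
    nibbles = [(cp >> (4 * k)) & 0xF for k in reversed(range(n))]
    # initial byte 1001...., all others 1000....
    return bytes([0x90 | nibbles[0]] + [0x80 | v for v in nibbles[1:]])


print(type4_encode(0x20AC).hex(" "))                        # 92 80 8a 8c
print(len(type4_encode(0x20AC)), len("€".encode("utf-8")))  # 4 3
print(len(type4_encode(0x10FFFF)))                          # 6
```

All of Latin-1 except the C1 controls stays single-byte, but most of the Basic Multilingual Plane needs four bytes instead of UTF-8's three, and the highest planes need six.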
Type 5: Latin1 and 6bit/4bit prefix
0....... isolation prefix,
101..... isolation prefix,
11...... isolation prefix,
1001##.. initial prefix with following-bytes count (+1),
1000.... medial and final prefix.
7 0xxxxxxx
6 11xxxxxx
5 101xxxxx
6 100100xx 1000xxxx
10 100101yy 1000xxxx 1000xxxx
14 100110yy 1000yyyy 1000xxxx 1000xxxx
18 100111zz 1000yyyy 1000yyyy 1000xxxx 1000xxxx
=> incomplete coverage: at most 18 payload bits, while Unicode (up to U+10FFFF) needs 21.