From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Wed May 19 2004 - 18:20:57 CDT
Frank Yung-Fong Tang wrote:
> It should be:
> Legal UTF-8 sequences are:
> 1st---- 2nd---- 3rd---- 4th---- Codepoints---
> 00-7F                           0000-  007F
> C2-DF   80-BF                   0080-  07FF
> E0      A0-BF   80-BF           0800-  0FFF
> E1-EC   80-BF   80-BF           1000-  CFFF
> ED      80-9F   80-BF           D000-  D7FF
> EE-EF   80-BF   80-BF           E000-  FFFF
> F0      90-BF   80-BF   80-BF   10000- 3FFFF
> F1-F3   80-BF   80-BF   80-BF   40000- FFFFF
> F4      80-8F   80-BF   80-BF   100000-10FFFF
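The quoted table can be checked mechanically. Below is a minimal Python sketch (the names `_ROWS` and `is_legal_utf8` are hypothetical, not from any library) that walks a byte string and accepts only the byte sequences listed in the table. Note that, exactly like the table, it still accepts noncharacters such as U+FFFE (EF BF BE).

```python
# Each row: allowed range for the lead byte, then for each continuation
# byte, copied from the table of legal UTF-8 sequences.
_ROWS = [
    ((0x00, 0x7F),),
    ((0xC2, 0xDF), (0x80, 0xBF)),
    ((0xE0, 0xE0), (0xA0, 0xBF), (0x80, 0xBF)),
    ((0xE1, 0xEC), (0x80, 0xBF), (0x80, 0xBF)),
    ((0xED, 0xED), (0x80, 0x9F), (0x80, 0xBF)),
    ((0xEE, 0xEF), (0x80, 0xBF), (0x80, 0xBF)),
    ((0xF0, 0xF0), (0x90, 0xBF), (0x80, 0xBF), (0x80, 0xBF)),
    ((0xF1, 0xF3), (0x80, 0xBF), (0x80, 0xBF), (0x80, 0xBF)),
    ((0xF4, 0xF4), (0x80, 0x8F), (0x80, 0xBF), (0x80, 0xBF)),
]


def is_legal_utf8(data: bytes) -> bool:
    """Return True if data is made only of sequences listed in the table."""
    i = 0
    n = len(data)
    while i < n:
        lead = data[i]
        # Find the row whose lead-byte range contains this byte.
        row = next((r for r in _ROWS if r[0][0] <= lead <= r[0][1]), None)
        if row is None:
            return False  # C0, C1, F5-FF, or a stray continuation byte
        if i + len(row) > n:
            return False  # truncated sequence at end of input
        for k in range(1, len(row)):
            lo, hi = row[k]
            if not lo <= data[i + k] <= hi:
                return False  # overlong, surrogate, or > U+10FFFF
        i += len(row)
    return True
```

The narrow second-byte ranges after E0, ED, F0 and F4 are what exclude overlong forms, the surrogates D800-DFFF, and values above 10FFFF.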
However, I feel it is not legal (or at least strongly discouraged) to encode the noncharacter code points U+xFFFE and U+xFFFF, where x is any plane number from 0 to 10 (hex). So the rules need to be a bit more detailed to exclude them.
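To make such a stricter rule concrete, here is a minimal Python sketch; the function names are hypothetical. It flags U+xFFFE and U+xFFFF in planes 0 through 10 (hex) with a bit test, and relies on Python's strict UTF-8 decoder for well-formedness. It covers only these plane-end noncharacters, not the additional noncharacter range U+FDD0..U+FDEF.

```python
def is_plane_end_noncharacter(cp: int) -> bool:
    """True for U+xFFFE and U+xFFFF in any plane x from 0 to 0x10."""
    # Clearing the lowest bit maps both xFFFE and xFFFF to xFFFE.
    return 0 <= cp <= 0x10FFFF and (cp & 0xFFFE) == 0xFFFE


def contains_plane_end_noncharacter(data: bytes) -> bool:
    # The strict "utf-8" codec raises UnicodeDecodeError on ill-formed
    # input, but it does NOT reject noncharacters, hence the extra test.
    text = data.decode("utf-8")
    return any(is_plane_end_noncharacter(ord(ch)) for ch in text)
```

At the byte level the same rule would exclude the three-byte sequences EF BF BE and EF BF BF, plus four-byte sequences whose second byte is 8F, 9F, AF or BF and whose last two bytes are BF BE or BF BF.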
Are these permanently assigned non-characters encodable in any UTF or in CESU-8?
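One way to look at the question is to compute what the byte sequences would be. In this minimal Python sketch, the built-in UTF-8 and UTF-16 codecs encode these code points without error. Python has no CESU-8 codec, so `to_cesu8` is a hypothetical helper that encodes each UTF-16 code unit separately in its three-byte UTF-8 form, which is how CESU-8 (Unicode Technical Report #26) represents supplementary characters.

```python
def to_cesu8(text: str) -> bytes:
    """Hypothetical CESU-8 encoder: UTF-8 applied to UTF-16 code units."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp < 0x10000:
            # BMP code points are identical to UTF-8 (lone surrogates raise).
            out += ch.encode("utf-8")
        else:
            # Split into a UTF-16 surrogate pair, then encode each
            # surrogate as a 3-byte sequence ("surrogatepass" allows this).
            v = cp - 0x10000
            for unit in (0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)):
                out += chr(unit).encode("utf-8", "surrogatepass")
    return bytes(out)


for cp in (0xFFFE, 0xFFFF, 0x1FFFE, 0x10FFFF):
    ch = chr(cp)
    print(f"U+{cp:04X}",
          "UTF-8:", ch.encode("utf-8").hex(" ").upper(),
          "UTF-16BE:", ch.encode("utf-16-be").hex(" ").upper(),
          "CESU-8:", to_cesu8(ch).hex(" ").upper())
```

For BMP noncharacters CESU-8 and UTF-8 coincide (EF BF BE for U+FFFE); for supplementary ones CESU-8 yields six bytes, e.g. ED AF BF ED BF BF for U+10FFFF.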