From: Doug Ewell (dewell@adelphia.net)
Date: Fri May 16 2003 - 02:57:46 EDT
Philippe Verdy <verdy_p at wanadoo dot fr> wrote:
> Don't forget EBCDIC, and also some Unicode-conforming encodings based
> on basic EBCDIC, where unused code units have been used to encode
> Unicode in a way similar to UTF-8 (with a simple reordering of bytes,
> so that ASCII characters stay at their equivalent EBCDIC positions, as
> do the extended EBCDIC controls such as NEL, which ISO 8859-* also
> assigns, following ISO 6429, in the range 0x80 to 0x9F)...
I can think of only one such encoding, UTF-EBCDIC:
http://www.unicode.org/reports/tr16/
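The "similar to UTF-8" stage Philippe describes is what TR16 calls UTF-8-Mod (the "I8" form). Here is a minimal sketch of that stage only, assuming the byte layout from TR16 as I recall it: 0x00-0x9F stand for themselves, trailing bytes are 101xxxxx (5 bits each), and lead bytes are 110xxxxx, 1110xxxx, 11110xxx or 111110xx. The second stage, the reversible byte-to-byte permutation into EBCDIC positions, is a 256-entry table in TR16 and is not reproduced here. The function names are hypothetical.

```python
# Sketch of the UTF-8-Mod ("I8") stage of UTF-EBCDIC, assuming the TR16
# layout: single bytes 0x00-0x9F, trail bytes 101xxxxx carrying 5 bits.
# The final I8 -> EBCDIC byte permutation (a table in TR16) is omitted.

def i8_encode(cp: int) -> bytes:
    if cp < 0 or cp > 0x10FFFF:
        raise ValueError("code point out of range")
    if cp <= 0x9F:
        return bytes([cp])                   # C0, ASCII and C1 pass through
    if cp <= 0x3FF:
        n, lead_mark = 2, 0xC0               # 110yyyyy 101xxxxx
    elif cp <= 0x3FFF:
        n, lead_mark = 3, 0xE0               # 1110zzzz + 2 trail bytes
    elif cp <= 0x3FFFF:
        n, lead_mark = 4, 0xF0               # 11110www + 3 trail bytes
    else:
        n, lead_mark = 5, 0xF8               # 111110vv + 4 trail bytes
    trail = []
    for _ in range(n - 1):
        trail.append(0xA0 | (cp & 0x1F))     # low 5 bits into a trail byte
        cp >>= 5
    return bytes([lead_mark | cp] + trail[::-1])


def i8_decode(data: bytes) -> list[int]:
    out, i = [], 0
    while i < len(data):
        b = data[i]
        if b <= 0x9F:
            out.append(b)
            i += 1
            continue
        if b & 0xE0 == 0xC0:
            n, cp = 2, b & 0x1F
        elif b & 0xF0 == 0xE0:
            n, cp = 3, b & 0x0F
        elif b & 0xF8 == 0xF0:
            n, cp = 4, b & 0x07
        elif b & 0xFC == 0xF8:
            n, cp = 5, b & 0x03
        else:
            raise ValueError(f"invalid lead byte 0x{b:02X}")
        if i + n > len(data):
            raise ValueError("truncated sequence")
        for t in data[i + 1:i + n]:
            if t & 0xE0 != 0xA0:
                raise ValueError(f"invalid trail byte 0x{t:02X}")
            cp = (cp << 5) | (t & 0x1F)
        out.append(cp)
        i += n
    return out
```

For example, `i8_encode(0x41)` is `b"\x41"` (ASCII is untouched) and `i8_encode(0xA0)` is `b"\xc5\xa0"`; only after the TR16 permutation do these bytes land on EBCDIC positions.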
> Don't forget VISCII (for Vietnamese) either, which uses some rarely
> used ASCII controls to map some Vietnamese characters with double
> accents, as the ISO 6429 standard does not offer enough free positions
> in the range 0xA0 to 0xFF to map all Vietnamese characters. (It does
> not conform to Unicode, as there's no way to fully encode it with full
> round-trip capability.)
Of course there is. Each of the 256 VISCII code points maps to one and
only one Unicode character. 0x02 in VISCII can only be U+1EB2 LATIN
CAPITAL LETTER A WITH BREVE AND HOOK ABOVE, never U+0002 START OF TEXT.
If you'd like, I can provide a mapping table.
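To make the round-trip argument concrete, here is a minimal sketch covering only the 7-bit half of VISCII. Only the 0x02 entry is stated above; the other five reassigned C0 positions are taken from the published VISCII table (RFC 1456) as I understand it and should be checked against an authoritative mapping. Names such as `viscii_decode` are hypothetical, and bytes 0x80-0xFF are deliberately left out.

```python
# Partial VISCII <-> Unicode table for bytes 0x00-0x7F only.
# Assumption: the six reassigned C0 positions below follow RFC 1456.
VISCII_C0_OVERRIDES = {
    0x02: "\u1EB2",  # LATIN CAPITAL LETTER A WITH BREVE AND HOOK ABOVE
    0x05: "\u1EB4",  # LATIN CAPITAL LETTER A WITH BREVE AND TILDE
    0x06: "\u1EAA",  # LATIN CAPITAL LETTER A WITH CIRCUMFLEX AND TILDE
    0x14: "\u1EF6",  # LATIN CAPITAL LETTER Y WITH HOOK ABOVE
    0x19: "\u1EF8",  # LATIN CAPITAL LETTER Y WITH TILDE
    0x1E: "\u1EF4",  # LATIN CAPITAL LETTER Y WITH DOT BELOW
}

# Every other 7-bit byte keeps its ASCII meaning (controls included).
VISCII_TO_UCS = {b: VISCII_C0_OVERRIDES.get(b, chr(b)) for b in range(0x80)}
UCS_TO_VISCII = {ch: b for b, ch in VISCII_TO_UCS.items()}


def is_injective(table: dict) -> bool:
    """One byte -> one character, with no two bytes sharing a character."""
    return len(set(table.values())) == len(table)


def viscii_decode(data: bytes) -> str:
    # Raises KeyError for bytes >= 0x80, which this sketch does not cover.
    return "".join(VISCII_TO_UCS[b] for b in data)


def viscii_encode(text: str) -> bytes:
    # U+0002 START OF TEXT has no VISCII byte, so it raises KeyError.
    return bytes(UCS_TO_VISCII[ch] for ch in text)
```

Because the table is one-to-one, `viscii_encode(viscii_decode(data)) == data` for every byte string in the covered range; the price is only that U+0002 and the other five displaced controls cannot be represented.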
> Finally, don't forget all the DOS/OEM code pages, which assign visible
> characters to ASCII control code units and to the extended ISO 6429
> positions... However, none of these conform to Unicode (there is no
> way to fully encode them with full round-trip capability).
Now this is true, because the controls have double meanings (e.g. 0x0D
is both a carriage return and an eighth note).
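The contrast with VISCII can be shown with a tiny sketch. Only the 0x0D pair (carriage return / eighth note) comes from the text above; the 0x0E pair (shift out / beamed eighth notes) is my assumption from the usual CP437 glyph chart, and all names here are hypothetical. When one byte carries two readings, both characters must encode to that byte, so no fixed decoding can restore a text that uses both.

```python
# Hypothetical two-entry excerpt of CP437: each low byte is both a C0
# control and a displayable glyph (0x0E pair assumed from the glyph chart).
CP437_DUAL = {
    0x0D: ("\r", "\u266A"),    # CARRIAGE RETURN / EIGHTH NOTE
    0x0E: ("\x0e", "\u266B"),  # SHIFT OUT / BEAMED EIGHTH NOTES
}

# Both readings of a byte encode back to that same byte.
CP437_REVERSE = {ch: byte for byte, pair in CP437_DUAL.items() for ch in pair}


def cp437_encode(text: str) -> bytes:
    return bytes(CP437_REVERSE[ch] for ch in text)


def cp437_decode(data: bytes, reading: int) -> str:
    """Decode with one fixed reading: 0 = control, 1 = glyph."""
    return "".join(CP437_DUAL[b][reading] for b in data)


def round_trips(text: str, reading: int) -> bool:
    return cp437_decode(cp437_encode(text), reading) == text
```

A carriage return followed by an eighth note, `"\r\u266a"`, encodes to `b"\r\r"`; decoding with either reading yields two identical characters, so `round_trips` is false for both choices.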
-Doug Ewell
Fullerton, California
http://users.adelphia.net/~dewell/
This archive was generated by hypermail 2.1.5 : Fri May 16 2003 - 03:40:22 EDT