Re: discovering code points with embedded nulls

From: Otto Stolz (Otto.Stolz@uni-konstanz.de)
Date: Wed Feb 05 2003 - 13:43:25 EST

Next message: Asmus Freytag: "Re: VS vs. P14 (was Re: Indic Devanagari Query)"

Previous message: Marco Cimarosti: "RE: discovering code points with embedded nulls"
In reply to: Erik.Ostermueller@alltel.com: "discovering code points with embedded nulls"
Next in thread: Erik.Ostermueller@alltel.com: "RE: discovering code points with embedded nulls"

> I'm dealing with an API that claims it doesn't support unicode characters with embedded nulls.
...

> Test all constituent bytes for 0x00.

This depends on the encoding form you are using (and the API is expecting):

- UTF-8 encodes a Unicode string into a sequence of bytes;
   this sequence contains a 0x00 byte only where the string itself
   contains U+0000 (NUL), never as part of any other character.
   Btw., ASCII characters are encoded the same way as in ASCII.

- UTF-16 encodes a Unicode string into a sequence of 16-bit units,
   hence it makes no sense to look at this encoding bytewise.
   If you nevertheless treat a 16-bit unit as a sequence of two bytes
   (repeat: this is a no-no), then you will most probably find
   0x00 bytes therein; in particular, every ASCII character is
   encoded as a sequence of the respective ASCII byte and a 0x00 byte
   (both byte orders are possible, cf.
   <http://www.unicode.org/faq/utf_bom.html>).

- UTF-32 encodes a Unicode string into a sequence of 32-bit units,
   hence it makes no sense to look at this encoding bytewise.
   If you nevertheless treat a 32-bit unit as a sequence of four bytes
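   (repeat: this is a no-no), then you will certainly find
   0x00 bytes therein, because no code point exceeds U+10FFFF and
   the most significant byte is therefore always 0x00; in particular,
   every ASCII character is encoded as a sequence of the respective
   ASCII byte and three 0x00 bytes (again, in either byte order).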
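
To see the difference between the three encoding forms for yourself,
something like the following Python sketch may help. It is only an
illustration, not a statement about your API: the helper names
zero_byte_count and has_nul_code_point are made up for this example,
and the byte orders are chosen explicitly via Python's codec names.

```python
def zero_byte_count(text: str, encoding: str) -> int:
    """Count the 0x00 bytes in the serialized form of text (hypothetical helper)."""
    return text.encode(encoding).count(0)


def has_nul_code_point(text: str) -> bool:
    """True only if the string really contains the character U+0000."""
    return "\x00" in text


if __name__ == "__main__":
    sample = "A"  # U+0041, an ASCII character
    for enc in ("utf-8", "utf-16-le", "utf-16-be", "utf-32-le", "utf-32-be"):
        # Show the serialized bytes in hex, then how many of them are 0x00.
        print(enc, sample.encode(enc).hex(), zero_byte_count(sample, enc))
    # utf-8 41 0
    # utf-16-le 4100 1
    # utf-16-be 0041 1
    # utf-32-le 41000000 3
    # utf-32-be 00000041 3
```

So a bytewise test for 0x00 is only meaningful for UTF-8, where it
is equivalent to asking whether the string contains U+0000; for the
other two forms you would have to test whole code units instead.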

Best wishes,
Otto Stolz

This archive was generated by hypermail 2.1.5 : Wed Feb 05 2003 - 14:26:54 EST