From: Phillips, Addison (addison@amazon.com)
Date: Sat Jun 27 2009 - 12:21:16 CDT
Venu,
Thanks for the detailed description.
The input is always readable text in some language (not necessarily English), not an arbitrary UTF-16 stream.
Let me put the question a different way.
Is it possible that a readable/valid string in any other language has a U+0000 in the middle?
AP> No. It doesn’t matter what the language is. The only character in Unicode (and thus UTF-16) that uses the code unit 0x0000 is NULL.
I understand that U+0000 is used to represent the NULL character. But is it always NULL irrespective of language/charset?
AP> Yes. Always.
One possibility I could think of is that, e.g., some Chinese character might have
one code point = two 16-bit code units,
AP> Some Chinese (and other characters from other scripts) in fact do use two 16-bit code units. These are called a “surrogate pair” and are restricted to a specific range of code units which are never null.
where the first 16-bit unit is something and the next 16-bit unit is U+0000. Is that possible?
AP> No.
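AP> To make that concrete: in Java a char is a UTF-16 code unit, so a small sketch (using U+20BB7 purely as an example supplementary character) shows that both halves of a surrogate pair fall in the surrogate range 0xD800..0xDFFF and can never be 0x0000.

    public class SurrogatePairDemo {
        public static void main(String[] args) {
            // U+20BB7 is an arbitrary example of a supplementary CJK character;
            // in UTF-16 it must be stored as a surrogate pair.
            String s = new String(Character.toChars(0x20BB7));
            for (int i = 0; i < s.length(); i++) {
                // s.charAt(i) is one 16-bit UTF-16 code unit
                System.out.printf("code unit %d: 0x%04X%n", i, (int) s.charAt(i));
            }
            // Prints 0xD842 and 0xDFB7: both inside the surrogate range
            // U+D800..U+DFFF, so neither half of any pair can be 0x0000.
        }
    }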
Is there any real-world character with such an encoding value? Does Unicode allow character sets to choose U+0000 for their code point representation?
AP> Unicode is the character set. It encodes the various scripts used to write the world’s languages, assigning each character a unique code point. The code point U+0000 is assigned (solely, uniquely) to NULL.
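AP> In practice this is why scanning UTF-16 code units for 0x0000 (say, to find a NUL terminator) can never split a character in half. A rough sketch, using a made-up buffer:

    public class NulScanDemo {
        public static void main(String[] args) {
            // Hypothetical buffer: two CJK characters followed by a NULL and
            // some trailing data. The 0x0000 unit can only be NULL itself.
            char[] units = "\u65E5\u672C\u0000tail".toCharArray();
            int end = 0;
            while (end < units.length && units[end] != 0x0000) {
                end++;  // safe: no other character's encoding contains 0x0000
            }
            System.out.println(new String(units, 0, end));  // prints the two CJK characters
        }
    }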
Addison
Addison Phillips
Globalization Architect -- Lab126
Internationalization is not a feature.
It is an architecture.