From: Ken Krugler (ken@transpac.com)
Date: Mon Apr 07 2003 - 20:49:08 EDT
The GSM0338.TXT mapping file from the Unicode web site contains a
brief discussion of the proper handling of the null byte, which
apparently can be interpreted as either a COMMERCIAL AT character or
a NULL:
# 0x00 is NULL (when followed only by 0x00 up to the
# end of (fixed byte length) message, possibly also up to
# FORM FEED. But 0x00 is also the code for COMMERCIAL AT
# when some other character (CARRIAGE RETURN if nothing else)
# comes after the 0x00.
I've read what I thought were the relevant ETSI standards documents,
and done a search via Google, and so far this is the only reference
I've found to such special handling for the null byte.
On the other hand, I have seen cell phones sending text with a
terminating null byte, which makes me think that perhaps the Unicode
document reflects reality, rather than what's described in the
standards documents.
So does anybody know of additional reference material that would
provide clarification?
Also, the above paragraph from the mapping table seems to imply the
following: when a null byte is encountered and the next byte is not a
null, it should be mapped to COMMERCIAL AT; if the next byte is also a
null, then the whole run of nulls should be treated as NULL
characters, up to the end of the message or the next non-null byte.
And from what I've observed, a single null byte at the very end of the
message should also be mapped to NULL. Does this match what anybody
else has observed in trying to convert from GSM 03.38 to Unicode?
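To make that reading concrete, here is a minimal Python sketch of the
heuristic as I understand it. It is only an illustration, not something
taken from the standards or from the mapping file: the function names
are made up, and _map_other is a stand-in for the full GSM 03.38 table
that passes through only the letters, digits, space, CR and LF (which
have the same codes as in ASCII) and replaces everything else with
U+FFFD.

```python
NUL = "\u0000"
COMMERCIAL_AT = "@"


def _map_other(byte: int) -> str:
    """Hypothetical stand-in for the real GSM 03.38 table.

    Only letters, digits, space, CR and LF are passed through, since
    those share their codes with ASCII; anything else becomes U+FFFD.
    """
    ch = chr(byte)
    if ch.isascii() and (ch.isalnum() or ch in " \r\n"):
        return ch
    return "\ufffd"


def decode_with_null_heuristic(data: bytes) -> str:
    """Decode unpacked GSM 03.38 bytes, guessing the meaning of 0x00.

    Assumed rule (my reading of the mapping file comment, unconfirmed):
      - a single 0x00 followed by a non-null byte -> COMMERCIAL AT
      - a single 0x00 at the very end of the message -> NULL
      - a run of two or more 0x00 bytes -> all NULL
    """
    out = []
    i = 0
    n = len(data)
    while i < n:
        if data[i] != 0x00:
            out.append(_map_other(data[i]))
            i += 1
            continue
        # Find the end of this run of 0x00 bytes.
        j = i
        while j < n and data[j] == 0x00:
            j += 1
        run_length = j - i
        if run_length == 1 and j < n:
            # Lone null with something after it: treat as '@'.
            out.append(COMMERCIAL_AT)
        else:
            # Trailing null, or a run of nulls: treat as NULL.
            out.append(NUL * run_length)
        i = j
    return "".join(out)
```

Under these assumptions, decode_with_null_heuristic(b"a\x00b") gives
"a@b", while a message ending in 0x00 keeps a trailing NULL. A stricter
reading of the mapping file comment (NULL only when nothing but 0x00
follows up to the end of the message) would treat a run of nulls in the
middle of a message differently, which is part of what I'm asking
about.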
Thanks,
-- Ken