From: Richard Wordingham (richard.wordingham@ntlworld.com)
Date: Sun May 28 2006 - 14:55:04 CDT
Subject: Re: Unicode, SMS, PDA/cellphones
Cristian Secara wrote on Sunday, May 28, 2006 at 3:08 PM:
> On Sun, 28 May 2006 16:31:06 +0800, Donald Z. Osborn wrote:
>
>> * Message length being rather shorter in Unicode SMS than with 7 or 8 bit
>
> Ordinary [Latin] SMS messages use the 7-bit GSM character set. Only
> a few additional characters need an escape character.
> (ref.: http://www.csoft.co.uk/sms/character_sets/gsm.htm )
> A single SMS message written solely with characters from the 7-bit GSM
> character set can hold at most 160 characters. If a single non-GSM
> character is entered during composition, the whole message switches
> to a double-byte encoding, limiting a single message to at most
> 70 characters. I don't know whether each transmitted character is a
> direct 2-byte BMP code or uses the UTF-16 encoding.
>
> Every time I try to send an SMS message that includes accented
> characters for my language (Romanian), I can't help blaming those who
> established the SMS technical standard, because a fixed 2-byte
> encoding for Latin characters is a pure waste of space (and money :).
This sounds like an application for SCSU (the Standard Compression Scheme
for Unicode)! The Romanian performance will take a slight hit from the
distinction between comma below and cedilla in the Unicode Standard, as
there will be a 2-byte overhead each to define the windows for Latin
Extended-A (a breve) and the high half of Latin Extended-B (s and t with
comma below). I expect these characters would use 2 bytes each, while
Latin-1 (a and i with circumflex) would get the 1-byte codes. ASCII
characters would always be encoded as 1 byte in alphabetic text.
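
To put rough numbers on this, here is a minimal Python sketch. It is not a
conforming SCSU encoder; it only applies the accounting described above
(1 byte for ASCII and Latin-1, 2 bytes for a letter in another 128-character
window, plus a one-time 2 bytes per window). The function names and the cost
constant are my own, chosen for illustration, and a real encoder may well
make different choices. The 160 and 70 character limits both follow from the
140-octet payload of a single SMS.

```python
SMS_PAYLOAD_OCTETS = 140  # user-data size of a single SMS message


def gsm7_capacity() -> int:
    """Characters per message when each one takes 7 bits."""
    return SMS_PAYLOAD_OCTETS * 8 // 7


def ucs2_capacity() -> int:
    """Characters per message when each one takes 2 bytes."""
    return SMS_PAYLOAD_OCTETS // 2


# Assumption: one-time cost of defining a window, as estimated in the text.
WINDOW_SETUP_COST = 2


def estimated_scsu_size(text: str) -> int:
    """Estimate the compressed size in bytes under the simplified model.

    Only ASCII, Latin-1, Latin Extended-A (U+0100..U+017F) and the window
    U+0200..U+027F are modelled; anything else raises ValueError.
    The model also ignores that SCSU reserves most C0 control codes as tags.
    """
    size = 0
    windows_defined = set()
    for ch in text:
        cp = ord(ch)
        if cp < 0x100:
            # ASCII and Latin-1 (e.g. a and i with circumflex): 1 byte.
            size += 1
        elif cp < 0x180 or 0x200 <= cp < 0x280:
            # Base of the 128-character window containing this character.
            window = cp & ~0x7F
            if window not in windows_defined:
                windows_defined.add(window)
                size += WINDOW_SETUP_COST
            # Assumed 2 bytes per character outside the active window.
            size += 2
        else:
            raise ValueError(f"U+{cp:04X} is outside this simplified model")
    return size


if __name__ == "__main__":
    sample = "M\u00e2ine ne \u00eent\u00e2lnim la gar\u0103 \u0219i mergem \u00een ora\u0219."
    print("GSM 7-bit capacity:", gsm7_capacity())
    print("UCS-2 capacity:    ", ucs2_capacity())
    print("UCS-2 bytes:       ", 2 * len(sample))
    print("Estimated SCSU:    ", estimated_scsu_size(sample))
```

Under this model a very short message can come out larger than 2 bytes per
character, because the window set-up cost dominates. The saving only shows
once the text is mostly ASCII and Latin-1 with the occasional a breve or
comma-below letter.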
What's happened to the old telegraphic standard for Romanian? I understand
that it used 'tz' for t with comma below.
Richard.