RE: Devanagari

From: Marco Cimarosti (marco.cimarosti@essetre.it)
Date: Mon Jan 21 2002 - 08:20:17 EST

Previous message: Tex Texin: "Norwegian sorting"
Maybe in reply to: Aman Chawla: "Devanagari"
Next in thread: David Starner: "Re: Devanagari"
Next in thread: Mark Davis: "Fw: Devanagari"
Reply: David Starner: "Re: Devanagari"

Doug Ewell wrote:
> Devanagari text encoded in SCSU occupies exactly 1 byte per
> character, plus an additional byte near the start of the
> file to set the current window (0x14 = SC4).

The problem is what happens if that very byte gets corrupted for any
reason...

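If an octet is erroneously deleted, changed or inserted in a UTF-8 stream,
only the character containing that octet is corrupted, because a decoder
can resynchronize at the next lead byte. If the same thing happens to the
window-setting byte of an SCSU stream (or of other similar "zany" formats),
everything up to the next window-setting tag, possibly the whole stream,
turns into garbage.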
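
To make the difference concrete, here is a rough sketch in Python. Python
ships no SCSU codec, so toy_encode and toy_decode are hypothetical helpers
that implement only a tiny subset of SCSU single-byte mode (the SC0..SC7
tags and the default window positions); they are an assumption made for
illustration, not a conforming implementation.

```python
# Toy subset of SCSU single-byte mode (NOT a complete implementation).
# Default start positions of dynamic windows 0..7.
WINDOW_OFFSETS = [0x0080, 0x00C0, 0x0400, 0x0600, 0x0900, 0x3040, 0x30A0, 0xFF00]
SC0 = 0x10  # tags SC0..SC7 (0x10..0x17) select dynamic window 0..7


def toy_encode(text, window=4):
    """Emit one SCn tag, then one byte per character (hypothetical helper)."""
    base = WINDOW_OFFSETS[window]
    out = bytearray([SC0 + window])
    for ch in text:
        cp = ord(ch)
        if 0x20 <= cp < 0x80:
            out.append(cp)                # ASCII passes through unchanged
        elif base <= cp < base + 0x80:
            out.append(0x80 + cp - base)  # offset into the selected window
        else:
            raise ValueError(f"U+{cp:04X} is outside this toy subset")
    return bytes(out)


def toy_decode(data):
    """Decode the toy subset; window 0 is active until a tag says otherwise."""
    active = 0
    chars = []
    for b in data:
        if SC0 <= b <= SC0 + 7:
            active = b - SC0              # window-setting tag
        elif b < 0x80:
            chars.append(chr(b))
        else:
            chars.append(chr(WINDOW_OFFSETS[active] + b - 0x80))
    return "".join(chars)


text = "नमस्ते"
scsu = toy_encode(text)      # 0x14 (SC4) followed by one byte per character
utf8 = text.encode("utf-8")  # three bytes per character

print(toy_decode(scsu))                             # intact stream
print(toy_decode(scsu[1:]))                         # window byte lost
print(utf8[1:].decode("utf-8", errors="replace"))   # first UTF-8 byte lost
```

With the window byte missing, every character comes out of the wrong window
(Latin-1 symbols here), whereas the damaged UTF-8 stream loses only its
first character and the rest decodes as before.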

What this means in practice for website developers is:

1) SCSU text can only be edited with a text editor that properly decodes
the *whole* file on load and re-encodes it on save. On the other hand, UTF-8
text can also be edited using an encoding-unaware editor, although non-ASCII
text is then not legible.

2) SCSU text cannot be built by assembling binary pieces coming from
external sources. E.g., you cannot take an SCSU-encoded template file and
fill in the blanks with customer data coming from an SCSU-encoded database:
each time you insert a piece of text coming from the database, you override
the current window information, turning the rest of the file into garbage.
On the other hand, UTF-8 allows this, provided that the integrity of each
multi-byte sequence is maintained.
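
A minimal sketch of the template case in Python, under loud assumptions:
enc and dec are hypothetical helpers covering only a toy subset of SCSU
single-byte mode (one SCn tag plus one byte per character), and the
template and customer strings are made up for the example.

```python
# Toy subset of SCSU single-byte mode (NOT a complete implementation).
OFFSETS = [0x0080, 0x00C0, 0x0400, 0x0600, 0x0900, 0x3040, 0x30A0, 0xFF00]


def enc(text, window):
    """Hypothetical helper: SCn tag, then one byte per character."""
    base = OFFSETS[window]
    out = bytearray([0x10 + window])
    for ch in text:
        cp = ord(ch)
        if 0x20 <= cp < 0x80:
            out.append(cp)                # ASCII passes through
        elif base <= cp < base + 0x80:
            out.append(0x80 + cp - base)  # offset into the selected window
        else:
            raise ValueError(f"U+{cp:04X} is outside this toy subset")
    return bytes(out)


def dec(data):
    """Hypothetical helper: decode the toy subset, window 0 active at start."""
    active, chars = 0, []
    for b in data:
        if 0x10 <= b <= 0x17:
            active = b - 0x10             # window-setting tag
        elif b < 0x80:
            chars.append(chr(b))
        else:
            chars.append(chr(OFFSETS[active] + b - 0x80))
    return "".join(chars)


head, tail = "नाम: ", " जी"   # Devanagari template around the blank
customer = "Иван"             # database value stored in the Cyrillic window

# Byte-level splice of two toy-SCSU streams.
template = enc(head + tail, window=4)
cut = len(enc(head, window=4))
spliced_scsu = template[:cut] + enc(customer, window=2) + template[cut:]

# The same byte-level splice with UTF-8, cut on a character boundary.
template_utf8 = (head + tail).encode("utf-8")
cut_utf8 = len(head.encode("utf-8"))
spliced_utf8 = (template_utf8[:cut_utf8] + customer.encode("utf-8")
                + template_utf8[cut_utf8:])

print(dec(spliced_scsu))             # tail is read through the wrong window
print(spliced_utf8.decode("utf-8"))  # tail survives
```

In the toy SCSU case the tail of the template is decoded through the
Cyrillic window left active by the inserted piece; the UTF-8 splice is
safe only because the cut falls between two complete characters.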

3) An SCSU page can only be accepted by browsers and e-mail readers that are
able to decode it. On the other hand, UTF-8 also works on old ASCII-based
browsers, although non-ASCII text is of course not displayed properly.
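
A small Python sketch of the UTF-8 half of this point, with a made-up page
fragment: an ASCII-only consumer is simulated (as an assumption) by decoding
the bytes as ASCII and replacing whatever it cannot interpret.

```python
# Hypothetical page fragment: ASCII markup around Devanagari text.
page = "<p>नमस्ते</p>"
data = page.encode("utf-8")

# Simulate an ASCII-only reader: every byte >= 0x80 becomes U+FFFD.
ascii_view = data.decode("ascii", errors="replace")

# ASCII characters keep their single-byte ASCII values in UTF-8, so the
# markup is still found by a plain byte search.
print(b"<p>" in data, b"</p>" in data)
print(ascii_view)
```

The markup stays recognizable, and only the 18 bytes of the six Devanagari
characters are shown as replacement characters.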

_ Marco


This archive was generated by hypermail 2.1.2 : Mon Jan 21 2002 - 07:47:08 EST