RE: FW: Using Unicode Characters in ASCII Streams

From: Marco Cimarosti (marco.cimarosti@essetre.it)
Date: Wed Feb 06 2002 - 05:34:11 EST


Asmus Freytag wrote:
> > From: rdeking@kbs.kaba.com
[...]
> > we are a manufacturer of time and attendance terminals which
> > are transferring data using 8-bit character streams
[...]
> > Now here is my question: Is there a method to add any
> > Unicode character to an 8-bit ASCII stream?
[...]
>
> There are three or four options for forcing Unicode into an
> 8-bit format.
>
> a) Use UTF-8. This preserves ASCII, but the characters >127
> are different from Latin-1.
>
[...]
>
> Of these four approaches, d) uses the least space, a) is the
> most widely supported in plain text files [...]
>
> All four require that the receiver can understand that
> format, but a) is considered one of the three
> equivalent Unicode Encoding Forms and therefore standard.
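To make the ASCII/Latin-1 point in option a) concrete, here is a minimal sketch of a UTF-8 encoder in C. The function name utf8_encode is hypothetical and not taken from any particular library. Code points below 128 come out as the same single byte, while, for instance, U+00E9 (é), which is the single byte 0xE9 in Latin-1, becomes the two bytes 0xC3 0xA9.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one Unicode scalar value as UTF-8 into buf
 * (at least 4 bytes). Returns the number of bytes written, or 0 if cp is
 * not a valid scalar value (a surrogate, or above U+10FFFF). */
size_t utf8_encode(uint32_t cp, unsigned char *buf)
{
    if (cp < 0x80) {                  /* ASCII: one byte, unchanged */
        buf[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                 /* 110xxxxx 10xxxxxx */
        buf[0] = (unsigned char)(0xC0 | (cp >> 6));
        buf[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF) /* surrogates are not characters */
        return 0;
    if (cp < 0x10000) {               /* 1110xxxx 10xxxxxx 10xxxxxx */
        buf[0] = (unsigned char)(0xE0 | (cp >> 12));
        buf[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        buf[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {             /* 11110xxx followed by 3 continuation bytes */
        buf[0] = (unsigned char)(0xF0 | (cp >> 18));
        buf[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        buf[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        buf[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                         /* beyond the Unicode range */
}
```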

I'd like to stress that, because UTF-8 is standard, it is supported out of
the box by many word processors and text editors on many operating systems.

This is important because the localized text messages to be sent to
embedded terminals are normally contained in ordinary text files, prepared
on a standard personal computer. Often, the person who actually edits the
messages is a freelance translator who knows nothing about the technical
details of the embedded terminal.

So, sticking to UTF-8 may simplify the task of preparing and distributing
localized messages.

E.g., when you want to go Russian, you just hire a Russian translator and
ask him or her to "please submit the files in UTF-8". If (s)he has some
expertise with text files, (s)he will need no further clarification and
will send proper UTF-8. Otherwise, if (s)he doesn't understand and submits
the file in some other standard encoding, you just pick one of the many
existing encoding converters and turn the file into UTF-8.
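In practice this conversion would be done with an existing tool (for instance the standard iconv utility), but as a sketch of what such a converter does, the following C function turns Windows-1251 text into UTF-8. Windows-1251 as the translator's encoding and the name cp1251_to_utf8 are assumptions for illustration only. The sketch maps just ASCII and the Russian letters, and rejects any other byte rather than guessing.

```c
#include <stddef.h>

/* Hypothetical sketch: convert Windows-1251 text to UTF-8, covering only
 * ASCII and the Russian letters (0xC0-0xFF, plus 0xA8/0xB8 for Ё/ё).
 * out must hold up to 2*n bytes. Returns the number of bytes written,
 * or (size_t)-1 when it meets a byte this sketch does not handle. */
size_t cp1251_to_utf8(const unsigned char *in, size_t n, unsigned char *out)
{
    size_t o = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned char c = in[i];
        unsigned cp;
        if (c < 0x80) {
            out[o++] = c;              /* ASCII passes through unchanged */
            continue;
        } else if (c >= 0xC0) {
            cp = 0x0410u + (c - 0xC0); /* А..я are contiguous in both sets */
        } else if (c == 0xA8) {
            cp = 0x0401u;              /* Ё */
        } else if (c == 0xB8) {
            cp = 0x0451u;              /* ё */
        } else {
            return (size_t)-1;         /* punctuation, symbols etc. not mapped */
        }
        /* Every code point above is below U+0800, so two UTF-8 bytes. */
        out[o++] = (unsigned char)(0xC0 | (cp >> 6));
        out[o++] = (unsigned char)(0x80 | (cp & 0x3F));
    }
    return o;
}
```

A real converter would carry the full 256-entry table for the source encoding; the point here is only that each legacy byte maps to one code point, which is then written out in UTF-8.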

On the other hand, using a proprietary format also means implementing
proprietary utilities for the personal computer: text editors, viewers,
converters, etc.

Moreover, if UTF-8 support is to be added to the embedded terminal, it is
easy to find the relevant code already implemented and tested in good C. A
proprietary format, on the other hand, must be designed, implemented and
tested from scratch.
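For an idea of how little code the terminal side needs, here is a minimal sketch of a validating UTF-8 decoder in C. It is not the tested code referred to above, only an illustration, and utf8_decode is a hypothetical name. It rejects malformed input (stray continuation bytes, truncated sequences, overlong forms, surrogates, values above U+10FFFF) instead of passing it on to the display.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: decode one UTF-8 sequence from s (n bytes available).
 * On success stores the code point in *cp and returns the sequence length
 * (1-4). Returns 0 for malformed or truncated input. */
size_t utf8_decode(const unsigned char *s, size_t n, uint32_t *cp)
{
    /* Smallest code point allowed for each length; anything lower is an
     * overlong (non-shortest) encoding. */
    static const uint32_t min_for_len[5] = {0, 0, 0x80, 0x800, 0x10000};
    size_t len;
    uint32_t c;

    if (n == 0)
        return 0;
    if (s[0] < 0x80) {                       /* ASCII */
        *cp = s[0];
        return 1;
    } else if ((s[0] & 0xE0) == 0xC0) {
        len = 2; c = s[0] & 0x1F;
    } else if ((s[0] & 0xF0) == 0xE0) {
        len = 3; c = s[0] & 0x0F;
    } else if ((s[0] & 0xF8) == 0xF0) {
        len = 4; c = s[0] & 0x07;
    } else {
        return 0;                            /* continuation byte or 0xF8-0xFF */
    }

    if (n < len)
        return 0;                            /* truncated sequence */
    for (size_t i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;                        /* expected a 10xxxxxx byte */
        c = (c << 6) | (s[i] & 0x3F);
    }
    if (c < min_for_len[len] || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
        return 0;                            /* overlong, out of range, surrogate */
    *cp = c;
    return len;
}
```

A terminal would call this in a loop, advancing by the returned length, and on a 0 result would show a placeholder glyph and skip one byte to resynchronize.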

_ Marco



This archive was generated by hypermail 2.1.2 : Wed Feb 06 2002 - 05:08:58 EST