From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Wed Mar 14 2007 - 07:12:26 CST
> From: Kenneth Whistler [mailto:kenw@sybase.com]
> > Another option would be to encode only two new controls in Unicode:
> > * start control sequence;
> > * end control sequence.
>
> No. A very bad idea, IMO.
>
> If you want to write ISO 2022-conformant code that makes use
> of registered Escape sequences, then write ISO 2022-conformant
> code to do so, and have it detect the registered Escape sequences
> corresponding to the character set identifications (or any
> other pertinent usages of Escape sequences) it is concerned
> with. That is what ISO 2022 is all about.
I did not refer to ISO 2022 in my message, but to the many CES that have no
ISO standard and are used nationally or in proprietary protocols: for
example DVB subtitles and EPG, proprietary MPEG title extensions, Videotext
and Teletext, VT100 terminal protocols and similar.
In those cases, the sequences are NOT encoding characters, but attributes.
They do not qualify as a regular CES, because they are not encoding
characters and are not mappable to single Unicode characters.
Without a clear identification of those sequences, this causes problems:
these formats are still used for storing documents, and those documents may
still need general Unicode algorithms, for example for full-text searches
(as in desktop search engines).
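To make the full-text search problem concrete, here is a minimal sketch in
Python, assuming the VT100-like case only. The pattern follows the CSI layout
of ECMA-48 (ESC [, parameter bytes, intermediate bytes, one final byte); the
names split_presentation and searchable_text are hypothetical, and other
protocols (Teletext, DVB subtitles) would need their own recognizers.

```python
import re

# CSI control sequence as laid out in ECMA-48: ESC [ followed by parameter
# bytes (0x30-0x3F), intermediate bytes (0x20-0x2F), one final byte (0x40-0x7E).
# Assumption: only 7-bit CSI sequences are handled here, nothing else.
CSI_SEQUENCE = re.compile(r"\x1b\[[0-?]*[ -/]*[@-~]")


def split_presentation(text):
    """Split text into ("control", seq) and ("text", run) pieces, in order."""
    pieces = []
    pos = 0
    for match in CSI_SEQUENCE.finditer(text):
        if match.start() > pos:
            pieces.append(("text", text[pos:match.start()]))
        pieces.append(("control", match.group()))
        pos = match.end()
    if pos < len(text):
        pieces.append(("text", text[pos:]))
    return pieces


def searchable_text(text):
    """Keep only the character runs, dropping the attribute sequences."""
    return "".join(v for kind, v in split_presentation(text) if kind == "text")


sample = "\x1b[1;31mError\x1b[0m: disk full"
print(split_presentation(sample))
print(searchable_text(sample))  # Error: disk full
```

Only the "text" pieces would be handed to the general Unicode algorithms; the
"control" pieces stay available to whatever renders the document.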
If those documents are transmitted over the Internet, they may eventually be
tagged with a specific MIME type (not "text/plain"), but in reality they are
more than that and also qualify as "application/*" formats. In fact they
work at the presentation layer of the OSI model, not at the encoding or
transport layer (so they are neither CES nor TES).
Although many of you do not know the details of European videotext or
Teletext systems (or DVB subtitles), most of you are exposed to VT100-like
presentation formats in your text terminals or OS consoles (including the
Windows "OEM" console with "ANSI" extensions, VT100 emulators, or X11
consoles in Unix/Linux).
Even now, we have no clear pattern for identifying the presentation
protocols used in terminal sessions, because we only identify the CES
(= codepages in DOS/Windows environments), and because the terminal
presentation environments are orthogonal to, and most often completely
independent from, the CES (= codepage) environment.
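The independence of the two layers can be illustrated with a small Python
sketch, assuming a VT100-like byte stream from a DOS-style console. The
helper visible_text is hypothetical. The same bytes are decoded under two
different codepages: the attribute sequences are recognized identically in
both cases, and only the characters change.

```python
import re

# 7-bit CSI sequences (ESC [ ... final byte), per the ECMA-48 layout.
CSI_SEQUENCE = re.compile(r"\x1b\[[0-?]*[ -/]*[@-~]")


def visible_text(raw, codepage):
    """Decode bytes with the given codepage, then drop the CSI sequences."""
    text = raw.decode(codepage)          # CES layer: bytes -> characters
    return CSI_SEQUENCE.sub("", text)    # presentation layer: attributes out


# Byte 0x9D is a different character in cp437 and in cp850, while the
# surrounding "bold on" / "reset" sequences are plain ASCII in both.
raw = b"\x1b[1m\x9d 100\x1b[0m"
print(ascii(visible_text(raw, "cp437")))  # '\xa5 100'  (YEN SIGN)
print(ascii(visible_text(raw, "cp850")))  # '\xd8 100'  (O WITH STROKE)
```

Knowing the codepage says nothing about whether such sequences are present,
and recognizing the sequences says nothing about the codepage, which is why
the two would need to be identified separately.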
The case of ISO 2022 sequences is much clearer, as they are unambiguously
used as CES sequences.