From: verdy_p (verdy_p@wanadoo.fr)
Date: Thu Feb 25 2010 - 08:53:14 CST
> From: "spir"
> I wonder whether I am right about legacy character sets. Are there some working like Unicode, meaning they have
> the concept of scripting bits? In other words, are there character sets whose codes do not represent graphemes, but
> characters in the sense of Unicode?
You are wrong: even today, many Teletext systems still broadcast alongside TV programs encode diacritics
separately from the base character (in detail, these were even encoded as sequences of characters when
transported over 7-bit streams, which kept good compatibility with ISO 646 or the IRV).
Look also at the various Indian charsets (ISCII standard): viramas and so on.
Look also at Hebrew and Arabic encodings: vowel points, consonant modifiers, cantillation marks and so on.
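As a small illustration of the Hebrew case, here is a sketch that assumes Windows-1255 as a representative legacy 8-bit charset (that choice is an assumption made for the example; it is not named above). It only shows that a vowel point gets its own byte after the consonant, rather than the pair being one precomposed code:

```python
import unicodedata

# ALEF followed by the vowel point QAMATS, as two separate Unicode characters.
text = "\u05d0\u05b8"

# Windows-1255 (assumed here as a representative legacy Hebrew charset)
# gives the point its own byte instead of a precomposed "alef with qamats".
encoded = text.encode("cp1255")

for ch, byte in zip(text, encoded):
    # A non-zero combining class marks the character as a combining mark.
    print(f"0x{byte:02X}  {unicodedata.name(ch)}  "
          f"combining class = {unicodedata.combining(ch)}")
```

The consonant reports combining class 0 and the point a non-zero class, which is the same base-plus-mark model that Unicode uses.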
The need for separate encoding of diacritics has existed for a very long time, notably when charsets were limited to
using code units no longer than a single 7-bit or 8-bit byte.
Even the oldest Japanese JIS standard used a combining character (for the voiced/unvoiced consonantal modifier
within its two syllabaries); the same was true for the oldest KSC standard for Korean Jamos (before they could be
precomposed into syllables), and this remains as well in the most recent KSC standard (which uses variable-length
multibyte sequences, made of codes belonging to multiple "parallel" sub-codepages).
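The Japanese case can be sketched with the single-byte katakana area that Python's "shift_jis" codec exposes. The byte values below come from that codec's mapping, and treating the voiced sound mark as combining is the reading given above (Unicode itself maps these bytes to spacing halfwidth forms), so take this as an illustration rather than a statement about any particular JIS edition:

```python
import unicodedata

# Bytes 0xB6 0xDE in the 8-bit katakana area: KA followed by the voiced sound mark.
raw = bytes([0xB6, 0xDE])

# Python's "shift_jis" codec maps that single-byte area to halfwidth katakana.
decoded = raw.decode("shift_jis")

# NFKC folds the two code points into the single precomposed syllable GA.
composed = unicodedata.normalize("NFKC", decoded)

print([f"U+{ord(c):04X}" for c in decoded])
print([f"U+{ord(c):04X}" for c in composed])
```

Two legacy code units become one Unicode code point only after compatibility normalization; the legacy stream itself keeps the modifier separate.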
The Greek ELOT standard also used separate encoding of the many diacritics needed for writing Polytonic Greek.
Various proprietary encodings used in printer languages have also defined their own codes for separate
non-spacing diacritics (including for the Latin script). It was a natural evolution of the oldest sequences using
BACKSPACE to stay compatible with ISO 646 in a restricted 7-bit environment: extending them to 8 bits did not remove
these diacritics, even though they could then be encoded in a simpler way and more characters could be encoded as a
single code unit.
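The BACKSPACE convention can be sketched as a converter from overstrike sequences (base letter, BACKSPACE, ASCII mark) to Unicode combining sequences. The mark table and the function name `from_overstrike` are hypothetical, chosen only for this example; real printers and terminals differed in which ASCII characters they overstruck and in which order:

```python
import unicodedata

BS = "\x08"  # BACKSPACE

# Hypothetical mapping from ASCII spacing marks to Unicode combining marks.
OVERSTRIKE = {
    "'": "\u0301",  # acute
    "`": "\u0300",  # grave
    "^": "\u0302",  # circumflex
    "~": "\u0303",  # tilde
    '"': "\u0308",  # diaeresis
    ",": "\u0327",  # cedilla
}

def from_overstrike(text: str) -> str:
    """Turn 'base BS mark' triples into base + combining mark, then compose."""
    out = []
    i = 0
    while i < len(text):
        # Pattern: base letter, BACKSPACE, mark struck over the base.
        if i + 2 < len(text) and text[i + 1] == BS and text[i + 2] in OVERSTRIKE:
            out.append(text[i] + OVERSTRIKE[text[i + 2]])
            i += 3
        else:
            out.append(text[i])
            i += 1
    # NFC replaces each base + mark pair with a precomposed letter when one exists.
    return unicodedata.normalize("NFC", "".join(out))

print(from_overstrike("e\x08'te\x08'"))
```

The intermediate form (base followed by a non-spacing mark) is what the 8-bit printer encodings kept; the final NFC step corresponds to the later precomposed single-code-unit letters.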
The period during which the ISO 8859 standards were developed and then widely adopted for Latin languages has been
quite short in computing history (they still remain, but there is no longer any development on them, and these
encodings are now rapidly being phased out in favor of the Unicode/ISO/IEC 10646 UTFs).