Tuesday, January 14, 1997
10646ers and Unicoders,
In general I agree with Ken Whistler's statement. ISO standards
are international standards and must respect legacy national standards.
Part of the problem is that national boundaries and language boundaries
frequently do not coincide: some countries use several languages and
some languages are used in several countries, not to mention different
adaptations of a single writing system.
     Having attended (as an observer) two meetings of the group that
authored ISO 10646 and one meeting of the group that wrote ISCII (in 1982)
I have a comment. I support 10646 and Unicode though they are not
without flaws. In my humble opinion where 10646 differs from ISCII
(the 1991 version) it shows insufficient "sensitivity to the legacy
issues" (Ken's phrase) involved with development of ISCII. Had ISCII
(1991) been accepted as the Thai standard was, communication would
be easier. ISO 10646 is being implemented, including the Indian
scripts, so it is probably too late to change; I do not advocate
doing so.
What follows are some brief incidental comments on the recent
correspondence.
The January 1989 version of 10646 did include over 400
Devanagari ligatures (conjunct consonants and less predictable
consonant+vowel combinations). It didn't get enough votes, probably
for other reasons.
It might be worth considering another level in 10646 that
excluded all presentation forms and letter+diacritic combinations.
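     Purely as my own illustration (nothing of the sort is in the standard),
such letter+diacritic combinations can be detected mechanically, since they
carry a canonical decomposition into more than one character; a small Python
sketch:

    import unicodedata

    def is_letter_plus_diacritic(ch):
        # True when the character canonically decomposes into more than
        # one character, i.e., it could be coded as base + combining mark.
        parts = unicodedata.decomposition(ch).split()
        return len(parts) > 1 and not parts[0].startswith('<')

    print(is_letter_plus_diacritic('\u00E9'))   # e with acute -> True
    print(is_letter_plus_diacritic('\u0915'))   # Devanagari KA -> False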
I see nothing wrong with having half consonants on a keyboard
so long as they are stored as full consonant+halant--preferably
internally but certainly for external communication.
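     As a sketch of what I mean (my own example, with made-up key names): the
keyboard can offer half-consonant keys while the text stream records only the
full consonant followed by halant (virama, U+094D):

    # Hypothetical half-consonant keys; each one is STORED as the full
    # consonant followed by halant (virama, U+094D).
    HALF_KEYS = {
        'half-KA': '\u0915\u094D',   # KA + halant
        'half-TA': '\u0924\u094D',   # TA + halant
    }

    def keystrokes_to_text(keys):
        return ''.join(HALF_KEYS.get(k, k) for k in keys)

    # Typing half-KA then TA yields the stored sequence U+0915 U+094D U+0924.
    print(keystrokes_to_text(['half-KA', '\u0924']))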
There are two approaches to encoding Indic scripts: typographic
(or just graphic) and phonetic. Both ISCII and 10646 adopted
the latter. The phonetic approach dates back at least to March 1978.
It works because in general Indic scripts are phonetic. When Indic
scripts are not, it has some difficulties, e.g., the "eyelash R"
in Marathi use of Devanagari.
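     To illustrate the phonetic (logical-order) approach with my own sketch:
the short vowel I is stored after its consonant even though its sign is drawn
to the left of it, and a conjunct such as "kta" is stored as KA + halant + TA,
with the choice of half form or ligature left to rendering; the stored
sequence for the eyelash R is not so easily settled.

    # Phonetic storage order vs. what is drawn (illustration only).
    ki  = '\u0915\u093F'         # KA + VOWEL SIGN I; the sign renders LEFT of KA
    kta = '\u0915\u094D\u0924'   # KA + halant + TA; the font may draw a conjunct
    for s in (ki, kta):
        print(' '.join('U+%04X' % ord(c) for c in s))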
I tend to favor exclusion of presentation forms when the
separate encoding of their component letters results in renderings
acceptable across all languages that use a script. I suspect
Sindhi in Arabic script has some glyphs that should be separately
encoded (assigned codes of their own) because coding of them
with existing codes would result in glyphs unacceptable to users
of Arabic script for other languages; I refer to the hamza over
two alephs and the isolated mim over two alephs.
I assume 10646 is supposed to be capable of generating acceptable
text without external clues (language codes) since none exist.
All the above are purely personal opinions, not necessarily the
official views of the Library of Congress or any agency of any
government.
Regards,
Jim Agenbroad ( jage@LOC.gov )