> Totally off the subject of recent topics, and for that matter UNICODE
> itself (sorry), but does anyone know what the USSR GOST 19768-87 so-called
> "New KOI-8" character set is/was? (As opposed to 19768-74 "Old KOI-8" which
> is widely used in netnews, etc.) Is it the internal national equivalent of
> ISO 8859-5 / ECMA-113, or is it something entirely different?
>
Answering my own question -- it's amazing what you can turn up by digging
through old piles of paper...
ECMA-113 was originally issued in 1974 as the international version of GOST
19768-74; this is what we now call "Old KOI-8", or perhaps more commonly,
simply KOI-8. As noted above, it is still in wide use. This character set is
notable in that the Cyrillic characters in the right half are arranged to
coincide with the Roman ones in the left "by sound", so if the 8th bit were
chopped off (as still happens to this day on many communications connections
and e-mail systems), or you did not have a Cyrillic display, you could still
read the text more or less phonetically. (A similar trick is accomplished in
7 bits by "Short KOI", or KOI-7, in which lowercase Roman letters -- columns 6
and 7 -- are replaced by lowercase Cyrillic ones.)
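
For illustration, the 8th-bit trick can be sketched in a few lines of Python.
This is only a sketch under assumptions: Python ships no codec for GOST
19768-74 itself, so the "koi8_r" codec (a later descendant that, as far as I
know, keeps the same Russian letter positions) stands in for it, and the name
strip_high_bit is made up for this example.

```python
def strip_high_bit(text: str, codec: str = "koi8_r") -> str:
    """Encode text, clear bit 8 of every byte, and read the result as ASCII.

    Assumption: "koi8_r" is used as a stand-in for the original KOI-8.
    """
    raw = text.encode(codec)
    # Masking with 0x7F is what a 7-bit communications link does to each byte.
    return bytes(b & 0x7F for b in raw).decode("ascii")


word = "Москва"
stripped = strip_high_bit(word)
print(stripped)             # mOSKWA
print(stripped.swapcase())  # Moskwa
```

Note that the case comes out inverted: lowercase Cyrillic lands on uppercase
Roman and vice versa, so the stripped text is readable but looks odd until
the case is swapped.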
In 1987, GOST 19768 was revised and reissued as GOST 19768-87, which
coincides completely with ISO 8859-5 and (confusingly enough) with the 1987
revision of ECMA-113. This is the "New KOI-8" that "nobody uses" (is that
true?). The advantages of the revision are that (a) it follows ISO rules by
having ISO 646 IRV (i.e. US ASCII) in the left half (whereas KOI-8 replaced
the dollar sign with the international currency sign), (b) it encodes the
extra Cyrillic letters needed for Ukrainian, Belorussian, and other languages
besides just modern Russian, and (c) it has the Russian letters (columns
11-14) in alphabetical order.
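
Point (c) is easy to check mechanically. The sketch below assumes Python's
"iso8859_5" codec for the new layout and, again, "koi8_r" as a stand-in for
the old one; the helper names are invented for this example, and the letter
"ё" is left out because it sits outside the main alphabetical run.

```python
# The 32 lowercase Russian letters in alphabetical order (ё omitted).
RUSSIAN = "абвгдежзийклмнопрстуфхцчшщъыьэюя"


def in_alphabetical_order(codec: str) -> bool:
    """True if the code values rise strictly along the alphabet."""
    codes = RUSSIAN.encode(codec)
    return all(a < b for a, b in zip(codes, codes[1:]))


def column_range(codec: str) -> tuple[int, int]:
    """Lowest and highest code-table column (high nibble) used by the letters."""
    codes = (RUSSIAN.upper() + RUSSIAN).encode(codec)
    return min(codes) >> 4, max(codes) >> 4


for codec in ("iso8859_5", "koi8_r"):
    print(codec, in_alphabetical_order(codec), column_range(codec))
```

With ISO 8859-5 the letters occupy columns 11 through 14 in order, so a plain
byte-wise sort of Russian words comes out alphabetical; with the KOI-8 layout
it does not.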
So yes, "New KOI-8" is simply ISO Latin/Cyrillic.
- Frank
This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:31 EDT