Re: [OT] multilingual support in MS products (was Re: Kurdish ghayn)

From: Doug Ewell (dewell@adelphia.net)
Date: Mon Apr 28 2003 - 02:39:59 EDT

Next message: Marco Cimarosti: "RE: Languages A-Z"

Previous message: Michael (michka) Kaplan: "Re: [OT] multilingual support in MS products (was Re: Kurdish ghayn)"
In reply to: John Hudson: "Re: [OT] multilingual support in MS products (was Re: Kurdish ghayn)"
Next in thread: Thomas Pottjegort: "RE: [OT] multilingual support in MS products (was Re: Kurdish ghayn)"
Reply: Thomas Pottjegort: "RE: [OT] multilingual support in MS products (was Re: Kurdish ghayn)"
Maybe reply: Doug Ewell: "Re: [OT] multilingual support in MS products (was Re: Kurdish ghayn)"
Maybe reply: Michael Everson: "Re: [OT] multilingual support in MS products (was Re: Kurdish ghayn)"

John Hudson <tiro at tiro dot com> wrote:

>> No Dutchman - whether he is involved in type or not - can be amazed
>> by the existence of IJ.
>
> No one is amazed that it exists as a grapheme, but my Dutch colleagues
> are frequently surprised to discover that it is a *character* in
> Unicode, and they wonder why. Perhaps this is one of those characters
> that needs its story told: I've heard that it was encoded for
> backwards compatibility with an existing standard, but no one I've
> asked seems to know which standard, or whether this standard is still
> in use by anyone.

The standard is ISO/IEC 6937.

First developed in the early 1980s, this was a supplementary set of 96
code points intended for use in conjunction with ISO 646 (ASCII) to
cover as many European languages as possible, within the ISO 2022
framework. It featured a set of non-spacing diacritical marks, the
forerunners of Unicode's combining marks, although they appeared before
the base letter instead of after it as in Unicode, and were not
considered characters in their own right. About 330 characters could be
encoded when all the combining marks were taken into account.
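
To make the ordering difference concrete, here is a minimal Python sketch of decoding a tiny subset of ISO 6937 into Unicode. The byte values for the diacritics (0xC1 grave, 0xC2 acute, 0xC3 circumflex, 0xC8 diaeresis) are my reading of the published 6937 table and should be checked against the standard. The function name decode_6937 is made up for illustration; this is not a complete converter. The key step is swapping "mark, base" into Unicode's "base, mark" and then composing to NFC.

```python
import unicodedata

# ASSUMED partial table of ISO 6937 non-spacing diacritics -> Unicode
# combining marks. Only a handful of entries; verify against the standard.
NON_SPACING = {
    0xC1: "\u0300",  # grave
    0xC2: "\u0301",  # acute
    0xC3: "\u0302",  # circumflex
    0xC8: "\u0308",  # diaeresis
}

def decode_6937(data: bytes) -> str:
    """Hypothetical decoder for ASCII plus prefixed ISO 6937 diacritics."""
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b in NON_SPACING:
            # 6937 puts the mark BEFORE the base letter, and only one mark
            # is allowed, so the next byte must be a plain ASCII letter.
            if i + 1 >= len(data) or data[i + 1] >= 0x80:
                raise ValueError("diacritic must be followed by an ASCII base")
            # Unicode puts the combining mark AFTER the base: emit base + mark.
            out.append(chr(data[i + 1]) + NON_SPACING[b])
            i += 2
        elif b < 0x80:
            out.append(chr(b))
            i += 1
        else:
            raise ValueError(f"byte 0x{b:02X} not handled by this sketch")
    # Compose base + mark into a precomposed letter (e.g. e + U+0301 -> é).
    return unicodedata.normalize("NFC", "".join(out))
```

So decode_6937(b"caf\xC2e") gives "café": the two-byte sequence 0xC2 0x65 becomes the single precomposed character U+00E9.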

ISO 6937 had some significant drawbacks that prevented its widespread
deployment, at least in North America. The combining marks could only
be used in certain prescribed combinations (a with acute was legal but g
with acute was not), and only one combining mark per base letter was
allowed, which made ISO 6937 useless for languages like Vietnamese that
require multiple diacritics. Furthermore, because it lived in the ISO
2022 world, ISO 6937 had to be "announced" via an escape sequence. And
of course, there was the usual resistance to encoding a single character
like á with a two-byte sequence. ISO 6937 never achieved great
popularity, although I have heard it saw some use in the Netherlands.
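
The two restrictions (prescribed combinations only, one mark per letter) can be sketched as a representability check on Unicode text. The ALLOWED whitelist below is hypothetical and deliberately tiny; the real 6937 repertoire is much larger. It only encodes the examples given here: a with acute is in, g with acute is not. Vietnamese fails because a letter like ệ decomposes into e plus dot below plus circumflex, which is two marks on one base.

```python
import unicodedata

# HYPOTHETICAL whitelist of (base, combining mark) pairs, for illustration
# only; not the actual ISO 6937 repertoire.
ALLOWED = {
    ("a", "\u0301"),  # a + acute: legal
    ("e", "\u0301"),  # e + acute
    ("e", "\u0302"),  # e + circumflex
    ("u", "\u0308"),  # u + diaeresis
}

def representable(text: str) -> bool:
    """True if every base letter carries at most one mark, from ALLOWED."""
    decomposed = unicodedata.normalize("NFD", text)
    i = 0
    while i < len(decomposed):
        base = decomposed[i]
        i += 1
        marks = []
        # Collect the combining marks that follow this base character.
        while i < len(decomposed) and unicodedata.combining(decomposed[i]):
            marks.append(decomposed[i])
            i += 1
        if len(marks) > 1:
            return False  # more than one diacritic on a single base letter
        if marks and (base, marks[0]) not in ALLOWED:
            return False  # combination not in the prescribed list
    return True
```

With this table, representable("café") is True, while representable("\u01F5") (g with acute) and representable("Việt") are both False.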

The capital IJ digraph is encoded at position 6/6 in ISO 6937, which
means it would normally be expressed with byte 0xE6 (assuming 6937 was
defined as the G1 or "high-bit" character set). The small ij digraph is
encoded at 7/6 (0xF6).
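
The arithmetic behind those byte values: in the ISO 2022 "column/row" notation, the byte is column * 16 + row. When the set is invoked into the high half, bit 8 is also set. Here is a small Python sketch. The function name is hypothetical. The Unicode targets are U+0132 LATIN CAPITAL LIGATURE IJ and U+0133 LATIN SMALL LIGATURE IJ.

```python
# Unicode equivalents of the two 6937 code points discussed above.
IJ_TO_UNICODE = {0xE6: "\u0132", 0xF6: "\u0133"}

def column_row_to_byte(notation: str, high_bit: bool = True) -> int:
    """Convert 7-bit 'column/row' notation (e.g. '6/6') to a byte value.

    high_bit=True assumes the set is used as the G1 / "high-bit" set,
    as the text does, so 0x80 is added to the 7-bit value.
    """
    column, row = (int(part) for part in notation.split("/"))
    if not (0 <= column <= 7 and 0 <= row <= 15):
        raise ValueError("expected a 7-bit column/row position")
    value = (column << 4) | row  # column is the high nibble, row the low
    return value | 0x80 if high_bit else value
```

column_row_to_byte("6/6") yields 0xE6 (capital IJ) and column_row_to_byte("7/6") yields 0xF6 (small ij). Without the high bit they would be 0x66 and 0x76, the ASCII positions of "f" and "v".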

-Doug Ewell
Fullerton, California
http://users.adelphia.net/~dewell/

This archive was generated by hypermail 2.1.5 : Mon Apr 28 2003 - 03:26:57 EDT