RE: Version 2.0 and Devanagari

From: John Clews (John@sesame.demon.co.uk)
Date: Wed Mar 27 1996 - 10:07:37 EST


There has been a series of emails on the list RE: Version 2.0 and Devanagari.
Michel Suignard's very useful reply to Michael Everson is inaccurate in some
areas - probably due to the passage of time since Antalya - and requires some
correction. This is a personal view, but I feel it may provide some useful
information to members of ISO/IEC JTC1/SC2/WG2 and also the Unicode
Consortium.

As it is a long email I have divided it into the following sections:
 1. JTC1/SC2/WG2, the Unicode Consortium, and outdated Indian "standards"
 2. Possible implementation problems with the current Tables 16-24
 3. What are India's requirements?
 4. Participation by India and other developing countries in JTC1/SC2/WG2
 5. Could there be a Korean pDAM situation for Indian scripts?

I have also labelled quotes (I hope correctly) as MS for Michel Suignard, and
ME for Michael Everson.

1. JTC1/SC2/WG2, the Unicode Consortium, and outdated Indian "standards"

In message <v0153050ead7c9b1ff413@[199.186.52.218]> Michael Everson writes:

ME> Date: Mon, 25 Mar 1996 13:37:21 -0500
ME> Reply-To: iso10646@listproc.hcf.jhu.edu
ME> Sender: owner-iso10646@listproc.hcf.jhu.edu
ME> Precedence: bulk
ME> From: everson@indigo.ie (Michael Everson)
ME> To: unicode@unicode.org
ME> Cc: iso10646@listproc.hcf.jhu.edu
ME> Subject: RE: Version 2.0 and Devanagari
ME> Mime-Version: 1.0
ME> X-Sender: everson@mail.indigo.ie

At 10:22 1996-03-25, Michel Suignard wrote:

MS >Michael, I don't think your message reflects exactly the situation. In
MS >the WG2 meeting in Antalya, Turkey, a representative from India came
MS >with the new ISCII standard.

This was not new at that point. The Indian Standard - Indian Script Code for
Information Interchange - ISCII (IS 13194 : 1991) had been available for some
considerable time. Both ISO/IEC JTC1/SC2/WG2 and the Unicode Consortium
should have had a chance to confirm the latest state of standardisation of
ISCII well before that date, and in fact had been advised via at least two
routes of this Indian standard:

(a) the UK delegate at a meeting previous to Antalya had, as I understand it,
    informed ISO/IEC JTC1/SC2/WG2 of the existence of the new standard,
    having been informed of it at a previous meeting of BSI/IST/2 by me;
(b) The Director of CDAC (Dr. Vijay P. Bhatkar) had sent a copy of the Indian
    standard IS 13194 to Kenneth W. Whistler of Metaphor, with a covering
    letter (which I have seen) stating that CDAC (Pune, India) was intimately
    involved with character coding standardisation within the Bureau of
    Indian Standards. I do not have the date for this but again this was, I
    think, during 1991, and no later than early 1992.

Any serious discussion of (a) at ISO/IEC JTC1/SC2/WG2 was - perhaps
understandably - eclipsed by the "Unicode vs. previous ISO/DIS 10646" debate
on the structure, and avoiding the possibility of an ISO/IEC standard and a
competing de facto Unicode standard. No discussion of (b) was undertaken by
the Unicode Consortium: the only reply to this seems to have been a standard
letter from a secretary in the Unicode Consortium urging the recipient to buy
the Unicode standard.

Having had considerable discussion with N. Subramanian, the Indian delegate,
before and at Antalya, I am sure that Michel Suignard is wrong to state that
MS >He was not asking to move anything. He was
MS >asking for a new row to put the latest ISCII layout.

N. Subramanian WAS asking that the current tables for Devanagari and other
Indian scripts derived from Brahmi should be replaced by a direct mapping
from ISCII (IS 13194 : 1991). That is, tables 16-24 should map to the current
version of IS 13194 : 1991. In other words, a situation analogous to the
Korean pDAM was being requested.

It is clear from Annex L of ISO/IEC 10646-1: 1993 that the ISCII referred to
is NOT an Indian standard. Indian standards are prefixed IS (as in IS 13194)
and the LTD 37(1610)-1988 is in fact merely an internal document number of
the Department of Electronics of the Government of India. The filing order of
LTD 37(1610)-1988 within the list of other standards makes it clear that
IS nnnnn was expected here.

2. Possible implementation problems with the current Tables 16-24

If the current Table 16 is retained, all implementors will need to know of
specific differences between Table 16 and IS 13194 : 1991. Not all characters
that are present in Table 16 are represented by one byte in IS 13194 : 1991 -
some of these require a form of variable byte coding.

Virtually all software for Indian languages now uses ISCII (IS 13194 : 1991).
Although this is a subset of Table 16 in ISO/IEC 10646-1, conversion between
the two coding standards may still involve complications.
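
To illustrate why such a conversion is more than a fixed offset, here is a
minimal sketch. The function name iscii_to_ucs and the table names are
hypothetical, and the handful of byte values shown are illustrative
assumptions that should be checked against IS 13194 : 1991 and Table 16
before any real use. The point is only that some characters coded singly in
Table 16 correspond to two-byte sequences in ISCII, so the converter has to
look ahead.

```python
# Illustrative values only: verify every entry against IS 13194 : 1991
# and Table 16 of ISO/IEC 10646-1 before relying on it.
SINGLE = {
    0xA1: "\u0901",  # candrabindu
    0xA4: "\u0905",  # letter A
    0xB3: "\u0915",  # letter KA
    0xE8: "\u094D",  # halant (virama)
    0xE9: "\u093C",  # nukta
    0xEA: "\u0964",  # danda
}

# Two-byte ISCII sequences that correspond to one character in Table 16.
PAIRS = {
    (0xA1, 0xE9): "\u0950",  # candrabindu + nukta -> OM
    (0xEA, 0xE9): "\u093D",  # danda + nukta -> avagraha
}


def iscii_to_ucs(data: bytes) -> str:
    """Convert ISCII bytes to a string, trying two-byte sequences first."""
    out = []
    i = 0
    while i < len(data):
        pair = tuple(data[i:i + 2])
        byte = data[i]
        if pair in PAIRS:
            # Longest match wins: consume both bytes as one character.
            out.append(PAIRS[pair])
            i += 2
        elif byte < 0x80:
            # The lower half of ISCII is plain 7-bit ASCII.
            out.append(chr(byte))
            i += 1
        elif byte in SINGLE:
            out.append(SINGLE[byte])
            i += 1
        else:
            raise ValueError(f"byte 0x{byte:02X} is not in this sketch's table")
    return "".join(out)
```

A call such as iscii_to_ucs(bytes([0xA1, 0xE9])) yields a single character,
whereas the same two bytes read one at a time would yield two. The reverse
direction needs the corresponding one-to-two expansion.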

Many of these could have been avoided if both ISO/IEC JTC1/SC2/WG2 and the
Unicode Consortium had checked, or listened to advice emanating from India.
Most external data and text in interchange and processing will emanate from
India, particularly with the renewed interest in religious and cultural
information from India.

Despite the rapid growth of the Indian computer industry (including Western
offshore computer development in places like Bangalore) there is going to be
little early impetus for Indian users to migrate to multiple-octet coding,
given that a non-trivial conversion is likely to be needed. A trivial
algorithm could have been made available by incorporating the actual Indian
standard.

Those companies considering using Tables 16 through 24 of ISO/IEC 10646-1
should ask themselves what the cost of transformation is going to be,
particularly in the light of the discussion quoted below.

3. What are India's requirements?

MS >The ISCII standard
MS >(my copy is dated December 1991 with a January 1993 reprint) aims at
MS >encoding "a common alphabet for all the Indian scripts [] made possible
MS >by their common origin from the same ancient Brahmi script." (extract
MS >from the standard).

To clarify the issue here: the Indian intention was NEVER to add a further
row for Brahmi script, but to ensure that all scripts currently used in India
were mapped to the coding in IS 13194 : 1991.

Readers of this list should be clear that Devanagari maps to all other scripts
used in India in the same way that Latin script maps to Gaelic script and to
Gothic script. From a coding implementor's point of view, you can treat each
of these scripts (deriving from Latin, and deriving from Brahmi) as fonts of
the base script. Obviously the users will regard these as scripts rather than
fonts, and require appropriate locales to be considered in applications.
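
The "fonts of the base script" view can be sketched in code. The sketch below
assumes the block start values assigned to the Indian scripts in the Unicode
Standard, where each script occupies a parallel block of 128 positions; the
function name reshape and the table name are hypothetical. It is not a
complete converter: not every position is assigned in every script, and some
characters (the danda, for example) are coded once and shared, so a real
implementation would need a list of exceptions.

```python
# Block start values as assigned in the Unicode Standard; each script
# occupies a parallel block of 0x80 code positions.
BLOCK_START = {
    "Devanagari": 0x0900,
    "Bengali": 0x0980,
    "Gurmukhi": 0x0A00,
    "Gujarati": 0x0A80,
    "Oriya": 0x0B00,
    "Tamil": 0x0B80,
    "Telugu": 0x0C00,
    "Kannada": 0x0C80,
    "Malayalam": 0x0D00,
}
BLOCK_SIZE = 0x80


def reshape(text: str, source: str, target: str) -> str:
    """Move characters of the source script to the same slot in the target.

    Characters outside the source block are passed through unchanged.
    No check is made that the resulting position is actually assigned.
    """
    lo = BLOCK_START[source]
    shift = BLOCK_START[target] - lo
    return "".join(
        chr(ord(ch) + shift) if lo <= ord(ch) < lo + BLOCK_SIZE else ch
        for ch in text
    )
```

For example, reshape("\u0915", "Devanagari", "Bengali") moves the letter KA
from its Devanagari position to the Bengali one by adding the difference
between the two block starts.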

ME> Ah. I was not in Antalya, unfortunately. I do have SC2/WG2 paper N1030 (by
ME> John Clews and N. Subramanian) which discusses this. It was on the actions
ME> list apparently since Antalya but in the minutes of the Tokyo meeting it
ME> states that Action Item 25-8 on the Indian NB to prepare a defect report
ME> was dropped because no one took any action.

MS >So clearly this is very different from what is already coded in Unicode.

ME> Yes, it is.

MS >If we see a bit more often our Indian collegues attending WG2 meetings
MS >(nobody from India has come to a WG2 meetig since then) and when (more?)
MS >commercial applications are developped for the Indian natives languages
MS >I would assume that the issue will have to be revisited. I wouldn't be
MS >surprised that at that some time we have to create a new row for
MS >'Brahmi'. And it may lead eventually to the deprecation of the other
MS >Indian script encoding.

ME> N1030 proposes this, BEFORE publication of the next version of 10646. I do
ME> not in principle have anything against this. Though if those other Indic
ME> rows were deprecated I would want the code positions reused for other
ME> standardization. and I would like to see something done about this soon by
ME> somebody. This would be another Hangul-like pDAM. If it were to be done
ME> (and it could be done with relative ease) it ought to be done quickly. Like
ME> proposed at the _next_ WG2 meeting. Would Unicode file a defect report? It
ME> would make the committees (WG2 and UTC) discuss the matter. Of course the
ME> disposition of the defect report might be Do Nothing.

4. Participation by India and other developing countries in JTC1/SC2/WG2

Michel Suignard makes some valid criticisms of the lack of Indian
participation after Antalya. In particular (again quoting from the above)
where Michael Everson states
ME> I do have SC2/WG2 paper N1030 (by
ME> John Clews and N. Subramanian) which discusses this. It was on the actions
ME> list apparently since Antalya but in the minutes of the Tokyo meeting it
ME> states that Action Item 25-8 on the Indian NB to prepare a defect report
ME> was dropped because no one took any action.

India took no action, but this might be because they compared the rapid way
in which ISO/IEC JTC1/SC2/WG2 N 1030 was dealt with against the lengthy
attention given at Antalya to the delegates from China, who were dealing with
various scripts, the coding of which has still not reached a great deal of
standardisation since then.

Having said that, I and others in the UK have found that relevant parties
have sometimes been very slow to respond to queries, and could have been much
more prompt.

Clearly, India should have taken a more active role in standardisation within
ISO/IEC JTC1/SC2. This might be helped if they were informed by ISO/IEC
JTC1/SC2/WG2 of the availability of help from within ISO itself. ISO's DEVCO
and DEVPRO activities are designed specifically to assist participation in
ISO/IEC standardisation from developing countries, but information
dissemination is woefully inadequate.

I only came across ISO's DEVCO and DEVPRO activities when dealing with
ISO/TC46/SC2/WG10 (Transliteration of Mongolian) - as chair of ISO/TC46/SC2
I have just become aware of their existence, and MNISM (Mongolia) is making
use of DEVCO/DEVPRO. It would be useful for ISO/IEC JTC1/SC2/WG2 to inform
relevant bodies in Asia of the availability of this, as ISO DEVCO and DEVPRO
seem to have fallen down on this.

One relevant contact that ISO/IEC JTC1/SC2/WG2 should therefore make is with
Mr. Anwar El-Tawil, Director of ISO/DEVCO at ISO Central Secretariat.

Again, this may be something that the Unicode Consortium could assist in -
identifying relevant individuals from developing countries in Asia and
assisting them with air fares, accommodation etc.

5. Could there be a Korean pDAM situation for Indian scripts?

I have raised this with a number of individuals, and received replies giving
good reasons why this should not be so. However, other people also seem to be
raising it, so it may not go away that quickly.

Again to quote from the discussion above:
MS >I would assume that the issue will have to be revisited. I wouldn't be
MS >surprised that at that some time we have to create a new row for
MS >'Brahmi'. And it may lead eventually to the deprecation of the other
MS >Indian script encoding.

ME> N1030 proposes this, BEFORE publication of the next version of 10646. I do
ME> not in principle have anything against this. Though if those other Indic
ME> rows were deprecated I would want the code positions reused for other
ME> standardization. and I would like to see something done about this soon by
ME> somebody. This would be another Hangul-like pDAM. If it were to be done
ME> (and it could be done with relative ease) it ought to be done quickly. Like
ME> proposed at the _next_ WG2 meeting. Would Unicode file a defect report? It
ME> would make the committees (WG2 and UTC) discuss the matter. Of course the
ME> disposition of the defect report might be Do Nothing.

It may well be useful for a separate ad hoc group to be set up by
ISO/IEC JTC1/SC2/WG2 and/or the Unicode Consortium to consider the effects on
implementors of the current situation. I would be prepared to participate in
such a group, and it would also be useful to involve India more actively,
through trying to revive the existing contact with Mr. N. Subramanian.

                                Yours sincerely

                                  John Clews

--
John Clews (Chair ISO/TC46/SC2 & BSI/IDT/2/5: Conversion of Written Languages)
SESAME Computer Projects, 8 Avenue Road       tel: +44 (0) 1423 888 432
Harrogate, HG2 7PG, United Kingdom            email: john@sesame.demon.co.uk


