Indic working group

Last updated: September 11, 2004

1.  Open problems and action items
2.  Closed problems
3.  TUS 4.0 and other Unicode publications
4.  External documents
4.1.  TDIL documents
4.2.  Kerala IT Mission
4.3.  Books and articles
5.  indic at unicode.org mailing list
6.  Examples of proposals
7.  Possible outcomes for a problem
8.  Timeline
Document History

This is the home page for the Indic working group.

The main goal of the current activity is to make sure that Unicode 5.0 is as complete as possible for the major Indic scripts and languages.

We should use 4.1 as our earliest opportunity to bridge the gap until the 5.0 release. The textual content should be limited to critical parts. If we decide that new characters are needed, 4.1 should provide temporary solutions as well as point to the future solutions.

1. Open problems and action items

P1: Script specific danda and double danda
AI 1.1 everyone Make sure that your point of view is reflected in the analysis
AI 1.2 ? Provide a rebuttal for argument F1
AI 1.3 ? Provide a rebuttal for argument A1
AI 1.4 ? Provide examples, from actual documents, of the use of the danda and double danda in the various languages; these will be needed to go down the path of separate encoding, and they will shed light on the variation across languages
P2: Script specific Udatta and Anudatta
AI 2.1 ? determine if this is part of problem P30
P3: Grave and acute
P4: Invisible letter
P5: Devanagari conjuncts
AI 5.1 ? Make sure that the sequences for Devanagari conjuncts in NamedCompositeEntities.txt are accurate and complete.
P6: Devanagari SHA and LA for Marathi
P7: Sindhi implosives
A7.1 ? collect evidence of the Sindhi implosives in actual documents.
A7.2 ? search for actual documents which contain both Sindhi and Sanskrit text
P8: Marathi eyelash RA
P9: Devanagari currency sign
P10: Devanagari signs for Sanskrit
P11: Assamese letters sort order
A11.1 TDIL confirm that the motivation for the reencoding request is collation
P13: Bengali Ya-Phallaa
P14: Bengali KHYA
P15: Gurmukhi post-base/subjoined forms
A15.1 TDIL clarify the problem
P16: Move of U+0A71 GURMUKHI ADDAK
A16.1 TDIL clarify the problem
P17: Move of GURMUKHI EK ONKAR and ADI SHAKTI
A17.1 TDIL clarify the problem
P18: Encoding of Gurmukhi nasalized vowels
A18.1 TDIL clarify the problem
P19: Gujarati abbreviation sign
P20: Gujarati fractions
P21: Oriya vocalic RR
P22: Telugu nukta
P23: Telugu Avagraha
P24: Kannada vowel sign a
P25: Kannada reph
P26: Kannada Deergha Swaritha
P27: Support for Tulu and Kodava
P28: Malayalam Chillus
P29: Malayalam DIGIT ZERO and fractions
A29.1 ? research the digits and fractions in Malayalam, gathering evidence
P30: Vedic
P31: Musical notations

2. Closed problems

P12: Bengali Khanda Ta
Accepted for Unicode 4.1

3. TUS 4.0 and other Unicode publications

4. External documents

4.1. TDIL documents

The TDIL documents which describe the Indic scripts, including the proposed additions, can be found at http://tdil.mit.gov.in/news.htm. The January 2002 issue covers Devanagari and the following issues cover the other scripts.

The page http://tdil.mit.gov.in/pchangeuni.htm points to modified code charts as well.

4.2. Kerala IT Mission

The site of the Kerala IT Mission contains some documents related to character encoding at http://www.keralaitmission.org/malayalam/malayalam_keybo.htm.

4.3. Books and articles

We have a (very partial) bibliography of books and articles. Since many of these books are very hard to find, a source is identified where possible; we also have scans of relevant pages.

5. indic at unicode.org mailing list

The mailing list indic at unicode.org is used by this group to discuss the problems.

To subscribe, send a message to ecartis at unicode.org, with “subscribe indic” as the subject. Be sure to send messages to the list from the address you subscribed with.

The archive of the list is available as a raw mbox file, at http://www.unicode.org/~ecartis/indic/. To get access, you need to use user-id “unicode-ml” and password “unicode”.
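Since the archive is a raw mbox file, a downloaded copy can be inspected with standard tools. The following is a minimal sketch, assuming the archive has already been saved locally; the file name indic.mbox and the helper summarize_mbox are hypothetical, and only Python's standard mailbox module is used.

```python
import mailbox
from collections import Counter


def summarize_mbox(path):
    """Return (message_count, Counter of subject lines) for a local mbox file.

    `path` is assumed to point at a local copy of the archive,
    for example a hypothetical "indic.mbox".
    """
    # create=False: raise an error instead of silently creating an empty file
    box = mailbox.mbox(path, create=False)
    try:
        subjects = Counter()
        count = 0
        for msg in box:
            count += 1
            # A message may lack a Subject header; str() also covers Header objects
            subject = str(msg["Subject"] or "(no subject)").strip()
            subjects[subject] += 1
        return count, subjects
    finally:
        box.close()


# Example use (hypothetical file name):
#   count, subjects = summarize_mbox("indic.mbox")
#   for subject, n in subjects.most_common(10):
#       print(n, subject)
```

Counting by subject line is only a rough way to see which problems have been discussed most; it does not group replies whose subject was edited.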

6. Examples of proposals

Over the years, the UTC and WG2 have refined their method of work. At this point, proposals for new characters are rather sophisticated documents. Here is a representative example:

L2/04-025: Proposal to encode 5 new Arabic script characters, by Jonathan Kew [3.5 Mbytes]

(Thanks to Jonathan for providing this document here.)

Note that the proposed characters are shown in “real life” examples. This gives the UTC and WG2 members a chance to validate the arguments in favor of encoding, and sometimes leads to observations that escaped the proposer.

7. Possible outcomes for a problem

We have a number of tools available to resolve a problem:

8. Timeline

We have to take into account the timing of the Unicode versions, and the various meetings leading to them.

The next UTC meetings are:

The Editorial Committee, whose function is to implement the decisions of the UTC in the form of text for the standard and the actual UCD content, meets about once a month.

The next release, 4.1, is currently planned for the first half of 2005. Its character repertoire is now essentially frozen, because of the synchronization with ISO 10646:2003 Amendment 1. The November meeting is the preferred target for technical proposals for 4.1 (other than new characters), as well as for draft text to be incorporated in the standard.

The release after that, 5.0, is planned for 6 to 12 months after 4.1, and will be synchronized with ISO 10646:2003 Amendment 2. New characters can be added to 5.0; in practice, it would be much simpler if solid character proposals were submitted by the November UTC meeting. Text for that version should be submitted to the Editorial Committee in the first half of 2005.

In both cases, text submitted to the Editorial Committee is likely to go through a number of revisions before becoming the published text, and this group will of course be involved in those revisions. The dates above are for solid first drafts.


Document History

Revision  Date                Comments
2         September 11, 2004  Added link to Kerala IT Mission