Inherent "a"

From: Maurice Bauhahn (bauhahnm@clara.net)
Date: Sat Mar 30 2002 - 10:21:54 EST


Why do you need to have a code for 'inherent a' in Tamil?

There is some imprecision concerning what constitutes an 'inherent' vowel.
In this note I am referring to normally unwritten vowels that are
nevertheless pronounced.

I know nothing about Tamil, but in Khmer Unicode there are two such inherent
"a" characters: a long inherent (native Khmer language) at U+17B5 and a
short inherent (Sanskrit/Pali) at U+17B4. Their encoding has raised some
outcry (in fact some parties are trying to deprecate them), but the more I
analyse grammars, dictionaries, and round-trip transliteration the more
importance they assume.
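
For anyone who wants to check the two code points, here is a minimal sketch
in Python (my choice of language, nothing in this thread depends on it). It
only asks the standard unicodedata module for the character names; the
helper name describe() is made up for illustration.

```python
import unicodedata

# The two Khmer inherent vowels mentioned above.
INHERENT_VOWELS = ("\u17B4", "\u17B5")


def describe(ch):
    """Return 'U+XXXX NAME' for one character (hypothetical helper)."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


for ch in INHERENT_VOWELS:
    print(describe(ch))
```

In the character names, AQ is the short inherent and AA the long one.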

(1) If you look at the dependent vowel series in an Indic script, it often
starts with an unwritten 'inherent a' character, recognising its unique
existence.

(2) If you transliterate between an Indic script and a Latin [or other
phonetic] transliteration, the inherent vowel must become explicit in the
transliteration (hence it would be extremely useful for round-trip
conversion reasons to have a code in the Indic encoding to match that).
Dependable round-trip conversion of text is becoming increasingly important
when a single minority language spans national borders where government
authorities on opposite sides of the border insist the 'national' script of
their respective country be used to render that language.

(3) Not every consonant cluster that lacks an explicit dependent vowel also
contains an 'inherent a' (in particular in Khmer it is unpredictable from
the context [i.e., without a lookup] whether a final consonant cluster
without a dependent vowel has a pronounced inherent or not).

(4) Non-final clusters lacking an explicit dependent vowel 'always' (a
dangerous word to use!) have an 'inherent a', possibly short or long.

(5) Depending on the foreignness of the word, an 'inherent a' in Khmer may be
short (foreign) or long (Khmer language).

(6) Dictionaries have to make the short 'inherent a' vowel explicit in their
pronouncing sections (usually borrowing U+17C8 to display it; however, you
would not want to raise ambiguity by using that code both when it is
normally displayed and when it is there only to make pronunciation clear).

(7) For phonetic rendering of an Indic script, therefore, it would be very
useful to selectively encode it. In the future data input and output will
increasingly move to verbal/aural, rather than keyboard means. This would be
quite an exciting development for Khmer...because Khmer is difficult to
keyboard and presumably relatively easy for a computer to recognise (what
with about fifty vowel/vowel-sign combinations that are easier for computers
to recognise than consonants). Hence, I would assume that codes to capture
verbal data converted to Unicode text will similarly become increasingly
important.

(8) 'Inherent a' is often used in combination with vowel-like signs such as
U+17C6 NIKAHIT, U+17C7 REAHMUK, U+17C8 YUUKALEAPINTU to generate vowels with
consonantal final sounds. Failure to recognise the 'inherent a' results in
wrongly interpreting those consonant-like signs as vowels. These vowel+sign
ligatures are in fact treated like unique vowels in sorting.
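
To make point (2) concrete, here is a rough sketch of a round trip with the
inherent vowels encoded explicitly. Everything in it is an assumption for
illustration: the Latin values ("k", "m", "a", "aa") are a toy table and not
any real transliteration standard, and the function names are invented. The
only facts used are the code points themselves.

```python
SHORT_A = "\u17B4"  # KHMER VOWEL INHERENT AQ (short)
LONG_A = "\u17B5"   # KHMER VOWEL INHERENT AA (long)

# Hypothetical toy table; the Latin values are illustrative only.
TO_LATIN = {
    "\u1780": "k",   # KHMER LETTER KA
    "\u1798": "m",   # KHMER LETTER MO
    SHORT_A: "a",
    LONG_A: "aa",
}
TO_KHMER = {latin: khmer for khmer, latin in TO_LATIN.items()}
MAX_TOKEN = max(len(token) for token in TO_KHMER)


def to_latin(text):
    """Map each Khmer code point to its toy Latin value."""
    return "".join(TO_LATIN[ch] for ch in text)


def to_khmer(text):
    """Reverse the mapping, trying the longest Latin token first."""
    out = []
    i = 0
    while i < len(text):
        for size in range(MAX_TOKEN, 0, -1):
            token = text[i:i + size]
            if token in TO_KHMER:
                out.append(TO_KHMER[token])
                i += len(token)
                break
        else:
            raise ValueError(f"no mapping at position {i}: {text[i:]!r}")
    return "".join(out)


short_word = "\u1780" + SHORT_A + "\u1798"
long_word = "\u1780" + LONG_A + "\u1798"

# With the inherent vowel encoded, the round trip is lossless.
assert to_khmer(to_latin(short_word)) == short_word
assert to_khmer(to_latin(long_word)) == long_word
```

If the inherent codes are left out of the Khmer text, both toy words collapse
to the same two consonants, and nothing in the Latin form "kam" or "kaam" can
be recovered from them without a lookup.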

There are arguments against using 'inherent' vowels.

(a) Unwritten characters tend to not be typed! And if they were, the data
stream length would grow remarkably.
(b) Binary comparison of words with and words without 'inherent' vowels
would be problematic.
(c) The average user would probably not gain advantage from the inclusion of
'inherent' vowels in the text stream.
(d) I could not find more than one instance in the authoritative Chuon Nath
Khmer dictionary where two words otherwise spelled the same were
distinguished by the length of their inherent vowels. It is hard to write a
sorting rule on one data point;-)
(e) Rendering mechanisms may not recognise the (rarely used) inherent code
and cause problems when it is used.
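
Objections (a) and (b) can be seen in a few lines. This is only a sketch of
one possible workaround (dropping the inherent codes before comparing), not
something any standard prescribes; the function names are invented for
illustration.

```python
INHERENT = {"\u17B4", "\u17B5"}  # KHMER VOWEL INHERENT AQ and AA


def strip_inherent(text):
    """Drop the explicit inherent vowel codes (hypothetical helper)."""
    return "".join(ch for ch in text if ch not in INHERENT)


def same_spelling(a, b):
    """Compare two strings while ignoring explicit inherent vowels."""
    return strip_inherent(a) == strip_inherent(b)


plain = "\u1780\u1798"            # KA, MO as normally typed
annotated = "\u1780\u17B4\u1798"  # same letters with an explicit short inherent

# (b) a plain binary comparison treats the two as different words
print(plain == annotated)

# ...unless the inherent codes are ignored first
print(same_spelling(plain, annotated))

# (a) each explicit inherent vowel adds bytes to the data stream
print(len(annotated.encode("utf-8")) - len(plain.encode("utf-8")))
```

The price of this workaround is that the length distinction is thrown away
for the comparison, so it suits searching better than sorting.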

Hence, it would be preferred that the use of inherent vowels be sharply
circumscribed...but not eliminated altogether.

In summary, inherent vowels:

(1) Are characters in their own right
(2) Are needed for round trip script conversion (transliteration)
(3) Are not a trivial case: They are not contained in every consonant
cluster even when that cluster does not contain a visual dependent vowel
(4) Are useful for preserving phonetic value in dictionaries or
text-to-speech applications

Interested,

Maurice Bauhahn

-----Original Message-----
From:
Sent: 29 March 2002 19:38
Subject: Inherant "a"

I need to allocate a U+codepoint for inherent "a", to be used for Tamil
research. Can anyone suggest a temporary location or is it possible to find
such code point within the existing code point for Tamil.


