Re: Tamil glyphs

From: Antoine Leca (Antoine.Leca@renault.fr)
Date: Thu Sep 07 2000 - 13:00:12 EDT


Michael Kaplan wrote:
>
> To answer a question someone else posed, a ZWNBSP or a ZWJ will not work
> here since the vowel reordering must happen, as well. They are two entirely
> different but entirely valid forms of the same groups of letters.

Agreed.

 
> I guess one could claim that the problem is with the current block
> description, which as far as I know in its description is intended to be
> normative in regards to how Tamil is handled. It does not even suggest
> another possibility, and specifically says (replacing Tamil glyphs with
> letter symbols since many do not have the fonts and many others may not like
> the UTF-8):
>
> The vowel sign AI changes to <> to the left of NNA, NNNA, LA, or LLA.
>
> <examples>
>
> Remember that this change takes place after the vowel reordering; in the
> first example, the vowel AI follows NNA in the memory representation. After
> vowel reordering, it is on the left of NNA, and thus changes form. The
> complete process is
>
> <example>

I agree that this description should explain that this is an *optional* change.
For example, the Tamil Nadu Standard for font encoding has *no* room for
these forms... See <URL:http://www.tamilnet99.org/encstd.htm>

 
> Now, the argument that this is *just* a font issue is really not one that I
> can accept very easily, especially since there is apparently even some
> modern usage of the other form

Sorry, what is "the other form"? As I see things, in Tamil Nadu the current
usage is to write NNAI exactly the same way as, for example, KAI (that is, without
the "elephant-trunk" form that TUS, The Unicode Standard, appears to require).

 
> Now I am (slowly) learning the language but it will be some time before I
> can fully grasp this issue, but I do not know of other examples where
> something such as a ligature clearly described in a block description is
> supposed to be selectively ignored.

I happen to know of the reverse: a ligature, described as such and
meaningful, that was dropped from use (for technical reasons) and replaced
with its components: that's (again!) the French oe.

Also, let's take the case of Latin, with the s before a consonant. My understanding
of the issue (which is very poor, so I will certainly write incorrect things here)
is that this letter changed form and then adopted the so-called long s form.
This was, at least in 19th c. usage, done selectively, wasn't it?

The point here is that Unicode does encode long s as a separate character!
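
As a quick illustration of that point, here is a minimal sketch in Python using only the standard `unicodedata` module. It looks up the code points for long s (U+017F) and for the oe ligature (U+0153); the variable names are only for illustration.

```python
import unicodedata

LONG_S = "\u017F"       # long s
OE_LIGATURE = "\u0153"  # French oe ligature

# Both have their own code point and character name in Unicode.
long_s_name = unicodedata.name(LONG_S)
oe_name = unicodedata.name(OE_LIGATURE)

# Long s is still tied to ordinary s through case mapping and case folding,
# so tools can treat the two as the same letter when comparing text.
long_s_upper = LONG_S.upper()
long_s_folded = LONG_S.casefold()

print(long_s_name)
print(oe_name)
print(long_s_upper, long_s_folded)
```

So the selective use of long s is carried in the text itself, not left to the font.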

 
> Even if there were no modern usage, I would be resistant to suggest that it
> is proper design to require two different fonts for a language that barely
> has fonts out there at all, to support a usage that is not described in the
> standard.

I should highlight that TUS describes a usage that current fonts do not support.
Also, this non-standard usage is actually the non-application of two rules
(the one you describe above, and the rule to form the special forms for .naa, _naa
and .raa), which are actually formed by two separate components; as a result,
a font that does support the full Standard has to include six (or 4) more
glyphs, and at least six more combination rules, in addition to what is already
in place for the basic rendering (i.e. reordering, the ligatures for -u, -uu,
ti and tii and the k.sa "consonant").

OK, leaving this to fonts is ugly. But what are the options? As you noted,
use of ZWS's (zero-width something) is not really in line with their current
use. So I see two main possibilities:
1) creating a new ZWS or combining character, meaning "alternative way of rendering"
  (more exactly here, "do not ligate when optional")
2) adding two new characters, for -aa and -ai, that will *not* have the properties
  described in the Unicode Standard to date, so that they will fit with current use
  in Tamil Nadu.

1. is better for processing (since any Unicode tool will be required to deal with
the ZWS's anyway).
2. is easier to deal with in the font renderers.

Either solution is ugly in my eyes, particularly when we consider that the script
"barely has fonts out there at all"...
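
To make option 1 concrete, here is a minimal sketch in Python of a toy shaper for the vowel sign AI. Everything in it beyond the Tamil code points is an assumption for illustration: the glyph names (`ai`, `ai.alt`), the `old_style` flag standing in for the choice of font, and above all `NO_LIGATE`, which is a private-use code point used as a stand-in for the proposed control character (no such character exists in Unicode). It is not how any real renderer works.

```python
AI = "\u0BC8"  # TAMIL VOWEL SIGN AI

# Consonants before which AI takes the alternate form, per the block description.
TRIGGERS = {"\u0BA3", "\u0BA9", "\u0BB2", "\u0BB3"}  # NNA, NNNA, LA, LLA

# Hypothetical "do not ligate when optional" control (private-use stand-in).
NO_LIGATE = "\uE000"

# Hypothetical glyph names, for illustration only.
GLYPH = {
    "\u0B95": "ka",
    "\u0BA3": "nna",
    "\u0BA9": "nnna",
    "\u0BB2": "la",
    "\u0BB3": "lla",
}


def shape(text, old_style=True):
    """Return glyph names in visual order for consonant (+ AI) sequences."""
    glyphs = []
    i = 0
    while i < len(text):
        ch = text[i]
        if text[i + 1:i + 2] == AI:
            # Step 1: vowel reordering, AI moves to the left of its consonant.
            suppressed = text[i + 2:i + 3] == NO_LIGATE
            # Step 2: the optional change of form, applied after reordering.
            use_alt = old_style and ch in TRIGGERS and not suppressed
            glyphs.append("ai.alt" if use_alt else "ai")
            glyphs.append(GLYPH.get(ch, ch))
            i += 3 if suppressed else 2
        else:
            if ch != NO_LIGATE:  # a stray control has no glyph of its own
                glyphs.append(GLYPH.get(ch, ch))
            i += 1
    return glyphs


print(shape("\u0BA3\u0BC8"))              # NNA + AI
print(shape("\u0B95\u0BC8"))              # KA + AI
print(shape("\u0BA3\u0BC8" + NO_LIGATE))  # NNA + AI + hypothetical control
```

With the control present, NNAI comes out shaped like KAI even in an "old style" font. Option 2 would instead move the distinction into the vowel sign itself, so the shaper would need no extra state at all.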

BTW, Tamil people are actually proposing a much more radical solution: dropping
the present way and encoding the syllables instead. See the proceedings of the
conference held in July in Singapore <URL:http://www.tamilinaiyam2000.org/>;
however, since representatives of the Tamil Nadu state are now members of the
Unicode consortium, I believe that unicore is rather the list where these
things are discussed (can anyone confirm or refute?)

Antoine


