Accumulated Feedback on PRI #250

This page is a compilation of formal public feedback received so far. See Feedback for further information on this issue, how to discuss it, and how to provide feedback.

Date/Time: Tue Mar 5 22:17:41 CST 2013
Contact: john@tiro.ca
Name: John Hudson
Report Type: Public Review Issue
Opt Subject: PRI 250 – Malayalam conjuncts


While I think it might be useful to have a mechanism to signal the
desire for particular conjunct forms in plain text, using control
characters in some manner as suggested, it should be noted that the
OpenType 'language system' tag mechanism already makes possible the
creation of fonts that support both traditional and reformed Malayalam
orthography, without the need of ZWJ or ZWNJ. The OpenType Layout
language system tag registry contains two separate Malayalam related
tags (in addition to the {dflt} tag, which different fonts may
interpret differently):

Malayalam Traditional 	MAL
Malayalam Reformed 	MLR

http://www.microsoft.com/typography/otspec/languagetags.htm

This mechanism relies on appropriately built fonts (as does the
proposed ZWJ/ZWNJ mechanism, of course), and on software providing for
orthographic tagging of text. It should be noted that CSS3 font
support specifically enables this kind of tagging and font layout
behaviour. The advantage of this mechanism, of course, is that a
single font can be used to display either orthography simply by the
user appropriately tagging the text, rather than having to insert
control characters in sequences wherever a particular orthographic
conjunct form is desired.

That said, I favour the addition of an explicit ligature request/block
mechanism using control characters, as this will no doubt be as useful
for Malayalam as it has proven for other scripts. I see such a
mechanism as a secondary means to override, at the cluster level, the
results of higher level mechanisms such as OpenType Layout language
system tagging.

Date/Time: Wed Mar 6 00:59:23 CST 2013
Contact: verdy_p@wanadoo.fr
Name: Philippe Verdy
Report Type: Public Review Issue
Opt Subject: PR250: Optional Conjuncts in Malayalam


Could it be possible to encode instead a "modern virama" to facilitate
input, i.e. a virama that would always be visible and never part of
any conjunct, and that would allow the use of the same fonts for both
orthographies, the normal virama defaulting to the creation of
conjuncts, while the other defaulting to the separation, both viramas
not needing the use of any ZWJ/ZWNJ ?

It would mean that sequences:
- <traditional virama, ZWNJ> would be deprecated in favor of <new virama>
- <traditional virama, ZWJ> would be deprecated in favor of just
<traditional virama>
- <new virama, ZWNJ> would not be needed and treated like <new virama>
- <new virama, ZWJ> would not be needed and treated like <traditional virama>
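
The four mappings above can be sketched as a small text rewrite. This is only an illustration under stated assumptions: no <new virama> is encoded, so a private-use code point (U+E000) stands in for it, and the fourth case is assumed to mean <new virama, ZWJ>. The names MODERN_VIRAMA and to_two_virama_encoding are hypothetical.

```python
VIRAMA = "\u0D4D"         # MALAYALAM SIGN VIRAMA (the "traditional virama")
MODERN_VIRAMA = "\uE000"  # placeholder only: the proposed <new virama> is NOT encoded
ZWNJ = "\u200C"           # ZERO WIDTH NON-JOINER
ZWJ = "\u200D"            # ZERO WIDTH JOINER

# (old sequence, replacement), one entry per case in the list above.
REWRITES = [
    (VIRAMA + ZWNJ, MODERN_VIRAMA),         # explicit "no conjunct"
    (VIRAMA + ZWJ, VIRAMA),                 # conjunct is already the default
    (MODERN_VIRAMA + ZWNJ, MODERN_VIRAMA),  # redundant ZWNJ
    (MODERN_VIRAMA + ZWJ, VIRAMA),          # assumed reading of the fourth case
]


def to_two_virama_encoding(text):
    """Rewrite joiner-based sequences into the hypothetical two-virama model."""
    for old, new in REWRITES:
        text = text.replace(old, new)
    return text
```

For example, SA + virama + ZWNJ + RA would become SA + placeholder virama + RA, while SA + virama + RA is left untouched.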

Keyboards built for modern Malayalam would preferably map only the
<new virama> to a simple key; the <traditional virama> would be
accessible with some modifier key (AltGr?) if needed.
Or keyboards could have a new working mode switching between traditional
and new orthography, using a state mode key (similar to Caps Lock)
that changes the way the typed virama key is mapped. Possibly, by
typing the <virama> key twice, it could automatically switch to the
other mode (meaning that the virama key would behave like a dead key,
and would not return anything to the application before we type another
letter, or the virama key a second time to switch input mode).

Note that the same technique used in keyboard mappings could also be
used to automatically generate ZWNJ in the modern orthography when you
type the <virama> key before another letter, in which case defining a
<new virama> character would not be needed. In that case the <virama>
key would also be a dead key, and the <traditional virama> character
would not be output before you press another key:
- if you press the <virama> key a second time, it would switch mode
between traditional and modern orthography
- if you press the BACKSPACE key, the dead key is canceled, and
nothing is output to the application
- if you press the SPACEBAR key, the alternate virama is emitted
without needing to change the virama input mode, i.e. a virama is
emitted without ZWNJ when currently working in the modern mode, but
with ZWNJ when currently working in traditional virama input mode
- if you press a letter key, the <traditional virama> character would
be emitted, followed automatically by ZWNJ only if working in modern
orthography mode, before outputting the letter character. In that case,
there would be no conjuncts displayed in modern orthography, but
traditional conjuncts would be enabled when entering text in the
traditional input mode.
- if you press any other function key, that function key is emitted
without change, but the current virama dead key state should be
canceled (many European keyboard drivers forget to cancel their dead
key state, this is a defect in my opinion, but it may still be
acceptable to not cancel this state in the Malayalam keyboard driver
too, that's why I wrote "should be" and not "must be").
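
The dead-key behaviour listed above can be sketched as a small state machine. This is a minimal illustration, not a real keyboard driver: the class name ViramaDeadKey and the key tokens "VIRAMA", "BACKSPACE" and "SPACE" are hypothetical, any single-character key is treated as a letter, and it is assumed that pressing the virama key twice toggles the mode and leaves no dead key pending.

```python
VIRAMA = "\u0D4D"  # MALAYALAM SIGN VIRAMA
ZWNJ = "\u200C"    # ZERO WIDTH NON-JOINER


class ViramaDeadKey:
    """Toy model of the proposed virama dead key (names are hypothetical)."""

    def __init__(self, modern=True):
        self.modern = modern    # True: modern orthography input mode
        self.pending = False    # True while the virama dead key is armed

    def press(self, key):
        """Return what is handed to the application for this key press."""
        if not self.pending:
            if key == "VIRAMA":
                self.pending = True   # arm the dead key, emit nothing yet
                return ""
            return key                # everything else passes through
        self.pending = False          # any second key resolves the dead key
        if key == "VIRAMA":
            self.modern = not self.modern   # assumed: toggle mode, emit nothing
            return ""
        if key == "BACKSPACE":
            return ""                 # dead key canceled
        if key == "SPACE":
            # The "alternate" virama for the current mode.
            return VIRAMA if self.modern else VIRAMA + ZWNJ
        if len(key) == 1:
            # Letter: virama, then ZWNJ only in modern mode, then the letter.
            return VIRAMA + (ZWNJ if self.modern else "") + key
        return key                    # other function keys pass through
```

Typing SA, virama, RA in modern mode would then yield SA + virama + ZWNJ + RA, and the same keystrokes in traditional mode would yield SA + virama + RA.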

There are certainly other similar variants to adjust input modes to
best match the new orthography with the easiest input mode. But in my
opinion requiring users to know when they need to enter ZWNJ and ZWJ is
neither friendly nor easy. Many things can be enhanced in
keyboard drivers to allow fast input with correctly encoded output
and the expected rendering, without also having to depend on specific
fonts (all Malayalam fonts should work with both orthographies, given
the appropriate text input).

In both approaches, the keyboard general layout would remain basically
the same, only the behavior of the existing <virama> key would be
changed to work as a dead key. And both encoding options (using ZWNJ
automatically, or using a <new virama> character if it was encoded)
would be possible.

However, the intent of this review is certainly to discuss the
practical consequences of the need to use ZWNJ everywhere in the
modern orthography. In my opinion, encoding a <new virama> with the
same combining class as the existing <traditional virama>, and that
would collate nearly the same as it (without the complications of
ZWJ/ZWNJ), would simplify things a lot:

- the traditional virama would be used now always without ZWJ/ZWNJ :
it would create conjuncts everywhere it is possible to create them
(with just a fallback using a visible virama when using simpler
fonts or renderers that can't process the ligatures)

- the new virama would be used now always without ZWJ/ZWNJ : it would
never create conjuncts (it should not be used with ZWJ to change
this : you should use the traditional virama instead)

- fewer characters to enter/edit: ZWJ/ZWNJ is a Unicode-only artefact,
foreign to the Malayalam script itself. Two distinct viramas are
consistent with the expected use and modeling of the script as understood
by users.

- for most uses in modern Malayalam, a simple keyboard layout would
need to map ONLY the <new virama>, producing no conjuncts. Those
layouts do not need to map keys for ZWJ/ZWNJ, and applications then do
not need to handle sequences containing them.

- Caveat: fonts or renderers would need to be developed to support
the <new virama> encoding as an alternative to the <traditional
virama, ZWNJ> encoding. But this development would be worth the
trouble, as they could then manage and correctly render all texts
written in modern as well as traditional orthographies.



Date/Time: Wed Mar 6 22:30:04 CST 2013
Contact: behdad@google.com
Name: Behdad Esfahbod
Report Type: Public Review Issue
Opt Subject: PRI250 Malayalam conjunct sequences


Case 1 & 2

The idea of a ZWJ before a VIRAMA pulling C2 to conjoin to C1 is prevalent in
other Indic scripts, Bengali for example.  To have ZWNJ cancel any such effect
is logical.  In fact, I believe fonts can already be made to work this way
with HarfBuzz.

Case 3 & 4

If one reads the Malayalam section of the Unicode Standard very carefully,
there are these two lines at the end of the "Special Cases Involving Ra"
section:

"The sequence <0D7B, 0D31> is rendered as {chillu followed by ra}, regardless
of the reading of that text. The sequence <0D7B, 0D4D, 0D31> is rendered as
{chillu-ra conjunct}."

So, to me it appears that a chillu followed by a virama encourages conjunct
formation whereas chillu not followed by virama ends the syllable and hence
does not form any conjuncts.  I don't see how that is different from what Case
3 & 4 in the proposal try to achieve.
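
As a small illustration, the two sequences quoted from the Standard differ only by the virama code point. The sketch below uses Python's standard unicodedata module to list the characters; the constant and function names are hypothetical.

```python
import unicodedata

CHILLU_THEN_RA = "\u0D7B\u0D31"           # <0D7B, 0D31>
CHILLU_VIRAMA_RA = "\u0D7B\u0D4D\u0D31"   # <0D7B, 0D4D, 0D31>


def describe(seq):
    """List each character of seq as 'U+XXXX NAME'."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in seq]


for label, seq in [("no virama", CHILLU_THEN_RA), ("with virama", CHILLU_VIRAMA_RA)]:
    print(label, describe(seq))
```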

Date/Time: Sat Apr 6 04:10:22 CDT 2013
Contact: jfkthame@gmail.com
Name: Jonathan Kew
Report Type: Public Review Issue
Opt Subject: PRI#250: Proposal to Specify Optional Conjuncts in Malayalam


While I can appreciate the desire to be able to access both
traditional and reformed renderings...
 
"However, there is a definite need for the ability in a reformed
orthography font to display the traditional full conjuncts on demand.
As of now there is no mechanism specified in the standard to suggest
a full conjunct of a cluster.

The reverse of the above scenario is also needed - a traditional
orthography font might want to display reformed orthography grapheme
clusters optionally."

...it is not clear that this is something that should be represented
at the level of plain-text encoding. The difference is a stylistic
one that would be better handled by having separate fonts, or by
having optional features that may be applied to select different
glyph combinations within the font, or by distinct language systems
that in turn have distinct collections of features.

From a user's point of view, attempting to control this level of
rendering via invisible control characters in the text stream would
be extremely cumbersome and difficult to use, especially as the
effects, if any, of those control characters will be dependent on the
particular font being used.

Even a careful user is likely to insert the controls only in contexts
where the particular font being used during data entry happens to
support a conjunct that the user wishes to override (in either
direction). But if the data is later viewed with a font that supports
a slightly different repertoire of conjuncts, the attempt to enforce
"traditional" or "reformed" style will fail, as the exact set of
character sequences that need such control may be different.

A solution that treats this as a style difference, encoded in the
styling and/or language attributes of rich-text data, will be much
more workable than a plain-text representation of all the possible
stylistic variations. With text runs marked appropriately as
"malayalam-traditional" or "malayalam-reformed", and font shaping
technology (such as OpenType) that is sensitive to this distinction,
the desired result will be achieved even across multiple fonts with
varying glyph repertoires.
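
A minimal sketch of that idea, assuming rich-text runs carry one of the two labels above and that the shaping layer accepts an OpenType language system tag per run. The tags MAL and MLR are the ones John Hudson cites from the registry (OpenType tags are four characters, space-padded); the function names and the fallback to "dflt" are illustrative assumptions, not any particular shaper's API.

```python
# Run label -> OpenType language system tag (four characters, space-padded).
OT_LANGSYS = {
    "malayalam-traditional": "MAL ",
    "malayalam-reformed": "MLR ",
}


def langsys_for_run(run_label):
    """Pick the language system tag for a run; fall back to the default."""
    return OT_LANGSYS.get(run_label, "dflt")


def tag_runs(runs):
    """Map (text, label) runs to (text, language system tag) pairs.

    The text itself is never modified: the orthography choice lives
    entirely in the run metadata, not in the character stream.
    """
    return [(text, langsys_for_run(label)) for text, label in runs]
```

The same character sequence can then appear in two runs with different labels and be shaped differently, without any control characters in the text.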

I believe it would be a mistake to attempt to encode this stylistic
distinction as part of the plain-text Unicode data.


Date/Time: Tue Apr 9 08:21:02 CDT 2013
Contact: naa.ganesan@gmail.com
Name: Naga Ganesan
Report Type: Public Review Issue
Opt Subject: Proposal to Specify Optional Conjuncts in Malayalam


Tamil has one case of style difference in display like this. Right now, in the
web page display, and inside MS Office products such as MS Word, when ZWNJ is
used to mark the split case of u & uu vowel signs, a dotted circle appears.
That dotted circle should not be displayed for text portions in web page or
for printing from MS Word documents.

Here are newspaper samples from Chennai, India that use both styles,
"Tamil-traditional" and "Tamil-reformed".

February 2, 2004: Look at two paragraphs of the same text in Viduthalai
newspaper. In the top paragraph, u & uu vowel signs ligate, while at the
bottom they do not ligate.

http://2.bp.blogspot.com/-LJpjRbdtNbw/UWOHxitxm-I/AAAAAAAACew/5xjch91D96o/s1600/2004.jpg

August 3, 2006: Note the u & uu vowel signs split, avoiding ligation for u or uu
vowel-consonants, in the paragraph at the bottom of the page.

http://2.bp.blogspot.com/-UO9MSh3z_Ns/UWOIzaxRODI/AAAAAAAACe8/Uj376-St1sM/s1600/Viduthalai_3-8-2006.jpg

Date/Time: Mon Apr 15 02:19:33 CDT 2013
Contact: pravin.d.s@gmail.com
Name: Pravin Satpute
Report Type: Public Review Issue
Opt Subject: PR250: Proposal to Specify Optional Conjuncts in Malayalam


For PRI 250, I think we again need to revisit the difference between GLYPH and
CHARACTER.

One character can take a number of shapes/glyphs depending on calligraphic or
script requirements, but that should not affect the storage.

Example

In the Devanagari script, the character
U+0932 DEVANAGARI LETTER LA ल
has a different representation in the Marathi language: ल.

This specific need is handled in the OpenType specification with the <locl>
feature tag.

In the same way, there are a number of examples where a single character takes a
different form depending on script, language, and calligraphic requirements.

Now consider the proposed changes to Unicode along the same lines:

U+0D38 U+0D4D U+0D30
സ്ര – Meera fonts
സ്ര – Lohit Malayalam

In the above example, though the representation is different,
the syllable is the same. It will be pronounced in the same way
in both cases.

As per the proposal, if we add ZWJ/ZWNJ to request a specific type of
representation, it can create the following problems:
1. NLP applications: they need to handle both forms, even
though they are the same.
2. Backward compatibility: an enormous amount of data has already been created
for the Malayalam language; fixing it up for the newly introduced storage form
will be problematic.
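
The NLP concern can be illustrated with a short sketch: once joiners are used to select a visual form, the same syllable exists as several distinct strings, and an application has to fold them before comparing. The joiner placement below is purely illustrative (not the exact sequences of the proposal), fold_joiners is a hypothetical helper, and blanket stripping is itself lossy wherever ZWJ/ZWNJ carry meaning.

```python
ZWNJ = "\u200C"  # ZERO WIDTH NON-JOINER
ZWJ = "\u200D"   # ZERO WIDTH JOINER

SA, VIRAMA, RA = "\u0D38", "\u0D4D", "\u0D30"

plain = SA + VIRAMA + RA                # existing data
with_zwj = SA + VIRAMA + ZWJ + RA       # illustrative "request one form"
with_zwnj = SA + VIRAMA + ZWNJ + RA     # illustrative "request the other form"


def fold_joiners(text):
    """Drop ZWJ/ZWNJ so that visually tagged variants compare equal."""
    return text.replace(ZWJ, "").replace(ZWNJ, "")


variants = [plain, with_zwj, with_zwnj]
print(len(set(variants)))                         # distinct stored strings
print(len({fold_joiners(v) for v in variants}))   # distinct after folding
```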

Ideal Solution:
A. Handling in fonts

1. As already mentioned by John Hudson
 Malayalam Traditional   MAL
 Malayalam Reformed      MLR

http://www.microsoft.com/typography/otspec/languagetags.htm


2. Options for the user to disable and enable particular GSUB lookups in fonts.
How it will help:
If the user wants the ligatures mostly used in the Traditional script, he can
enable that lookup. Otherwise he will get only Reformed script output.