From: Behnam (behnam.rassi@gmail.com)
Date: Wed Aug 30 2006 - 17:32:51 CDT
I want to see the implication of what you are suggesting in actual
typing. And also the level of control the user has over the shaping
of 'initial', 'medial', and 'final' forms.
Right now, I don't miss anything in my slightly customized Persian
keyboard. I have 'Heh' on the front and it does produce the
contextual shaping that I desire (standard U+0647 with oval isolated
form) and occasionally, if I desire to use medial Heh Goal (U+06C1),
I hit the same Heh key with the shift modifier. The only thing I'm
missing is 'letter heh abbreviated' (which is wrongly presented as
letter heh in Unicode). If Unicode provided me with a code for this
shape (which is a non-joining character and not U+06BE), I wouldn't
have anything missing.
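For reference, the three code points mentioned above can be looked up with Python's standard unicodedata module. This is only a lookup sketch; the helper name describe is made up for illustration, and the output says nothing about how a particular font shapes these letters.

```python
import unicodedata

# Code points named in the message above.
HEH_CODE_POINTS = [0x0647, 0x06C1, 0x06BE]

def describe(cp: int) -> str:
    """Return 'U+XXXX NAME' for a code point, using the Unicode character name."""
    return f"U+{cp:04X} {unicodedata.name(chr(cp))}"

if __name__ == "__main__":
    for cp in HEH_CODE_POINTS:
        print(describe(cp))
```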
I guess in most languages the keyboard can be configured to
produce the intended character with the least amount of effort (I
suppose that's what Kurdish users are actually doing right now). What
is really needed are codes that allow us to avoid ZWJ.
The problem is that I'm not Unicode compliant! I'm not supposed to
use Heh goal, although I have written my name with it since I was 3! I'm
supposed to type it with U+0647 and hope for a font maker to put
medial heh goal for that code in contextual shaping. And if I
find a font like that, I should forget about having a regular
medial heh in the next word I'm typing!
The three general behaviors of Arabic script characters (non-joiner,
right-joiner and double-joiner) made Arabic typing possible. The user
doesn't have to hit a command key after each letter to tell it how to
behave. My only problems with the current situation are the lack of a
non-joining heh do chashme (abbreviated form) and the restrictive
language definitions of codes.
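The three behaviors described above can be sketched as a small form-selection routine. This is a simplified model, not a real shaping engine: the joining types are hard-coded for a handful of letters, combining marks are ignored, and the function name contextual_forms is invented for illustration.

```python
from typing import List

# Joining types for a few letters, hard-coded here for illustration.
# "U" = non-joiner, "R" = right-joiner, "D" = double (dual) joiner.
JOINING_TYPE = {
    "\u0621": "U",  # HAMZA
    "\u0627": "R",  # ALEF
    "\u062F": "R",  # DAL
    "\u0628": "D",  # BEH
    "\u0647": "D",  # HEH
}

def contextual_forms(word: str) -> List[str]:
    """Return the form of each letter, in typing (logical) order."""
    types = [JOINING_TYPE.get(ch, "U") for ch in word]
    forms = []
    for i, t in enumerate(types):
        prev_t = types[i - 1] if i > 0 else "U"
        next_t = types[i + 1] if i + 1 < len(types) else "U"
        # A letter connects to the one typed before it only if that
        # letter is a double joiner and this one is not a non-joiner.
        joins_prev = t in ("R", "D") and prev_t == "D"
        # It connects to the one typed after it only if it is itself a
        # double joiner and the next letter is not a non-joiner.
        joins_next = t == "D" and next_t in ("R", "D")
        if joins_prev and joins_next:
            forms.append("medial")
        elif joins_prev:
            forms.append("final")
        elif joins_next:
            forms.append("initial")
        else:
            forms.append("isolated")
    return forms
```

In this model, typing BEH, HEH, BEH yields initial, medial and final forms with no extra keystrokes, while a right-joiner such as ALEF breaks the chain for whatever follows it.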
Behnam
On 30-Aug-06, at 11:53 AM, Philippe Verdy wrote:
> From: "Behnam" <behnam.rassi@gmail.com>
>> The point I want to make is that, in searching for an answer to your
>> question 'what is Kurdish heh', one should be certain that the shapes
>> of the initial, medial and final forms are not just a matter of
>> optional taste, but irrevocable rules.
>> If this is clarified, then yes, I agree with you that Kurdish heh
>> requires its own code.
>
> If Kurdish has a clear orthographic distinction between E and H,
> then this is not a calligraphic choice, and one cannot freely
> mix the 4 forms of the Arabic Heh, since that could break the
> distinction between E and H.
> If one must encode separately the letter for Kurdish H (which will
> use only two forms of the Heh), one must also encode Kurdish E so
> that it will never collide with H due to calligraphic conventions.
>
> So it looks like both letters must be made clearly distinct,
> independently of the font used. This can be done in several ways,
> but maybe the cleanest way would be to add some combining
> character to qualify the letter when it is known to create
> collisions with Arabic usage.
>
> But then there's the case of Urdu and Uighur. How many letters will
> we need?
>
> Why not encode new format controls that override the joining type
> of any letter encoded before them, and only this letter
> (independently of its left or right context)? ZWNJ and ZWJ do not
> correctly play this role because they affect the joining behavior of
> the letters on both sides. What is needed is:
>
> * a ZERO-WIDTH BEFORE-JOINER (that forces the previous RTL
> character to adopt a right-joining form)
> * a ZERO-WIDTH NON BEFORE-JOINER (that forces the previous RTL
> character to adopt a right-disjoining form)
> * a ZERO-WIDTH AFTER-JOINER (that forces the next RTL character to
> adopt a left-joining form)
> * a ZERO-WIDTH NON AFTER-JOINER (that forces the next RTL character
> to adopt a left-disjoining form)
>
> And then integrate them in the BiDi and joining rendering rules.
> Depending on the situation, we would encode them after the letter (one
> of the first two controls), or before the letter (one of the last
> two controls), and so we would completely control the joining type
> for renderers (note that this could be integrated in the renderer
> itself, without needing to change/upgrade existing fonts, or in the
> fonts themselves if the renderer is not changed), and the encoded
> pairs would also give clear semantics to make the necessary
> distinctions.
>
> In this case, only U+0647 is needed, and we don't need to look for
> new code points for specific letters, and it becomes possible to
> assign keystrokes directly to these pairs for Kurdish, Urdu,
> Uighur, ... who knows.
>
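The remark in the quoted message that ZWNJ and ZWJ act on the letters on both sides can be illustrated with a simplified model. The sketch below is an assumption-laden toy, not a renderer: it knows only two letters plus the two existing controls (U+200C and U+200D), and the name forms_with_controls is invented for illustration. It does not attempt to model the four proposed controls, which have no code points.

```python
from typing import List

ZWNJ = "\u200C"  # ZERO WIDTH NON-JOINER
ZWJ = "\u200D"   # ZERO WIDTH JOINER

# "D" = dual-joining letter, "U" = non-joining, "C" = join-causing.
JOINING_TYPE = {
    "\u0628": "D",  # BEH
    "\u0647": "D",  # HEH
    ZWNJ: "U",
    ZWJ: "C",
}

def forms_with_controls(text: str) -> List[str]:
    """Return the form of each letter (controls themselves are skipped)."""
    types = [JOINING_TYPE.get(ch, "U") for ch in text]
    forms = []
    for i, (ch, t) in enumerate(zip(text, types)):
        if ch in (ZWNJ, ZWJ):
            continue  # controls have no visible form of their own
        prev_t = types[i - 1] if i > 0 else "U"
        next_t = types[i + 1] if i + 1 < len(types) else "U"
        joins_prev = t in ("R", "D") and prev_t in ("D", "C")
        joins_next = t == "D" and next_t in ("R", "D", "C")
        if joins_prev and joins_next:
            forms.append("medial")
        elif joins_prev:
            forms.append("final")
        elif joins_next:
            forms.append("initial")
        else:
            forms.append("isolated")
    return forms
```

In this model, BEH HEH BEH gives initial, medial, final, while BEH HEH ZWNJ BEH gives initial, final, isolated: the single ZWNJ changes the form of the letter before it and of the letter after it, which is the two-sided effect the quoted message objects to.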