L2/00-026
This document contains five messages from Lloyd Anderson on ZWJ and ZWL.
Sent: Tuesday, January 25, 2000 12:03 PM
Subject: ZWJ - Consistent Contexts
Since Ken's discussion today does not deal with the following material,
and might almost lead a reader to think it did not exist, I repeat here,
as a separate message, my compilation of the contexts in which ZWJ is used.
Together these also help show that the suggestion to disregard ZWJ
completely is an irregularity (an inconsistency), one that creates the
need for an unnecessarily elaborate countervailing sequence
ZWJ + ZWNJ + ZWJ
to get the result that a simple
ZWJ
would yield if its semantics were kept consistent with the basic statement
on Unicode 2.0 page 6-70 (the "tiny letter" interpretation, which Ken says
is still valid for Unicode 3.0).
***
4. ZWJ is needed in several scripts (Devanagari, Arabic, even Latin)
for its basic function as defined by Unicode 2.0 page 6-70.
(Thus answering Mark's request for information.)
Mark asked on January 14th:
>If anyone has any actual evidence of frequent usage of
>ZWJ between cursive characters,
>please let us know the details before the UTC meeting!
First, "frequent" is not an appropriate part of the question,
because by definition ZWJ and ZWNJ are used on
"occasions where an author may wish to override the normal
automatic selection of joining glyphs"
Since these are not intended for normal automatic situations,
they are really quite by definition rare occurrences.
Here is my summary of the distribution of occurrences,
using "before", "after", and "between" as contexts:
***
Arabic Script:
ZWJ before, ZWJ after, ZWNJ+ZWJ between,
(and, fitting the pattern, it would be dangerous to prohibit
ZWJ between; these uses were foreseen in the basic definition)
Use before or after a cursively linkable character to show the
cursively linkable form in isolation (meta-commentary,
citation of forms for instructional purposes, etc.).
Use between, in the combination ZWNJ + ZWJ, for
Persian or Mongolian before certain suffixes, which are drawn as if
joined to the word stem while the stem is drawn as not joined
to the suffixes.
Use between if one wishes to block a ligature but keep
a cursive linkage on *both* sides, not merely on the following
side as for the Persian and Mongolian special suffixes.
This was foreseen in the original definition of ZWJ,
and is considered more basic than the Persian and Mongolian
uses, and hence requires only one code, ZWJ, not two, ZWNJ + ZWJ.
Various writing systems based on Arabic
script certainly differ in their obligatory and optional ligatures.
This can be handled to some degree by different fonts,
so a font for the Persian language will have different sets of
obligatory, optional, and rare ligatures than a font for
the Arabic language, and so on.
It would be very bad design to abolish the capability of using ZWJ
to block a single ligature yet permit the cursive linking to remain
(consistent with the basic definition on page 6-70).
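The joining behavior described for Arabic can be sketched in Python. This is a minimal model built on stated assumptions; it is not a shaping engine. The transliterated letters, their joining classes, and the form names are all hypothetical. They loosely follow the joining types Unicode assigns to Arabic characters: dual-joining, right-joining, non-joining, and join-causing. ZWJ is treated as an invisible letter that both neighbors may link to. ZWNJ is treated as one that neither neighbor links to. Ligatures are not modeled.

```python
ZWJ, ZWNJ = "\u200d", "\u200c"

# Hypothetical joining classes for a few transliterated Arabic letters:
# D = dual-joining, R = links only to a preceding letter, U = non-joining,
# C = join-causing (ZWJ as an invisible letter both neighbors may link to).
JOINING_TYPE = {"l": "D", "m": "D", "q": "D", "k": "D", "f": "D",
                "d": "R", "r": "R", ZWJ: "C", ZWNJ: "U"}

def links_forward(t):
    """Can a character of class t connect to the character that follows it?"""
    return t in ("D", "C")

def links_backward(t):
    """Can a character of class t connect to the character that precedes it?"""
    return t in ("D", "R", "C")

def shape(text):
    """Return (letter, form) pairs for the visible letters in text."""
    types = [JOINING_TYPE.get(ch, "U") for ch in text]
    result = []
    for i, ch in enumerate(text):
        if ch in (ZWJ, ZWNJ):
            continue  # invisible: affects its neighbors but draws nothing
        t = types[i]
        prev_ok = i > 0 and links_backward(t) and links_forward(types[i - 1])
        next_ok = (i + 1 < len(text) and links_forward(t)
                   and links_backward(types[i + 1]))
        form = {(True, True): "medial", (True, False): "final",
                (False, True): "initial", (False, False): "isolated"}[(prev_ok, next_ok)]
        result.append((ch, form))
    return result
```

Under these assumptions, q + ZWJ gives an initial q, and ZWJ + q + ZWJ gives a medial q. The sequence l + ZWNJ + ZWJ + m gives an isolated l followed by a final m, which is the pattern used for the Persian and Mongolian suffixes. Note that l + ZWJ + m and plain lm produce the same linked forms; only a ligature rule would distinguish them.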
***
Devanagari:
ZWJ after
ZWJ between
Use of ZWJ after a sequence of Consonant + Virama
to render the linked half-form of the Consonant, as if another
Consonant were following, but when there is none such.
Exactly analogous to the Arabic case.
ZWJ before a YA might be used to indicate the special combining
form that YA takes after another consonant plus Virama, but without
the preceding consonant or Virama being present. This certainly applies
to Indic alphabets in which the normal rendering of /ky/ involves a change
in the <y> rather than in the <k>; it might generally be done for Telugu
and Kannada subscript consonants.
Less obvious for Devanagari, but citation of isolated
subscript forms there too might be done this way.
Exactly analogous to the Arabic case.
Use of ZWJ between, exemplified in section 3. just above
(and illustrated on Unicode 2.0 page 6-37)
to show a half-consonant + following consonant
instead of permitting the two to combine into a conjunct.
Exactly analogous to the case of blocking an Arabic ligature
yet keeping characters linked (cursively, in the case of Arabic).
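The Devanagari behavior can be sketched in Python. The sketch follows the C + virama --> Cd and Cd + ZWJ --> Ch model that Ken describes in these messages. The token names, the consonant subset, the conjunct table, and the glyph suffixes (".half", ".dead") are hypothetical stand-ins. A real font and shaping engine handle far more cases.

```python
VIRAMA, ZWJ = "VIRAMA", "ZWJ"          # symbolic stand-ins for U+094D and U+200D
CONSONANTS = {"ka", "ta", "ssa"}        # hypothetical subset of consonants
CONJUNCTS = {("ka", "ssa"): "k.ssa"}    # hypothetical font: KA + virama + SSA has a conjunct

def render(tokens):
    """Map a token sequence to glyph names under the assumed model."""
    glyphs, i = [], 0
    while i < len(tokens):
        t = tokens[i]
        if t == ZWJ:                    # invisible: contributes no glyph of its own
            i += 1
            continue
        if i + 1 < len(tokens) and tokens[i + 1] == VIRAMA:
            after = tokens[i + 2] if i + 2 < len(tokens) else None
            if after == ZWJ:            # dead consonant links to the invisible ZWJ
                glyphs.append(t + ".half")
                i += 3
            elif (t, after) in CONJUNCTS:   # font-defined conjunct
                glyphs.append(CONJUNCTS[(t, after)])
                i += 3
            elif after in CONSONANTS:   # no conjunct: half-form before the next consonant
                glyphs.append(t + ".half")
                i += 2
            else:                       # nothing follows: visible virama
                glyphs.append(t + ".dead")
                i += 2
            continue
        glyphs.append(t)
        i += 1
    return glyphs
```

Here ka + virama + ZWJ gives the same half-form as ka + virama + ta, "as if another Consonant were following". Inserting ZWJ between ka + virama and ssa replaces the conjunct with a half-form plus a full ssa.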
***
Latin
ZWJ after, ZWJ before,
(perhaps ZWJ between in a cursive font).
These are illustrated on page 6-71.
The uses of ZWNJ between, ZWNJ+ZWJ between,
and ZWJ+ZWNJ between all produce different results.
Only, irregularly, the use of mere ZWJ between
does not produce a result different from that of
having nothing between the <f> and the <i>.
Why this irregularity, what on earth caused its introduction?
It certainly makes implementations more complex.
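The Latin contrast can be sketched as greedy ligature substitution over a character sequence. The ligature table and glyph names are hypothetical, and cursive linking is not modeled. The skip_zwj flag is an assumption that stands for the page 6-71 reading, in which ZWJ is disregarded before ligature matching. With the flag off, ZWJ is an ordinary invisible letter and therefore interrupts the match.

```python
ZWJ, ZWNJ = "\u200d", "\u200c"
INVISIBLE = {ZWJ, ZWNJ}
# Hypothetical font table: character sequence -> ligature glyph name.
LIGATURES = {"ffi": "f_f_i", "ff": "f_f", "fi": "f_i"}

def ligate(text, skip_zwj=False):
    """Greedy longest-match ligature substitution (linking not modeled).

    skip_zwj=False: ZWJ is an ordinary invisible letter, so it interrupts a match.
    skip_zwj=True: ZWJ is deleted before matching (the assumed page 6-71 reading).
    """
    if skip_zwj:
        text = text.replace(ZWJ, "")
    longest = max(len(key) for key in LIGATURES)
    glyphs, i = [], 0
    while i < len(text):
        for n in range(min(longest, len(text) - i), 0, -1):
            chunk = text[i:i + n]
            if chunk in LIGATURES:
                glyphs.append(LIGATURES[chunk])
                i += n
                break
        else:
            if text[i] not in INVISIBLE:  # ZWJ/ZWNJ draw nothing themselves
                glyphs.append(text[i])
            i += 1
    return glyphs
```

With skip_zwj=True, f + ZWJ + i renders exactly like "fish" with nothing between the letters. ZWNJ interrupts the ligature under either setting. This is the asymmetry the message calls irregular.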
========================================================================
From: ECOLING@aol.com
Sent: Tuesday, January 25, 2000 12:03 PM
Subject: ZWJ contradictions DO REMAIN
Thanks to Ken for giving us the full history of the ZWJ (etc.) wordings.
Most of them I had seen, and remembered correctly.
The additional information does help to interpret the history,
but does not change the existence of the contradiction,
rather it reinforces the fact that the contradiction is there,
especially when Ken says that the interpretation of Unicode 1.1
continues through to Unicode 3.0. Unicode 1.1 (Ken's citations)
seems to have been more fully explicit than other versions 1.0 or 2.0,
as indeed Ken says noting the "consolidation" relative to 1.1.
Retaining the explicitness of 1.1 would have been very helpful.
The contradiction remains, as I will explain here.
The analysis in my original message on this topic REMAINS VALID.
I am sorry if the demands of comprehensiveness made my previous message
long. This one is focused only on the contradiction.
I will suggest, in a separate message, two ways of handling the
contradiction.
***
First, a question as to whether Ken's presentation is complete.
When the "fish" example was first introduced in Unicode 1.1,
did it contain the special exception that
[f+ZWJ+i] would or should *not* block a ligature
(at least of the type that smart fonts form automatically,
handled in those sorts of protocols external to the text)?
If so, it would have contradicted the wording quoted by Ken.
Ken also writes in discussion:
"The "fish" example was to show, however, that the mere
presence of either character *could* result in a non-ligation, by
breaking up the sequence expected by "protocols or resources [[i.e. fonts]]
external to the text sequence" for a ligature to be formed."
The version of the "fish" example in Unicode 2.0 does *not* show
that for ZWJ, because [f+ZWJ+i] is shown as forming a ligature
just as if the ZWJ were not present.
So I have to differ here, at least in the version of the "fish" example
we have in Unicode 2.0 (though perhaps not for the earlier version,
which Ken did not quote in full).
***
Now on to the explicit statements and the contradiction.
The relevant portion of 1.1 I here repeat from Ken's posting:
"The intent of these characters is to address cursive graphical connections
between the glyphs of a script, e.g. in scripts like Arabic whose printed
form emulated handwriting. ZWNJ and ZWJ are best thought of as behaving
like tiny letters that neighboring glyphs may connect to (ZWJ) or avoid
connecting to (ZWNJ). They are thus processed as ordinary cursive letters
rather than as control characters.
"ZWNJ and ZWJ affect how the two neighboring glyphs connect to *them*, not
to *each other*. As such, they have no direct relationship with ligature
formation; in particular, ZWJ does not in any way request that its two
neighbors be ligatures to each other. Indeed, both ZWNJ and ZWJ may break up
ligatures by interrupting the character sequence required to form the
ligature."
So Unicode 1.1 states that ZWJ may break up a ligature,
and the special exception which is at least now in the "fish" example
clearly says that ZWJ does not break up a ligature, thus behaving
UNLIKE mere tiny letters that neighboring glyphs
may connect to (ZWJ) or not (ZWNJ).
In the absence of some overriding protocols external to the text,
any letter intervening would do exactly that, as stated in Unicode 1.1.
And it could not be more explicit in the following, where the idea that
[f+ZWJ+i] would or should form a ligature is explicitly rejected
(barring some resources external to the text sequence):
"f + ZWJ + i will not form the ligature fi. Instead, if cursive versions
of the f and i are available in the font, each will independently connect
to the ZWJ on the appropriate side (having the same appearance as f + i).
[LA: This is the crux of the evidence for the contradiction.]
"Usage of optional ligatures such as fi is not currently controlled by
any codes within the Unicode standard, but is determined by protocols or
resources external to the text sequence."
Ken states the following:
>The semantics of ZWNJ and ZWJ has subsequently been inherited
>without change from Unicode 1.1, through 2.0, and 3.0. Unicode 2.0
>consolidated the text and intent from Unicode 1.1. Nothing that happened
>in Unicode 3.0 has touched that intent in any way.
Therefore I would presume that the demonstration of the contradiction
above still holds valid, and do not need to pursue it here unless further
wrinkles are introduced by others.
***
I am clearly not using terminology precise enough for some readers,
so I hope we can get past this:
Ken replies to my statement
>> In a cursively linkable Latin font, however, it could be used, consistent
>> with the wording above, as a means of blocking the ligature while still
>> permitting the cursive linking
>> (under the basic default that it is merely a linkable "neighbor"
character).
as follows:
>No, it could not be used "as a means of blocking the ligature..."
>A ZWJ in such a context might, on the other hand,
>have the (unintended) side-effect of blocking a ligation.
In that case, it could be used to block a ligation.
I am not sure of the difference between "ligation" and "ligature",
except perhaps this difference might refer to other protocols outside
of the text stream.
So how about if I rephrase it (consistent with the clear text of Unicode 1.1
whose interpretation Ken says still holds):
"The use of a ZWJ, consistent with its interpretation as simply a tiny letter
to which neighbors can cursively join, normally *will* have the effect of
interrupting the sequence of characters which might, without ZWJ,
be rendered as a ligature by protocols outside of the text stream,
if such protocols are in use."
Ken's statement on Arabic misunderstands me:
>No. The actual effect depends on the implementation. The ZWJ is not
>*intended* to interrupt ligating sequences, but processes that are
>unaware of this may do the wrong thing. As you pointed out below, for
>the purposes of ligation, a ZWJ in the midst of an Arabic sequence,
>for example, should be handled effectively like an Arabic voweling --
>it should not disrupt the choices of the basic consonant outline
>(ligated or not) from the font.
That is most definitely not my intent, and I think not what I said at all.
Sorry if my wording was too *short* :-) and therefore not explicit
enough.
What I said was that a ZWL (*not* a ZWJ) could, by acting like
an Arabic voweling, have no effect on ligaturing (via external protocols
or whatever precision needs to be added here) without invoking
any kinds of character properties not already known.
A ZWJ (*not* a ZWL, now, just the reverse), by acting as a tiny letter
to which other letters can join, is *not* acting like an Arabic voweling
when used to show connecting forms in isolation, it is rather acting
just like any other Arabic base letter. (If ZWJ were acting like an
Arabic voweling, then it would not cause the adjacent letters to take
on a connecting form, and we would have a contradiction with the
explicit statements in the section on Arabic.)
A propos of the earlier discussion with Mark, I wrote:
>> Mark was thus incorrect in stating that this older interpretation had
>> been rejected. It is still the basic wording, placed first, and must govern
>> other interpretations until changed.
Ken replied:
>The "older interpretation" that Mark was stating had been rejected was
>that of Unicode 1.0, in which the ZWJ could be conceived of as a join-requester,
>including a request of a ligature.
Perhaps Mark can speak for himself on this,
but in context, I thought Mark was rejecting my insistence
that the original interpretation
(by which I meant the "tiny letters, not controls" interpretation,
intended in 1.0 but only fully clarified in 1.1) was still valid.
According to Ken's comments, it is still valid.
>Under the new interpretation introduced (by Mark, primarily) in Unicode
>1.1, I do not see a contradiction.
Since my demonstration of the contradiction above depends on the wording
of Unicode 1.1, according to Ken still valid for Unicode 2.0 and 3.0
in this respect, I ask that people deal with the contradiction.
I attempt to suggest two ways of doing so in a separate message.
***
Lloyd Anderson
Ecological Linguistics
***************************************************
Some other matters concern wordings and understandings,
which are not as central to establishment of the contradiction
I am pointing to, but which may clarify previous or future discussions,
for those interested.
First, concerning Devanagari, where I do not yet have an explanation
of what Mark Davis considers inconsistent in the handling of ZWJ.
Second, other matters on history of interpretation.
Ken expresses his agreement with the following
(but believes I have misunderstood Mark about inconsistency
in the use of ZWJ in Devanagari):
[LA]
>>There is currently nothing special
>> about the use of ZWJ for Devanagari. It has the basic
>> interpretation of a linkable neighbor character for which
>> conjunct combinations are not defined by fonts, exactly as
>> in the basic wording of page 6-70.
to which Ken responds:
>This latter statement I agree with. In Devanagari, the use of
>the ZWJ creates the context for the explicit half-form, which
>is a "right-linking" form of the consonant. It is then the
>presence of the right-linking form of the consonant that blocks
>the (otherwise automatic) conjunct formation (if the font
>supports it). In this sense, the use of a ZWJ in Devanagari
>can have the indirect effect of breaking a conjunct (i.e. ligature),
>and such usage is intentional in Devanagari. But the breaking
>of the conjunct is secondary -- and not the direct implication
>of a ZWJ requesting a ligature blocking.
Exactly the same wording would I suppose apply to Arabic,
because the sequence [q + ZWJ + l]
would cause [q] to take its link-to-following form,
and [l] to take its link-to-preceding form,
and the presence of these forms would have the indirect effect
of breaking a ligature in Arabic.
Of course I agree that in all cases the effects on ligatures are
secondary, because ZWJ is merely a "tiny letter".
That has been my point the entire time.
If Ken believes there is no inconsistency (I believe that was Mark's word)
in the usage of ZWJ in Devanagari, at least not in the respect
we have discussed most recently, then I would very much
appreciate explanation of what inconsistency in Devanagari
Mark was referring to. Is there something else we need to be
looking at to ensure consistency? Could we please have this
discussion publicly and not only at the UTC meeting?
I also do not understand this from Ken's message:
> (b) The other aspect of Mark's statement quoted above is this:
>
> "just as current fonts that don't fully support ZWJ
> cause it to break ligatures."
>
> This is incorrect. Current fonts which fully support ZWJ
> *do* cause it to break ligatures, that is in fact a usage made
> entirely explicit on Unicode 2.0 page 6-37 for Devanagari:
>
>This is entirely different from Mark's intent in this statement.
Since the statement above *is a quote* from Mark,
I would very much appreciate knowing what Mark's intent
was in the statement quoted. I simply took it at what I assumed
was face value. Since there was other discussion which seemed
to reinforce it, I saw no reason not to do so.
Ken's agreement that
>Any number of other
>invisible format controls -- if not properly ignored by a rendering process --
>could have the same unintended and user-inexplicable effect.
seems to support my position that this is nothing special
about ZWJ, nor about a proposed ZWL.
***
My wording was clearly insufficiently precise in one point.
When I wrote concerning the history of ZWJ that
>people tended to interpret the ZWJ as a ligature request.
I was referring to the interpretation which was explicitly
rejected in Unicode 1.1. I believed at the time that such
an interpretation had crept into the *text* of Unicode 1.0
partially by accident, and that the original (Becker's) interpretation had
always been there.
I stand corrected if in fact some people (other than Becker)
originally intended that ZWJ could "request" a ligature,
but my understanding of what some other people (including
Becker) originally intended, enough to cause the correction
to the wording in 1.1, was correct.
Ken wrote:
>No. This was explicitly allowed as part of the semantics of ZWJ in
>Unicode 1.0. It was explicitly defined out of the semantics of ZWJ
>in Unicode 1.1. Read the text.
***
Second, concerning Devanagari, Ken's clarification is very helpful.
>When the Devanagari section
>was edited and rewritten for consistency and incorporation into
>Unicode 2.0, it was noted that the text in Unicode 1.0, Volume 2
>regarding ZWJ used this way was inconsistent with the new, restricted
>interpretation; ZWJ could not be used to cause a "virama [to] be
>absorbed into the half consonant form". Instead, the more rigorous
>model of the C + virama --> Cd and Cd + ZWJ --> Ch for Devanagari was
>introduced.
In fact, I was one of the primary advocates, perhaps the first,
of coding in the order
C + Virama + ZWJ + C
rather than
C + ZWJ + Virama + C.
So I am aware of most of this history.
========================================================================
From: ECOLING@aol.com
Sent: Tuesday, January 25, 2000 12:04 PM
Subject: ZWL still no new properties
In the context of a general consideration of alternatives
for handling ZWLigator, either by changes to ZWJ semantics
or via a new ZWL character, I pointed out that:
>> If we did introduce a ZWL character, it need have no new or unique
>> character properties at all.
and Ken responded:
>Incorrect. The most important issue is precisely that it *does* introduce
>a new character property: ligation request. That property is new, because
>no character currently has it. That is the whole reason for requesting
>the encoding of such a character in the first place.
I completely fail to understand the point here.
"Ligation request" is not a character property,
the ZWL is merely a "tiny invisible letter" without
additional special properties.
ZWL might be *used in* triples of the form
[A + ZWL + B]
which (I hope I am choosing words sufficiently specific
to fit everyone's technical cup of tea here)
are used in protocols external to the text as the contexts
for rendering via ligatures.
But that is true of the character <f> also!
It is used in pairs and triples such as [fi] and [ffi]
which protocols external to the text use as the contexts
for rendering via ligatures.
No difference at all.
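The claim that ZWL would need no new property can be sketched in Python. ZWL is not an encoded character, so a private-use code point stands in for it here. The font tables and glyph names are hypothetical. The lookup treats a font entry containing ZWL exactly like the entry "fi". The only assumption about ZWL itself is that it behaves like an Arabic vowel mark: it is passed over wherever no font entry mentions it, so it does not interrupt other ligatures.

```python
ZWL = "\ue000"  # HYPOTHETICAL: ZWL is not encoded; a private-use code point stands in

def match_len(text, i, key):
    """Characters of text consumed if key matches at position i, else 0.
    A ZWL in the text is passed over unless the key itself asks for a ZWL there,
    like an Arabic vowel mark that does not interrupt ligature contexts."""
    j = i
    for k in key:
        while j < len(text) and text[j] == ZWL and k != ZWL:
            j += 1
        if j >= len(text) or text[j] != k:
            return 0
        j += 1
    return j - i

def ligate(text, font):
    """Greedy substitution; font maps character sequences to glyph names."""
    keys = sorted(font, key=len, reverse=True)  # prefer longer sequences
    glyphs, i = [], 0
    while i < len(text):
        for key in keys:
            n = match_len(text, i, key)
            if n:
                glyphs.append(font[key])
                i += n
                break
        else:
            if text[i] != ZWL:  # an unmatched ZWL draws nothing
                glyphs.append(text[i])
            i += 1
    return glyphs
```

A font with a runic entry such as {"d" + ZWL + "d": "d_d"} renders that triple as one glyph, just as {"fi": "f_i"} handles fi. A font without such an entry simply draws two d glyphs, and f + ZWL + i still ligates in a font that has fi.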
Still, ZWL just like ZWJ would merely be a tiny invisible
letter (or, more precisely, have properties like the Arabic vowelings).
As I in fact stated immediately:
>> It should be expected to work exactly like an
>> Arabic floating vowel in not interrupting cursive linking or ligatures.
>>
>> Such a ZWL would be distinctively special *only* to the extent that fonts
>> could use it by treating it as a dummy character with no other uses
>> than to be part of triples of the type
>> Hungarian Runes [d + ZWL + d] to be rendered as <dd> ligature.
Ken responds:
>This is how a font might implement the ligatures involving ZWL, but that
>is not the end of the story. The software itself has to be cognizant
>of the property in some way.... Also, the software will need to
>have hierarchies of interaction between global ligation settings and
>local ligation requests (or blockages), so that the appropriate thing(s)
>can be done to ensure that local preferences correctly override global
>settings, and so on.
Of course, that is true of any approach to global ligation settings and
local ligation requests. That is necessary independent of any *particular*
choice of means to inform external protocols that a ligature should be
used if available. It is simply a consequence of the mere existence of
both global and local cues for ligation. (Whether using ZWJ makes
that easier than adding ZWL, or the reverse, is another question entirely.)
>> What if we use ZWJ to do double duty as ZWL?
>>
>> The thing to watch out for here is overloading a character
>> with non-analogous uses, to the extent that a contradiction might arise.
>
>This concern should be addressed as Mark has -- with an explicit
>listing of all the contrastive possibilities, matched up against
>the expected outcomes. If there are more expected outcomes than
>can reasonably be handled by judicious expanding of the semantics
>of ZWJ, then perhaps an independent ZWL is warranted. If not,
>then not.
Ken should have noted that I *did* provide a listing of contrastive
possibilities in response to Mark's request for information
(Mark had not lined those up explicitly in his request),
though no doubt it did not include all of the ones Mark would include.
There is no need to repeat that listing
which was at the end of my message
"ZWJ contradictions; ZWL".
Readers should go there.
Sincerely,
Lloyd Anderson
Ecological Linguistics
=======================================================================
From: ECOLING@aol.com
Sent: Tuesday, January 25, 2000 12:05 PM
Subject: ? ZWJ "doubling" as ZWLigator
Ken did not address today the implications of the following
example, and since Mark found my earlier message too long,
I highlight this single example here in a separate message.
What if we use ZWJ to do double duty as ZWL?
The thing to watch out for here is overloading a character
with non-analogous uses, to the extent that a contradiction might arise.
A contradiction in this case might take the form of a script
with a cursive rendering, in which ZWJ was needed to block
a ligature yet leave the rendering cursive, and ZWL was needed
to request a ligature in a particular local spelling of one word
which was not part of the default set of ligatures.
The second usage could be triggered only if the font contained
a triple [A + ZWL + B]. Otherwise it would default to the first.
The first usage would be triggered only if the font did not contain
a triple [A + ZWL + B].
If we used the same ZWJ (not ZWL) character for both functions,
then the inputter would have to know what ligatures the font would
contain, and might get opposite results if the contents of the font
were not what the inputter expected.
I believe that is sufficient reason not to take this second route,
quite apart from the general caution against risking problems farther
down the road by using characters without keeping to a consistent
semantics.
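The double-duty hazard can be sketched by rendering the same input with different fonts. Everything here is hypothetical: the letters a, b, c, d, the font tables, the result strings, and the private-use stand-in for ZWL. In the double-duty model, the outcome of [x + ZWJ + y] depends on whether the font happens to list that triple. With separate characters, ZWJ always blocks the ligature and only ZWL can request one.

```python
ZWJ = "\u200d"
ZWL = "\ue000"  # HYPOTHETICAL stand-in for the proposed ZWL; not an encoded character

def default(x, y, font):
    """What [x y] gets with nothing between: the font's default ligature or plain linking."""
    return "ligature" if (x, y) in font["pairs"] else "linked"

def render_double_duty(x, y, font):
    """[x + ZWJ + y] when ZWJ also serves as a ligature request."""
    if (x, ZWJ, y) in font["triples"]:
        return "ligature"  # the font reads ZWJ as "ligate"
    return "linked"        # otherwise ZWJ is a linkable tiny letter, blocking the ligature

def render_separate(x, joiner, y, font):
    """[x + joiner + y] when ZWJ and ZWL are distinct characters."""
    if joiner == ZWJ:
        return "linked"    # ZWJ always blocks the ligature, whatever the font lists
    if (x, ZWL, y) in font["triples"]:
        return "ligature"  # explicit ligature request honored
    return default(x, y, font)  # an unlisted ZWL is transparent

# Hypothetical fonts, all ligating "ab" by default. font_x and font_y list
# ZWJ triples (double-duty model); font_z lists a ZWL triple (separate model).
font_x = {"pairs": {("a", "b")}, "triples": {("c", ZWJ, "d")}}
font_y = {"pairs": {("a", "b")}, "triples": {("a", ZWJ, "b")}}
font_z = {"pairs": {("a", "b")}, "triples": {("c", ZWL, "d")}}
```

With font_y, an author who typed a + ZWJ + b to block a ligature gets the ligature instead. Likewise, font_x honors a request for c + ZWJ + d while font_y silently drops it. In the separate-character model, the result for a + ZWJ + b does not depend on the font.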
Sincerely,
Lloyd Anderson
Ecological Linguistics
========================================================================
From: ECOLING@aol.com
Sent: Tuesday, January 25, 2000 12:04 PM
Subject: Fixing ZWJ contradictions
I *have* read every word in Ken's reply received this morning.
Another message today confirms that the contradictions I pointed
to really are there. It is focused *only* on that matter.
This message concerns only what to do about it.
In what follows, I will attempt to adjust my exact wording to what
others prefer, so that we are talking about substance and not
preferences in wording.
Two obvious choices, either of them possible
(and there may of course be third and fourth general
paths to solution which I have not thought of), are:
1. Accept the irregularity in [f+ZWJ+i] page 6-71,
and compensate for that as necessary,
and add protections to avoid the propagation of further such
irregularities
or
2. Fix the wording which created this irregularity,
bring it back into line with the general interpretations of ZWJ.
I will treat these in reverse order,
because things will be much clearer that way.
*********************************************************
2. Here are the changes needed to restore consistency, to keep a
consistent interpretation for ZWJ as merely a tiny invisible letter
(one linkable to neighboring letters, in contexts where
protocols external to the text handle linking, ligaturing, etc.).
2A.
Taking route 2, the simplest because it removes the only irregularity
(inconsistency) I have been able to detect, we would have the following
statement instead of the one currently on Unicode 2.0 page 6-71
which suggests ZWJ should be disregarded:
"The use of a ZWJ, consistent with its interpretation as simply a tiny letter
to which neighbors can cursively join, will normally have the effect of
interrupting the sequence of characters which might, without ZWJ,
be rendered as a ligature by protocols outside of the text stream,
if such protocols are in use."
2B.
And consistent with that, a further statement should be added
something as follows. I am pleased that Ken agrees something
like this would be useful:
>> "ZWJ should not normally be introduced between characters which
>> form a ligature in fonts which are not cursively linking. ZWJ has
>> no legitimate function there. Its introduction there is a spelling error
>> and will usually produce exactly the opposite effect from that intended,
>> by breaking the sequence of characters and preventing their rendering
>> as a ligature"
Ken writes:
>I concur that something similar to this might be usefully added -- although
>I don't think it is resolving a contradiction. Any number of other
>invisible format controls -- if not properly ignored by a rendering process --
>could have the same unintended and user-inexplicable effect.
I particularly appreciate Ken's pointing out that any number of other
characters (invisible format controls) could have the same unintended and
user-inexplicable effect. I would add, more explicitly, that the presence
of misspellings cannot itself be said to "break" any implementation.
Treating ZWJ as an ignorable character would be in contradiction
with the basic statement (Unicode 1.1, continued through to 3.0
according to Ken) that ZWJ acts as a tiny letter. A tiny letter is not an
ignorable character; it is a letter. It is not ignored when it is used
to get connecting forms of Arabic letters which otherwise would
appear as isolated forms (in external protocols).
It should not be introduced where its normal default effects
(with or without protocols external to the text) are not desired.
2C.
The current example at the bottom of Unicode 2.0 page 6-71
should be changed to a non-Latin script, since (as Ken agrees)
ZWJ would not normally have a use in the dominant type of Latin
text.
Readers have a difficult time squaring their assumptions about
the most familiar kinds of Latin text with the assumptions
necessary to make use of ZWJ relevant or even interpretable.
I would suggest Arabic script, but not just any Arabic letters will do.
Those for use in an example for non-Arabic users should be ones
whose forms are most recognizably related in isolating and various
linking contexts. So *not* using Arabic letters "t,d,n,c,j,h,s,n,y"
but using such as "f,q,k,l,<t-underdot>".
If a Latin cursive script is used, it should be one which does contain
ligatures, so the difference between ligatured and merely linking
is meaningful to the reader.
2D.
In addition to using the Arabic script so readers will understand
(or else a real Latin cursive *and* ligaturing font),
the current example of "fish" at the bottom of Unicode 2.0 page 6-71
should be corrected to show ZWJ behaving not irregularly
(having no effect) but consistently with its effect in other
contexts, allowing neighboring characters to link to it.
For the total patterns showing that this is an irregular exception,
please see the separate message today on that subject.
Mark Davis suggested the following sort of sequence
to get the effect of linking without ligaturing (here I substitute
Arabic letters):
[l + ZWJ + ZWNJ + ZWJ + m]
which would yield cursively linked but unligated renderings.
By contrast, if the inconsistency is removed,
we simply use
[l + ZWJ + m] to get the same result.
Mark's solution is unnecessarily elaborate. It is required only by
three irregularities: the interpretation of simple ZWJ under which
external protocols must do something more than, and different from,
treating it simply as a "tiny invisible" linkable letter;
the interpretation under which they must in addition ignore it
except in a fixed *list* of special cases;
and the requirement that they treat the dependency in [ZWJ + m]
differently depending on whether a ZWNJ or another letter precedes it.
In the treatment without the irregularity,
the dependency between [ZWJ + m] is treated exactly the
same, whether or not there is a preceding ZWNJ or space
or letter or anything else.
This respects precisely the principle that ZWJ is merely a "tiny letter" (linkable).
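The difference between the two readings can be sketched for a two-letter sequence. The letters, the default ligature l + m, and the result strings are hypothetical. The ignore_zwj flag stands for the irregular reading, in which ZWJ is removed before ligature matching. Linking is decided only by the characters immediately next to each letter.

```python
ZWJ, ZWNJ = "\u200d", "\u200c"
LIGATURES = {("l", "m")}  # hypothetical font: l + m ligate by default

def render(seq, ignore_zwj):
    """Render [letter, *joiners, letter], where joiners are ZWJ or ZWNJ only."""
    first, *joiners, last = seq
    # Ligature matching: the irregular reading deletes every ZWJ first.
    matched = [c for c in seq if not (ignore_zwj and c == ZWJ)]
    if len(matched) == 2 and tuple(matched) in LIGATURES:
        return "ligature"
    # Linking: each letter links to its immediate neighbor unless that neighbor is ZWNJ.
    first_links = not joiners or joiners[0] == ZWJ
    last_links = not joiners or joiners[-1] == ZWJ
    return (first + (".initial" if first_links else ".isolated"),
            last + (".final" if last_links else ".isolated"))
```

Under the consistent reading, l + ZWJ + m already gives linked, unligated forms. Under the irregular reading, the same result needs l + ZWJ + ZWNJ + ZWJ + m.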
************************************************************
1. If we take the other alternative and keep the irregular
interpretation of ZWJ in special contexts definable
as an "elsewhere" case (when there is no adjacent ZWNJ, for example), then:
1A. Keep the irregular exception, but note it as such so that further
irregular interpretations do not cascade from it.
1B. As 2B, appropriate in any event.
>> "ZWJ should not normally be introduced between characters which
>> form a ligature in fonts which are not cursively linking. ZWJ has
>> no legitimate function there. Its introduction there is a spelling error
>> and will usually produce exactly the opposite effect from that intended,
>> by breaking the sequence of characters and preventing their rendering
>> as a ligature"
1C. As for 2C, change the "fish" example to an appropriate cursive font,
one that is both linkable and has ligatures, so the example makes sense.
1D. If keeping the irregularity that
[letter + ZWJ + letter] "should" normally have the ZWJ disregarded,
so ligatures would still be formed by the protocols external to
the text,
then add the example with Mark's sequence
[letter + ZWJ + ZWNJ + ZWJ + letter]
to indicate how one CAN get the effect of linking on both sides
but without ligatures being rendered by the external protocols.
************************************************************
If Devanagari semantics for ZWJ are consistent with the semantics for
ZWJ elsewhere, then alter the statements on Unicode 2.0 p.6-71 to make
that interpretation unambiguous.
If Ken's explicit agreement with the "last statement" in the paragraph quoted
from me just below includes both of these sentences, rather than merely
the last sentence, then it might be helpful to alter the wording on Unicode 2.0
page 6-71 so that it does not leave *open* any interpretation that the use
of ZWJ in Devanagari is irregular.
>>There is currently nothing special
>> about the use of ZWJ for Devanagari. It has the basic
>> interpretation of a linkable neighbor character for which
>> conjunct combinations are not defined by fonts, exactly as
>> in the basic wording of page 6-70.
I pointed out the possibly misinterpretable phrasing:
>> but the wording now on page 6-71 treats this as if it were exceptional:
>>
>> "The function of the ZWJ may also have a particular interpretation
>> in specific scripts. For example, in Indic scripts it provides an
>> invisible neighbor to which a dead consonant may join in order to
>> induce a half-consonant form. ..."
>>
>> This is in fact no different than the Arabic case, and both are
>> completely consistent with the basic wording, Unicode 2.0 p.70.
Ken responded:
>This is incidental. The text of page 6-71 points to a particular,
>script-specific usage. Yes, it is consistent with the generic sense
>of ZWJ, but in the context of Devanagari, the specific rules regarding
>half-consonant formation are invoked. Those *are* script-specific.
My point was that "may also have a particular interpretation"
suggests something more and quite different from
"may also have particular uses", in that it allows and even suggests
"may also have a particular semantics (different from that in specific
other scripts)". An interpretation of the wording which I think we would
all want to avoid.
I do not even see that the uses (yielding a linking form of an adjacent
consonant) are any different in Devanagari. So it would be better to
remove the possibly misleading suggestion, by rewording.
*******************************************************
Sincerely,
Lloyd Anderson
Ecological Linguistics