From: Peter Constable (petercon@microsoft.com)
Date: Mon Nov 29 2004 - 17:20:02 CST
> From: unicode-bounce@unicode.org [mailto:unicode-bounce@unicode.org] On Behalf Of Jony Rosenne
> > But it *is* a piece of text, however malformed it might seem from
> > normal lexicographic understanding. It may not be a word. It may,
> > in fact, be two words merged into a unit. But it is most certainly
> > text.
>
> Sure it is text, but it is not plain text.
>
> Qere and Ketiv are not malformed. I don't think anyone disagrees that
> they are the juxtaposition of the letters of one word with the vowel
> points of another.
>
> That most cases can be visibly reproduced by Unicode is a hack...
Jony, where you and I differ in worldview, it seems to me, is that you
view characters as encoding language, whereas I view characters as
encoding letterforms; or, put another way, for you text is necessarily
linguistic, whereas for me text is text, independent of linguistic
interpretation. To make this concrete, the fact that a qere
sequence involves the vowel points of word A rather than word B is
linguistically interesting, but irrelevant as far as encoding is
concerned. If the displayed letterforms consist of a lamed with two
vowel points, then the encoded character sequence IMO should be lamed
with two vowel points -- and I would not consider that a hack.
> and is not a sufficient justification to extend Unicode to support
> cases that cannot be reproduced.
>
> There is the case of Yerushala(y)im, for which the plain text hack
> would require an invisible RTL letter to represent the omitted Yod, or
> to allow pointing an RLM. The CGJ hack may work too but it is based on
> a misunderstanding, as if the Lamed has two vowels.
The only hackish thing about needing CGJ is that the combining classes
for vowel points that occupy the same space relative to a base should
never have been different from one another, but since we cannot revise
that detail, we need to come up with another mechanism to deal with it.
I agree that using CGJ is a hack, but not because the text involves one
base letterform with two combining vowel points.
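
To make the combining-class point concrete, here is a minimal Python
sketch using only the standard unicodedata module. The choice of qamats
followed by hiriq on the lamed is an assumption made for illustration,
and the variable and function names are hypothetical. It shows that
normalization reorders two vowel points with different combining
classes, and that an intervening CGJ (U+034F, combining class 0) blocks
the reordering.

```python
import unicodedata

# Illustrative code points (the qamats + hiriq pairing is an assumption)
LAMED = "\u05DC"   # HEBREW LETTER LAMED
QAMATS = "\u05B8"  # HEBREW POINT QAMATS, combining class 18
HIRIQ = "\u05B4"   # HEBREW POINT HIRIQ, combining class 14
CGJ = "\u034F"     # COMBINING GRAPHEME JOINER, combining class 0


def codepoints(text):
    """Render a string as a space-separated list of U+XXXX values."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


# One base letterform with two vowel points, in the intended order.
plain = LAMED + QAMATS + HIRIQ
# The same sequence with CGJ separating the two points.
with_cgj = LAMED + QAMATS + CGJ + HIRIQ

plain_nfc = unicodedata.normalize("NFC", plain)
cgj_nfc = unicodedata.normalize("NFC", with_cgj)

print(unicodedata.combining(QAMATS), unicodedata.combining(HIRIQ))  # 18 14
print(codepoints(plain_nfc))  # U+05DC U+05B4 U+05B8 (points swapped)
print(codepoints(cgj_nfc))    # U+05DC U+05B8 U+034F U+05B4 (order kept)
```

Without CGJ the canonical ordering step sorts the hiriq ahead of the
qamats, so the intended order does not survive normalization; with CGJ
the sequence is left as typed.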
> > But I'm now, as always, happy to hear alternate suggestions as to
> > how things might be handled in either encoding or display. So if
> > you think merged Ketiv/Qere forms should be handled by markup,
> > perhaps you can explain how, so that I might better understand.
> > Thank you.
>
> This is the Unicode list, not the markup - SGML etc. list. And I do
> not know too much about markup.
It's not a list dedicated to discussion of markup, but if people contend
that a solution to a problem lies in something other than plain text,
then it is germane to this list to have that alternative solution
elaborated.
Peter Constable