From: Peter Kirk (peterkirk@qaya.org)
Date: Sun Nov 28 2004 - 12:27:33 CST
On 28/11/2004 00:21, Mark E. Shoulson wrote:
> ...
> Well, that's the difference under discussion. The "plain text" would
> seem to be either the qere or the ketiv (but not the combined
> "blended" form), since each of those is somewhat sensible. Peter
> Kirk's point is that the blended form is what is in fact written and
> has been so for centuries, so he claims that *it* should be considered
> the plain text.
>
But who says the plain text has to be sensible? Unicode is concerned
with representing the text as written, not with its meaning. The
following string is meaningless, is not sensible at all, but it is still
plain text: gxyfcwx bfzkgf ikxz bgcuyxukb kbcghjkshxcbnhjkc b bhb
jksdfncfuhikc. (It's not a code, by the way, it comes from random typing.)
Asmus basically agreed with me, but added:
> In scripts with complex layout, of course, not all random character
> soup would be rendered the same by all systems. Which, I think is the
> point here. If this is a rather commonly used device, then in
> principle it's possible to ask why can this not be part of plain text.
>
> If the necessary mechanisms to do this are cheap and simple, the
> answer is often to bring such things under the plain text umbrella. If
> it's complicated, the answer should be to leave it to mechanisms such
> as markup that deal well in (whatever required kind of) complexity.
If there were in fact a need for complex mechanisms to support Ketiv/Qere
blended forms in plain text, then I might agree that alternative markup
mechanisms need to be looked at. But in fact in this case, as I see it,
only two special mechanisms are required:
1) Allowing multiple vowel points with a single base character. The
issues with this mechanism were discussed at some length on this list
last year in connection with the form Yerushala(y)im, which is the
commonest such form. The solution which was agreed for this form works
well with the other rare forms in this category.
2) Allowing floating vowel points (and sometimes accents) with a blank
base character. This usually, but not always, happens at the beginning
of a word. The mechanism for doing this seems to have been clarified by
the UTC: use NBSP as the base character.
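To make the two mechanisms concrete, here is a minimal Python sketch of the
code point sequences involved. It assumes (the message does not say so) that
the agreed solution for Yerushala(y)im is to place COMBINING GRAPHEME JOINER
(U+034F) between the two vowel points so that normalization does not reorder
them. The choice of patah plus hiriq on lamed is likewise only illustrative,
and the variable names are made up for this example.

```python
import unicodedata

LAMED = "\u05DC"   # HEBREW LETTER LAMED
PATAH = "\u05B7"   # HEBREW POINT PATAH
HIRIQ = "\u05B4"   # HEBREW POINT HIRIQ
CGJ = "\u034F"     # COMBINING GRAPHEME JOINER (assumed mechanism, see above)
NBSP = "\u00A0"    # NO-BREAK SPACE

def names(text):
    """Return the Unicode character names of a string, one per code point."""
    return [unicodedata.name(ch) for ch in text]

# Mechanism 1: two vowel points on one base character.
two_vowels = LAMED + PATAH + HIRIQ
# Canonical ordering sorts marks by combining class, so the points swap.
reordered = unicodedata.normalize("NFC", two_vowels)

# With the joiner between the points, the written order is preserved.
guarded = LAMED + PATAH + CGJ + HIRIQ
guarded_nfc = unicodedata.normalize("NFC", guarded)

# Mechanism 2: a floating vowel point carried by NBSP as a blank base.
floating = NBSP + HIRIQ
floating_nfc = unicodedata.normalize("NFC", floating)

if __name__ == "__main__":
    print(names(two_vowels))
    print(names(reordered))
    print(names(guarded_nfc))
    print(names(floating_nfc))
```

Without the joiner, normalization swaps the two points because hiriq has a
lower canonical combining class than patah. The NBSP sequence is left
unchanged by NFC, although a compatibility normalization (NFKC) would turn
the NBSP into an ordinary space.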
So can't we leave it that these mechanisms can be used for
representation of these forms by those who wish to represent them in
plain text, whereas those who want to use other mechanisms are free to
do so?
In answer to the possible objection that this leaves alternative ways to
represent the same text, I note that the same alternatives already apply
with e.g. superscript digits which may be represented either in plain
text with the Unicode superscript digit characters, or as marked up text
using superscript markup.
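The superscript case can be sketched the same way. In this small Python
example the markup string is only a hypothetical HTML-style rendering, and
the variable names are invented for illustration.

```python
import unicodedata

SUPERSCRIPT_TWO = "\u00B2"  # SUPERSCRIPT TWO, a plain-text character

# The same visible text, represented two ways.
plain = "x" + SUPERSCRIPT_TWO   # plain text with a superscript digit character
marked_up = "x<sup>2</sup>"     # hypothetical markup around an ordinary digit

# The superscript character still carries the numeric value of the digit.
value = unicodedata.digit(SUPERSCRIPT_TWO)

# Compatibility normalization folds it to the ordinary digit, dropping the
# superscript distinction that markup would otherwise have to supply.
folded = unicodedata.normalize("NFKC", plain)

if __name__ == "__main__":
    print(unicodedata.name(SUPERSCRIPT_TWO), value, folded, marked_up)
```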
-- Peter Kirk peter@qaya.org (personal) peterkirk@qaya.org (work) http://www.qaya.org/