From: John Hudson (tiro@tiro.com)
Date: Mon Jul 07 2003 - 22:23:42 EDT
At 08:51 07/07/2003, Ted Hopp wrote:
> > > ... Given the small number of attested sequences that would be
> > > adversely affected by normalisation re-ordering, I'm beginning to
> > > favour the idea of encoding these sequences as individual characters.
> > > We'd probably only need three or four, plus a right meteg, to solve
> > > the problem, and rendering would work fine with existing font and
> > > layout engine technologies.
> >
> > This sounds like a sensible alternative.
>
>This would make data entry difficult for users. Nobody thinks of these
>character sequences as single characters.
If, as Ken suggested, it is feasible to use CGJ or another control
character without the user needing to know about it, i.e. if it can be
inserted into the backing string automatically while the user types only
the mark characters, then it should be equally feasible, and probably
easier, to hide the use of these precomposed mark combinations.
> Editing would also be an
>"interesting" experience. Could one search for lamed-patah and find it as
>part of lamed-<patah+hiriq>? Or would the proposal be to use these new codes
>only as part of bookend processing around normalization (i.e., automatically
>recognize the sequences and substitute, normalize, and then automatically
>substitute back)?
I suppose the latter is feasible. I am very keen that *any* solution should
be invisible to the user.
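The bookend approach Ted describes (recognise the sequences, substitute, normalise, substitute back) might look roughly like this sketch. The table of protected sequences and the Private Use Area placeholder U+E000 are purely hypothetical illustrations, not a real encoding proposal. The sketch also assumes the input contains no Private Use Area characters of its own. Placeholders have combining class 0, so they block reordering the same way CGJ does.

```python
import unicodedata

LAMED = "\u05DC"  # HEBREW LETTER LAMED
PATAH = "\u05B7"  # HEBREW POINT PATAH, class 17
HIRIQ = "\u05B4"  # HEBREW POINT HIRIQ, class 14

# Hypothetical table: attested mark sequences mapped to PUA placeholders.
PROTECTED = {
    PATAH + HIRIQ: "\uE000",
}


def normalize_preserving(text, form="NFC"):
    """Bookend normalisation: substitute, normalise, substitute back."""
    # 1. Replace each protected mark sequence with its placeholder.
    for seq, placeholder in PROTECTED.items():
        text = text.replace(seq, placeholder)
    # 2. Normalise; class-0 placeholders stop marks reordering across them.
    text = unicodedata.normalize(form, text)
    # 3. Restore the original mark sequences.
    for seq, placeholder in PROTECTED.items():
        text = text.replace(placeholder, seq)
    return text


word = LAMED + PATAH + HIRIQ
print(normalize_preserving(word) == word)                   # patah+hiriq kept
print(unicodedata.normalize("NFC", word) == word)           # plain NFC reorders
```

Note that the restored string is deliberately not in NFC, so any later normalisation pass that does not use the same bookends would reorder the marks again.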
>I think we need to keep Peter Constable's point in mind that current usage
>should not define the limits of Unicode functionality. Since the principle
>is that all sequences of character codes are permitted (2.10), it seems
>wrong to supply a fix for only "the small number of attested sequences".
This is a concern, but not an overriding one. Yes, all sequences are
permitted, and some will be reordered during normalisation. We are
currently aware of a small number of attested sequences that definitely
should not be reordered. At this stage, I really don't care whether other,
unattested Hebrew mark sequences are reordered or not, just as I know there
are some sequences that Uniscribe cannot render and some that my fonts
cannot render. That said, it is always a possibility that some new sequence
will be attested in an as yet undiscovered or unpublished manuscript, which
is a legitimate if minor concern.
John Hudson
Tiro Typeworks www.tiro.com
Vancouver, BC tiro@tiro.com
The sight of James Cox from the BBC's World at One,
interviewing Robin Oakley, CNN's man in Europe,
surrounded by a scrum of furiously scribbling print
journalists will stand for some time as the apogee of
media cannibalism.
- Emma Brockes, at the EU summit