[unicode] Re: Unicode editing (RE: Unicode complaints)

From: Marco Cimarosti (marco.cimarosti@essetre.it)
Date: Wed Mar 21 2001 - 06:10:08 EST


Roozbeh Pournader wrote:
> If you open a file that contains two adjacent runs at the
> same level, will you make them one run when you write the file?

That was the idea, but only when it is *really* an embedding with the same
directionality as the text in which it is inserted. Like this:

Visual: she said i need water and expired
Levels: 000000000222222222222000000000000
Logic: she said <LRE>i need water<PDF> and expired

I don't see how such an embedding could be useful, so I would iron level "2"
to the surrounding "0" and, consequently, remove the embedding controls from
the logical string.
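
A minimal sketch of that "ironing" step, in Python, working on a plain list
of per-character levels. The function name, the list representation and the
generalization from "2 inside 0" to "exactly two levels above both
neighbours" are my assumptions, as is treating the line edge as having the
paragraph level:

```python
from itertools import groupby

def iron_redundant_runs(levels, paragraph_level=0):
    """Flatten runs that sit exactly two levels above both neighbours.

    Hypothetical helper: a single pass over the runs, not repeated until
    nothing changes.
    """
    # Collapse the per-character levels into [level, length] runs.
    runs = [[lvl, len(list(grp))] for lvl, grp in groupby(levels)]
    out = []
    for i, (lvl, length) in enumerate(runs):
        # Assumption: past either end of the line the paragraph level applies.
        left = runs[i - 1][0] if i > 0 else paragraph_level
        right = runs[i + 1][0] if i + 1 < len(runs) else paragraph_level
        if left == right == lvl - 2:
            lvl = left  # same direction as the surroundings: redundant
        out.extend([lvl] * length)
    return out
```

With the levels of the example above, `iron_redundant_runs([0]*9 + [2]*12 +
[0]*12)` gives 33 zeros. A run with a different level on either side, such
as the 2s in `[0, 2, 2, 1]`, is left untouched.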

But if the level 2 segment is adjacent (on at least one side) to an
odd-level segment, then it is meaningful, as in the example that I gave
earlier:

Visual: she said i need water DIAS EH
Levels: 00000000022222222222211111111
Logic: she said <RLE>HE SAID i need water<PDF>

So, there must be a way to maintain it, or you lose the logical order of
the text:

Visual: she said i need water DIAS EH
Levels: 00000000000000000000011111111
Logic: she said i need water<RLE>HE SAID <PDF>
(Wrong! This is not what the author meant!)
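
To check that the two logical strings above really produce the same visual
line, here is a small Python sketch of the reordering step (rule L2 of
UAX #9: from the highest level down to the lowest odd level, reverse every
contiguous run of characters at that level or higher). The function name is
mine, and the embedding controls are left out of the strings: only the
resolved levels are passed in.

```python
def reorder_visual(text, levels):
    """Reorder one line from logical to visual order (sketch of rule L2)."""
    if len(text) != len(levels):
        raise ValueError("need exactly one level per character")
    chars, lv = list(text), list(levels)
    odd = [l for l in lv if l % 2 == 1]
    if not odd:
        return text  # only even levels: nothing is reversed
    for level in range(max(lv), min(odd) - 1, -1):
        i = 0
        while i < len(chars):
            if lv[i] >= level:
                j = i
                while j < len(chars) and lv[j] >= level:
                    j += 1
                # Reverse the run, keeping each level with its character.
                chars[i:j] = chars[i:j][::-1]
                lv[i:j] = lv[i:j][::-1]
                i = j
            else:
                i += 1
    return "".join(chars)

intended = reorder_visual("she said HE SAID i need water",
                          [0] * 9 + [1] * 8 + [2] * 12)
flattened = reorder_visual("she said i need waterHE SAID ",
                           [0] * 21 + [1] * 8)
print(intended)               # she said i need water DIAS EH
print(intended == flattened)  # True
```

Same display, different logical order: this is why the level 2 run cannot
simply be merged into the surrounding level 0.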

> > The adjacent levels 0 and 2 would be against my scheme, but no doubt
> > they are necessary. So my first idea was to add a zero-width odd-level
> > character (represented by "*" below) between the two adjacent
> > even-level characters:
>
> This may be a solution to the problem I mentioned. You can keep
> that between those two runs. But again they are one run in
> terms of the bidi algorithm. But again why may want one
>
> But only in the buffer, ok? Users don't like invisible characters.

I don't know. The reason for having that virtual zero-width character is
exactly to make it visible to the user, so that she can act on it (change
its embedding level).

If this is to be hidden, then what is it for?

> > Just, when you select text, the lowest level in the selection is
> > arbitrary (e.g., 27 or 46). I think that this lowest level should be
> > adjusted to 0 or 1, and all the other levels adjusted to maintain the
> > same difference from the lowest level.
>
> Yes, it should. Good point to note. Even UAX #9 explicitly notes that
> adding two to all levels will make the same reordering.

Which sounds like an implicit invitation to normalize embedding levels, as
far as you can do so while maintaining the same logical order.
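
A minimal sketch of that normalization in Python, assuming the selection is
handed over as a plain list of levels (the function name is mine). The shift
is always even, so every level keeps its parity, and therefore its
direction, while the lowest level lands on 0 or 1:

```python
def normalize_levels(levels):
    """Shift all levels down so that the lowest one becomes 0 or 1."""
    if not levels:
        return []
    lowest = min(levels)
    shift = lowest - (lowest % 2)  # largest even number <= lowest
    return [lvl - shift for lvl in levels]

print(normalize_levels([27, 28, 27, 29]))  # [1, 2, 1, 3]
print(normalize_levels([46, 47, 48]))      # [0, 1, 2]
```

The differences between levels are unchanged, so the reordering is the same
as before the shift.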

> But I think that a "neutral neutral" is also needed. When you are in an
> Arabic keyboard mode, a space is surely a right-to-left space. But what
> about a Hebrew one where people use the shifted keyboard for Latin? Would
> someone jump in and help? I know almost nothing about Hebrew...

I am not the person who can help you with this: I don't even know Arabic
editing, and I am adjusting my opinions each time I discover some new fact
from you.

However, assuming that the space on Hebrew keyboards is a real neutral, it
is always possible to come up with a reasonable default directionality:

- If the "override next characters" mode is active, there is no question:
all typed characters get the manually selected level, regardless of their
normal directionality;

- Else, if it is typed between two characters with the same level, it gets
the neighbors' level;

- Else, if it is typed just after another directional character (i.e., the
cursor hasn't been moved since the last letter was entered), then it gets
the same directionality as the last character;

- If all else fails, it gets the directionality of the line or paragraph.

I guess that this would catch the correct embedding level in most cases. If
it doesn't, there is not much else that can help, apart from hoping that the
user realizes something is wrong and uses the manual bidi commands (the
"arrows" view) to fix the levels by hand.
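
The four fallback rules listed above could be sketched in Python like this.
All the names are hypothetical, and so is the way the editor state is passed
in: `None` stands for "not available" (no override active, no neighbor on
that side, or the cursor has moved since the last letter). I also assume
that "same directionality as the last character" can be read as "same level
as the last character".

```python
from typing import Optional

def default_neutral_level(
    override_level: Optional[int],
    left_level: Optional[int],
    right_level: Optional[int],
    last_typed_level: Optional[int],
    paragraph_level: int,
) -> int:
    """Pick a default embedding level for a freshly typed neutral."""
    # 1. "Override next characters" mode: the manually selected level wins.
    if override_level is not None:
        return override_level
    # 2. Typed between two characters with the same level.
    if left_level is not None and left_level == right_level:
        return left_level
    # 3. Typed right after another character, cursor not moved since.
    if last_typed_level is not None:
        return last_typed_level
    # 4. If all else fails: the line or paragraph level.
    return paragraph_level
```

For instance, a space typed between a level 1 and a level 2 character, just
after the level 1 letter was entered, gets level 1 from the third rule:
`default_neutral_level(None, 1, 2, 1, 0)` returns 1.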

_ Marco


