Re: Proposal for BiDi in terminal emulators from Asmus Freytag via Unicode on 2019-01-30 (Unicode Mail List Archive)

From: Asmus Freytag via Unicode <unicode_at_unicode.org>
Date: Wed, 30 Jan 2019 14:37:05 -0800

Arabic terminals and terminal emulators existed at the time of Unicode 1.0. If you are trying to emulate those services, for example so that older software can run, you would need to look at how these programs expected to be fed their data.

I see little reason to reinvent things here, because we are talking about emulating legacy hardware. Or are we not?

It's conceivable that, with modern fonts, one can show some characters that could not be supported on the actual legacy hardware, because that hardware was limited by available character memory and the available pre-Unicode character sets. As long as the new characters otherwise fit the paradigm (one character per cell), they can be supported without changes to the protocol beyond the change in character set.

However, I would not expect an emulator to accept data in NFD, for example.

A./

On 1/30/2019 2:02 PM, Richard Wordingham via Unicode wrote:

On Wed, 30 Jan 2019 15:33:38 +0100
Frédéric Grosshans via Unicode <unicode@unicode.org> wrote:

On 30/01/2019 at 14:36, Egmont Koblinger via Unicode wrote:

- It doesn't do Arabic shaping. In my recommendation I argue that
in this mode, where shuffling the characters is the task of the
text editor and not the terminal, Arabic shaping should likewise
be the editor's task, done using presentation form characters.

I guess Arabic shaping is doable through presentation form
characters, because the latter are characters inherited from legacy
standards that used them in such solutions.

So long as you don't care about local variants, e.g. U+0763 ARABIC
LETTER KEHEH WITH THREE DOTS ABOVE.  It has no presentation form
characters.
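The point about missing presentation forms can be checked against the Unicode Character Database. The following is only a minimal sketch, assuming Python's standard unicodedata module; the helper name presentation_forms and the two block ranges are illustrative choices, not part of any proposal in this thread. It collects the single-letter presentation forms whose compatibility decomposition maps back to a given base letter.

```python
import unicodedata

# Arabic Presentation Forms-A and Arabic Presentation Forms-B.
PRESENTATION_BLOCKS = [(0xFB50, 0xFDFF), (0xFE70, 0xFEFF)]
SHAPE_TAGS = {"<isolated>", "<final>", "<initial>", "<medial>"}


def presentation_forms(base):
    """Return {shape tag: code point} for presentation forms of one letter.

    Hypothetical helper: ligature forms (which decompose to more than one
    base letter) are deliberately skipped.
    """
    target = f"{ord(base):04X}"
    forms = {}
    for lo, hi in PRESENTATION_BLOCKS:
        for cp in range(lo, hi + 1):
            # e.g. "<isolated> 0628" for U+FE8F; "" for unassigned code points
            parts = unicodedata.decomposition(chr(cp)).split()
            if len(parts) == 2 and parts[0] in SHAPE_TAGS and parts[1] == target:
                forms[parts[0]] = cp
    return forms


if __name__ == "__main__":
    for letter in ("\u0628", "\u0763"):  # BEH, KEHEH WITH THREE DOTS ABOVE
        found = presentation_forms(letter)
        shown = {tag: f"U+{cp:04X}" for tag, cp in found.items()}
        print(f"U+{ord(letter):04X}", shown)
```

Under these assumptions, U+0628 ARABIC LETTER BEH yields four forms, while U+0763 yields none, so an editor that shapes via presentation forms has nothing to emit for it.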

Basic Arabic shaping, at the level of a typewriter, is straightforward
enough to leave to a terminal emulator, as Eli has suggested.  Lam-alif
would be trickier - one cell or two?

But if you want to support
other “Arabic-like” scripts (like Syriac, N’ko), or even some LTR
complex scripts, like Myanmar or Khmer, this “solution” cannot work,
because no equivalent of “presentation form characters” exists for
these scripts.

I believe combining marks present issues even in implicit modes.  In
implicit mode, one cannot simply delegate the task to normal text
rendering, for one has to allocate text to cells.  There are a number
of complications that spring to mind:

1) Some characters decompose to two characters that may otherwise lay
claim to their own cells:

U+06D3 ARABIC LETTER YEH BARREE WITH HAMZA ABOVE decomposes to <06D2,
0654>.  Do you intend that your scheme be usable by Unicode-compliant
processes?

2) 2-part vowels, such as U+0D4A MALAYALAM VOWEL SIGN O, which
canonically decomposes into a preceding combining mark U+0D46 MALAYALAM
VOWEL SIGN E and following combining mark U+0D3E MALAYALAM VOWEL SIGN
AA.

3) Similar 2-part vowels that do not decompose, such as U+17C4 KHMER
VOWEL SIGN OO.  OpenType layout decomposes that into a preceding
'U+17C1 KHMER VOWEL SIGN E' and the second part.

4) Indic conjuncts.
(i) There are some conjuncts, such as Devanagari K.SSA, where a
display as <KA, VIRAMA>, <SSA> is simply unacceptable.  In some
closely related scripts, this conjunct has the status of a character.

(ii) In some scripts, e.g. Khmer, the virama-equivalent is not an
acceptable alternative to form a consonant stack.  Khmer could
equally well have been encoded with a set of subscript consonants in
the same manner as Tibetan.

(iii) In some scripts, there are marks named as medial consonants
which function in exactly the same way as <'virama', consonant>; it is
silly to render them in entirely different manners.

5) Some non-spacing marks are spacing marks in some contexts.  U+102F
MYANMAR VOWEL SIGN U is probably the best known example.
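Some of the character properties behind points 1, 2, 3 and 5 above can be inspected directly. This is a rough sketch, assuming Python's standard unicodedata module; the helper names are made up for illustration. It shows only canonical decomposition and general category, not how any terminal actually assigns cells.

```python
import unicodedata


def nfd_codepoints(ch):
    """Code points of the canonical decomposition (NFD) of one character."""
    return [f"U+{ord(c):04X}" for c in unicodedata.normalize("NFD", ch)]


def category(ch):
    """General category: Mn = non-spacing mark, Mc = spacing mark, Lo = letter."""
    return unicodedata.category(ch)


if __name__ == "__main__":
    samples = [
        "\u06D3",  # ARABIC LETTER YEH BARREE WITH HAMZA ABOVE (point 1)
        "\u0D4A",  # MALAYALAM VOWEL SIGN O (point 2)
        "\u17C4",  # KHMER VOWEL SIGN OO (point 3)
        "\u102F",  # MYANMAR VOWEL SIGN U (point 5)
    ]
    for ch in samples:
        print(f"U+{ord(ch):04X}", unicodedata.name(ch),
              category(ch), nfd_codepoints(ch))
```

Note that U+102F is reported as Mn regardless of context; the general category alone does not capture the contextual spacing behaviour described in point 5, nor the OpenType-level split of U+17C4 described in point 3.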

Richard.

Received on Wed Jan 30 2019 - 16:37:22 CST
