On Wed, 30 Jan 2019 15:33:38 +0100
Frédéric Grosshans via Unicode <unicode@unicode.org> wrote:

> On 30/01/2019 at 14:36, Egmont Koblinger via Unicode wrote:
> > - It doesn't do Arabic shaping. In my recommendation I'm arguing
> > that in this mode, where shuffling the characters is the task of
> > the text editor and not the terminal, so should it be for Arabic
> > shaping using presentation form characters.
>
> I guess Arabic shaping is doable through presentation form
> characters, because the latter are characters inherited from legacy
> standards using them in such solutions.

So long as you don't care about local variants, e.g. U+0763 ARABIC
LETTER KEHEH WITH THREE DOTS ABOVE. It has no presentation form
characters.

Basic Arabic shaping, at the level of a typewriter, is straightforward
enough to leave to a terminal emulator, as Eli has suggested. Lam-alif
would be trickier - one cell or two?

> But if you want to support other “arabic like” scripts (like Syriac,
> N’ko), or even some LTR complex scripts, like Myanmar or Khmer, this
> “solution” cannot work, because no equivalent of “presentation form
> characters” exists for these scripts.

I believe combining marks present issues even in implicit modes. In
implicit mode, one cannot simply delegate the task to normal text
rendering, for one has to allocate text to cells. There are a number of
complications that spring to mind:

1) Some characters decompose to two characters that may otherwise lay
claim to their own cells: U+06D3 ARABIC LETTER YEH BARREE WITH HAMZA
ABOVE decomposes to <06D2, 0654>. Do you intend that your scheme be
usable by Unicode-compliant processes?

2) 2-part vowels, such as U+0D4A MALAYALAM VOWEL SIGN O, which
canonically decomposes into a preceding combining mark U+0D46 MALAYALAM
VOWEL SIGN E and following combining mark U+0D3E MALAYALAM VOWEL SIGN
AA.

3) Similar 2-part vowels that do not decompose, such as U+17C4 KHMER
VOWEL SIGN OO. OpenType layout decomposes that into a preceding
'U+17C1 KHMER VOWEL SIGN E' and the second part.

4) Indic conjuncts.

(i) There are some conjuncts, such as Devanagari K.SSA, where a display
as <KA, VIRAMA>, <SSA> is simply unacceptable. In some closely related
scripts, this conjunct has the status of a character.

(ii) In some scripts, e.g. Khmer, the virama-equivalent is not an
acceptable alternative to form a consonant stack. Khmer could equally
well have been encoded with a set of subscript consonants in the same
manner as Tibetan.

(iii) In some scripts, there are marks named as medial consonants which
function in exactly the same way as <'virama', consonant>; it is silly
to render them in entirely different manners.

5) Some non-spacing marks are spacing marks in some contexts. U+102F
MYANMAR VOWEL SIGN U is probably the best known example.

Richard.
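
The character properties cited in points 1) to 3) and 5), and the remark about U+0763, can be checked against the Unicode Character Database. What follows is only a rough sketch using Python's standard `unicodedata` module. The helper names `canonical_decomposition` and `has_presentation_forms` are made up for illustration, and the presentation-form check is an assumption about how one might test the claim: it only looks for single-letter forms in the two Arabic Presentation Forms blocks.

```python
import unicodedata


def canonical_decomposition(ch):
    """Return the canonical decomposition of ch as a list of code points.

    An empty list means the UCD records no canonical decomposition.
    """
    raw = unicodedata.decomposition(ch)
    # Compatibility mappings carry a tag such as "<isolated>"; they are
    # not canonical, so treat them as "no canonical decomposition".
    if not raw or raw.startswith("<"):
        return []
    return [int(cp, 16) for cp in raw.split()]


def has_presentation_forms(ch):
    """Report whether any Arabic Presentation Forms-A/B character has a
    compatibility decomposition consisting of ch alone."""
    target = "%04X" % ord(ch)
    candidates = list(range(0xFB50, 0xFE00)) + list(range(0xFE70, 0xFF00))
    for cp in candidates:
        parts = unicodedata.decomposition(chr(cp)).split()
        # A single-letter form looks like ["<initial>", "0643"].
        if len(parts) == 2 and parts[0].startswith("<") and parts[1] == target:
            return True
    return False


if __name__ == "__main__":
    for cp in (0x06D3, 0x0D4A, 0x17C4):
        ch = chr(cp)
        parts = ["U+%04X" % p for p in canonical_decomposition(ch)]
        print("U+%04X %s -> %s" % (cp, unicodedata.name(ch), parts or "none"))
    print("U+102F category:", unicodedata.category("\u102F"))
    print("U+0763 has presentation forms:", has_presentation_forms("\u0763"))
```

The general category reported for U+102F is the static UCD value only; whether the mark takes up space in a given context is decided by the font and shaping engine, which is exactly why a terminal cannot allocate cells from the property alone.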