Re: Bidi edge cases in Hangul and Indic

From: Ken Whistler via Unicode <unicode_at_unicode.org>
Date: Thu, 22 Feb 2018 15:32:45 -0800

On 2/22/2018 11:39 AM, David Corbett via Unicode wrote:
> For example, after a right-to-left override, the Hangul string 보기
> (“bogi”) becomes 기보 (“gibo”) in visual order. However, its NFD form is
> reordered by jamo instead of by syllable; that is, it looks like “igob”.

Nope. *tilt* The UBA reorders the display order in layout -- not the
underlying string.

"bogi" is the sequence <1107, 1169, 1100, 1175> in NFD or <BCF4, AE30>
in NFC.
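
For anyone who wants to check those code points, here is a minimal sketch
using Python's standard unicodedata module. The helper name code_points is
just for illustration and is not from the original message.

```python
import unicodedata

# "bogi" as two precomposed Hangul syllables: U+BCF4 U+AE30
bogi = "\uBCF4\uAE30"

# Decompose to conjoining jamo (NFD), then recompose (NFC).
nfd = unicodedata.normalize("NFD", bogi)
nfc = unicodedata.normalize("NFC", nfd)


def code_points(s):
    """Return the code points of s as uppercase hex strings."""
    return [f"{ord(ch):04X}" for ch in s]


print(code_points(nfd))  # ['1107', '1169', '1100', '1175']
print(code_points(nfc))  # ['BCF4', 'AE30']
```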

Because of canonical equivalence, for display of the NFD string, the
sequence <1107,1169> needs to be mapped onto the same *glyph* as BCF4,
and the sequence <1100,1175> onto the same *glyph* as AE30.

If you override the normal left-to-right ordering with bidi override
controls, then the layout order is reversed, but what is actually laid
out is those two glyphs. So you just reverse the order of the two
syllables for display, in either case.
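
The difference between reversing syllables and reversing jamos can be
sketched roughly as follows. This is not the UBA and not a real layout
engine: the regular expression is a simplified, assumed stand-in for
syllable clustering (leading consonants, then vowels, then optional trailing
consonants), and the function names are hypothetical.

```python
import re
import unicodedata

# Simplified syllable clustering (an assumption for this sketch):
# L+ V+ T* jamo runs form one unit; anything else is one unit per code point.
CLUSTER = re.compile(
    r"[\u1100-\u115F]+[\u1160-\u11A7]+[\u11A8-\u11FF]*|.", re.DOTALL
)


def reverse_by_cluster(text):
    """Reverse the order of syllable-sized units, as a glyph-level layout would."""
    return "".join(reversed(CLUSTER.findall(text)))


def reverse_by_code_point(text):
    """Naive reversal of the underlying code points (the 'igob' behavior)."""
    return text[::-1]


bogi_nfc = "\uBCF4\uAE30"
bogi_nfd = unicodedata.normalize("NFD", bogi_nfc)

# Both forms reverse to the same two syllables, "gibo" (U+AE30 U+BCF4).
same = unicodedata.normalize("NFC", reverse_by_cluster(bogi_nfd)) == reverse_by_cluster(bogi_nfc)
print(same)  # True

# Reversing the NFD code points instead scrambles the jamo order.
print([f"{ord(c):04X}" for c in reverse_by_code_point(bogi_nfd)])
```

The point of the sketch is only that the unit being reordered is the
syllable, so the NFC and NFD strings end up looking the same.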

You could force display of "igob", but only if you had inserted some
character in between the conjoining jamos that was preventing their
equivalence to the syllables, anyway.
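
As a rough illustration of that last point, the sketch below puts a
character between each pair of jamos and checks that the result no longer
normalizes to the two syllables. The choice of U+200C ZERO WIDTH NON-JOINER
is an assumption made for the example (the message above does not name a
particular character), and whether a given renderer then shows separate
jamos is a separate question this does not test.

```python
import unicodedata

bogi_nfc = "\uBCF4\uAE30"

# Assumed blocker for this sketch: U+200C between each pair of jamos.
ZWNJ = "\u200C"
blocked = "\u1107" + ZWNJ + "\u1169" + "\u1100" + ZWNJ + "\u1175"

composed = unicodedata.normalize("NFC", blocked)

# The intervening character keeps the jamos from composing, so the string
# is no longer canonically equivalent to the two precomposed syllables.
print(composed == bogi_nfc)  # False
print(len(composed))         # 6
```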

> I don’t think it is the intent of the algorithm that canonically
> equivalent strings display so very differently, but I can’t find any
> explicit guidance. What should a UBA-conformant renderer do?

The right thing. ;-)

--Ken
Received on Thu Feb 22 2018 - 17:33:30 CST
