Re: Bidi reordering of soft hyphen

From: Richard Wordingham <richard.wordingham_at_ntlworld.com>
Date: Tue, 1 Apr 2014 21:10:23 +0100

On Tue, 1 Apr 2014 12:51:11 +0700
James Clark <jjc_at_jclark.com> wrote:

> Suppose I have a paragraph (uppercase = RTL):
>
> CARROT IS car\u00ADrot IN ENGLISH
>
> and the paragraph gets broken at the soft hyphen.
>
> Is the correct ordering for the first line
>
> car- SI TORRAC
>
> or
>
> -car SI TORRAC
>
> ? I did not succeed in deducing the answer from UAX#9. Soft hyphen
> has bidi class BN, which means it gets removed in stage X9, and so,
> if I have understood correctly, doesn't have a defined embedding
> level.
>
> I'm guessing the correct ordering is the first one, but I don't trust
> my instincts here. (In particular, I wondered whether this was
> analogous to the case where rule L1 resets embedding levels so that
> trailing whitespace is at the visual end of the line.)

There is no conformance requirement on the location of the soft
hyphen. Indeed, there is no requirement on whether it is rendered at
all (TUS Section 16.2). As the treatment of the soft hyphen is
language-dependent even in unidirectional text, I am afraid the
placement comes down to good taste and the language(s) involved. (E.g.,
is this Arabic text effectively embedding English text within an overall
Thai context?)

As U+2010 HYPHEN would result in text like 'car-', in an
English-influenced context I would also go with 'car-'.
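
For anyone who wants to check the character properties under discussion,
here is a minimal sketch using Python's standard unicodedata module. It
only looks up the bidi classes of U+00AD and U+2010 and tests them
against the classes that rule X9 removes; it does not run the bidi
algorithm or decide where a rendered hyphen goes. The names X9_REMOVED,
bidi_class and removed_by_x9 are made up for illustration, and the set
lists the classes as I understand X9, so check it against the version of
UAX#9 you are implementing.

```python
import unicodedata

SOFT_HYPHEN = "\u00ad"  # U+00AD SOFT HYPHEN
HYPHEN = "\u2010"       # U+2010 HYPHEN

# Assumption: the bidi classes that rule X9 of UAX#9 removes
# (explicit embedding/override codes, PDF, and boundary neutrals).
X9_REMOVED = {"RLE", "LRE", "RLO", "LRO", "PDF", "BN"}


def bidi_class(ch):
    """Return the Bidi_Class short name for a single character."""
    return unicodedata.bidirectional(ch)


def removed_by_x9(ch):
    """True if the character's bidi class is in the X9 removal set."""
    return bidi_class(ch) in X9_REMOVED


for name, ch in (("SOFT HYPHEN", SOFT_HYPHEN), ("HYPHEN", HYPHEN)):
    print(f"U+{ord(ch):04X} {name}: class {bidi_class(ch)}, "
          f"removed by X9: {removed_by_x9(ch)}")
```

The point of the comparison is that the soft hyphen falls in the removal
set while U+2010 does not, so only the latter takes part in the
resolution of neutral types.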

Richard.
Received on Tue Apr 01 2014 - 15:12:16 CDT
