Line breaking with space followed by RLM/LRM/ZWJ/ZWNJ, and another TR14 issue

From: Peter Kirk (peterkirk@qaya.org)
Date: Mon Nov 10 2003 - 10:40:49 EST


    Some issues with TR14:

    1) The page http://www.unicode.org/versions/Unicode4.0.0/ links to an
    old version of TR14, http://www.unicode.org/reports/tr14/tr14-13.html.

    2) I note from the latest version of TR14
    (http://www.unicode.org/reports/tr14/) and the line breaking data
    (http://www.unicode.org/Public/UNIDATA/LineBreak.txt) that the
    characters U+200C-U+200F (ZWNJ, ZWJ, LRM and RLM) have line breaking
    class CM. This has the rather peculiar consequence that a space followed
    by any of these characters is treated as class ID, and so there is no
    line break opportunity at the beginning of the following word. This
    might be desirable with ZWJ, if it is taken as requesting some kind of
    ligature of the space and the following character. But it seems highly
    undesirable with RLM and LRM.
    These may be used at the beginning of a "word" made up of characters of
    undefined directionality, to ensure that it is rendered with the
    intended directionality even when separated from its context. But the
    line break before such "words" should not be prohibited.
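    A minimal Python sketch of the class reassignment described in point 2,
    under loudly stated assumptions: the table LB_CLASS and the function
    resolve_classes are hypothetical, only the space and U+200C-U+200F are
    classified (every other character is simply assumed to be AL), and only
    the "a combining mark attaches to its base, and a space used as a base
    becomes ID" treatment is modelled. The TR14 pair table is not modelled.

```python
# Hypothetical, deliberately tiny class table. U+200C-U+200F are CM, as the
# post reports from LineBreak.txt; everything not listed is assumed to be AL.
LB_CLASS = {
    " ": "SP",
    "\u200c": "CM",  # ZWNJ
    "\u200d": "CM",  # ZWJ
    "\u200e": "CM",  # LRM
    "\u200f": "CM",  # RLM
}


def resolve_classes(text):
    """Return simplified line-break classes, treating SP CM* as ID.

    Each CM is absorbed into the preceding character (its base). If that
    base is a space, the space's class is changed from SP to ID. A CM with
    no base at all is left as "CM"; this sketch does not handle that case.
    """
    resolved = []
    for ch in text:
        cls = LB_CLASS.get(ch, "AL")  # assumption: unlisted characters are AL
        if cls == "CM" and resolved:
            if resolved[-1] == "SP":
                resolved[-1] = "ID"  # space serving as a base becomes ID
            continue  # the combining mark itself takes its base's class
        resolved.append(cls)
    return resolved


print(resolve_classes("ab cd"))        # plain space
print(resolve_classes("ab \u200fcd"))  # space followed by RLM
```

    In the second call the space no longer carries class SP, so a rule that
    specifically offers a break after a space cannot apply at the start of
    "cd"; what happens there instead depends on the pair-table entries for
    ID, which this sketch does not attempt to reproduce.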

    -- 
    Peter Kirk
    peter@qaya.org (personal)
    peterkirk@qaya.org (work)
    http://www.qaya.org/
    


    This archive was generated by hypermail 2.1.5 : Mon Nov 10 2003 - 11:28:38 EST