RE: Arabic letters separated by markup

From: Jukka K. Korpela (jkorpela@cs.tut.fi)
Date: Thu Jun 16 2005 - 07:29:16 CDT

    On Thu, 16 Jun 2005, Jony Rosenne wrote:

    > > Inserting mark-up tags between characters which would normally
    > > ligate or shape or re-position breaks the run of text.
    >
    > I think that the high level protocol, such as HTML or CSS or XML, should
    > define that.

    That seems to be the consensus on the matter here. Although the issue is
    off-topic in a way, I think it would help if the Unicode standard, or
    related documents from the Unicode Consortium, explained the problem
    and the suggested approach. After all, it is also a matter of identifying
    exactly what _is_ the plain text content in a marked-up document (such as
    HTML, XML, RTF, TeX, ...). For example, is the plain text content of
    <a>foo</a><b>bar</b> two strings, "foo" and "bar", or one string,
    "foobar"?
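
    To make the two readings concrete, here is a minimal sketch using
    Python's standard html.parser module. It is only an illustration, not
    something any specification prescribes; the names TextRuns and
    text_runs are made up for this example.

```python
from html.parser import HTMLParser


class TextRuns(HTMLParser):
    """Collect each piece of character data as its own run."""

    def __init__(self):
        super().__init__()
        self.runs = []

    def handle_data(self, data):
        # Called once for each stretch of text between tags in this input.
        self.runs.append(data)


def text_runs(markup):
    """Return the character data of `markup` as a list of runs."""
    parser = TextRuns()
    parser.feed(markup)
    parser.close()
    return parser.runs


markup = "<a>foo</a><b>bar</b>"

# Reading 1: every tag boundary breaks the run, giving two strings.
separate = text_runs(markup)

# Reading 2: the markup is transparent, giving one string.
joined = "".join(separate)

print(separate)
print(joined)
```

    Which of the two results a renderer should hand to its shaping or
    ligature logic is exactly what the markup language would need to
    define.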

    It ultimately needs to be defined when the markup language is defined,
    but this task has largely been neglected. Some push is needed.

    > My reading of HTML and CSS is that inline markup does not break
    > the run.

    That sounds natural, but the relevant specifications do not really say
    that. Moreover, "inline markup" is not a clear-cut concept. In HTML,
    <br> is inline, but it should of course break the run. What if the markup
    in <a>f</a><b>i</b> is inline but the <b> element has a large left margin?
    Would it still be acceptable to render the construct as an fi ligature?

    The report "Unicode in XML and other Markup Languages",
    http://www.unicode.org/reports/tr20/
    does not seem to address this issue.

    -- 
    Jukka "Yucca" Korpela, http://www.cs.tut.fi/~jkorpela/
    

