Re: (SC2WG2.609) New contribution N2705

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Thu Feb 19 2004 - 04:47:08 EST


    I have the same feeling, notably because the documents in question are meant to
    be typeset with fonts so that their notation is readable and consistent.

    And most probably also because it creates new, irrelevant character distinctions:
    rich-text formats (SGML, HTML, ...) would have to manage these characters as well
    as other occurrences coded with markup in order to produce consistent output
    (here, of subscripts).

    So suppose we encode some of the subscripts used by Indo-Europeanists, and not
    others. What will a rendered document look like if some occurrences are coded
    with new separate characters, and others with markup and standard characters?

    Suppose now that such text is to be generated or converted into plain text. Some
    occurrences will be left unmarked, and others may be left with the new
    characters. It will then be difficult to insert a consistent additional
    notation in the plain-text format that covers both categories of subscripts.
    If this notation is not explained in the text itself, the document
    would become unusable. But even if a conversion system is adopted, it will be
    hard to make it produce consistent results throughout the text for all
    occurrences of both separate subscript letters and standard characters with
    subscript markup.
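
To make the conversion problem concrete, here is a minimal Python sketch of a converter that has to recognize both encodings and map them to one plain-text notation. The `_{...}` notation and the function names are assumptions chosen for illustration, not anything proposed in N2705, and the markup handling assumes simple, non-nested HTML-style `<sub>` elements. Encoded subscript characters are detected through their `<sub>` compatibility decomposition in the Unicode Character Database.

```python
import re
import unicodedata

# Assumption: markup subscripts appear as simple, non-nested <sub>...</sub> elements.
SUB_TAG = re.compile(r"<sub>(.*?)</sub>", re.IGNORECASE | re.DOTALL)


def base_of_subscript(ch):
    """Return the base character if ch has a <sub> compatibility mapping, else None."""
    parts = unicodedata.decomposition(ch).split()
    if len(parts) == 2 and parts[0] == "<sub>":
        return chr(int(parts[1], 16))
    return None


def to_plain_text(text):
    """Rewrite both kinds of subscript into the hypothetical _{...} notation."""
    # Step 1: subscripts expressed with markup.
    text = SUB_TAG.sub(lambda m: "_{" + m.group(1) + "}", text)

    # Step 2: subscripts expressed with dedicated characters; adjacent ones are
    # merged into a single group so both encodings give the same result.
    out = []
    run = []
    for ch in text:
        base = base_of_subscript(ch)
        if base is not None:
            run.append(base)
            continue
        if run:
            out.append("_{" + "".join(run) + "}")
            run = []
        out.append(ch)
    if run:
        out.append("_{" + "".join(run) + "}")
    return "".join(out)


print(to_plain_text("h\u2082 and h<sub>2</sub>"))
```

Even this small sketch needs two separate recognition paths, and it does not handle nested markup or a subscript character placed inside `<sub>` markup. A subscript written with markup over a letter that has no dedicated subscript character can only ever take the first path, which is the inconsistency described above.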

    I much prefer to keep the encoding conservative, limited to handling bijective
    mappings from important legacy (non-Unicode) charsets, into which such
    characters were introduced in the early days, when rich-text formats were not
    easily interoperable and plain text was the only solution.

    Today we have lots of ways to create easily interoperable rich-text documents
    (HTML, SGML, XML, DocBook, PDF, RTF, Word docs, ...) without needing such
    pollution of Unicode.

    Also, I doubt they were ever used in an interoperable legacy charset encoding.
    Authors will tend to use one of the rich-text formats, where subscripts are easy
    to produce from almost all existing characters.

    ----- Original Message -----
    From: "Rick McGowan" <rick@unicode.org>
    To: <unicode@unicode.org>
    Sent: Thursday, February 19, 2004 2:45 AM
    Subject: Re: (SC2WG2.609) New contribution N2705

    > As long as we're on the topic, I have to weigh in on the conservative side
    > in this argument, with Ken Whistler. Use of the existing subscript
    > characters is generally bad practice. Adding more subscripts would be
    > adding to the bad practice, and yield even more different ways to express
    > the same thing (markup versus direct encoding).


