Re: ASCII and Unicode lifespan

From: Hans Aberg (haberg@math.su.se)
Date: Thu May 19 2005 - 07:30:25 CDT

    At 14:00 +0200 2005/05/19, Philippe Verdy wrote:
    >From: "Hans Aberg" <haberg@math.su.se>
    >>It seems obvious to me that such developments will happen, regardless
    >>of what one does at Unicode. The best Unicode can do, in my opinion,
    >>is helping such developments. Then such developments could be done
    >>in new standards within the scope of the Unicode consortium.
    >
    >As long as ISO/IEC 10646-1 remains a standard, this won't happen
    >in Unicode.
    >And even if there's a successor, I think it will probably introduce a
    >"better" representation while providing equivalences with the past
    >ISO/IEC 10646-1 code points, so even Unicode will adapt to the
    >change.

    I am not sure how it will happen. But it will happen, that is for sure.

    >I don't like the idea of "patchwork". If you think this because
    >scripts are encoded separately when some of them could have been
    >unified, or because some scripts were unified when they should not
    >have been, you forget that Unicode and ISO/IEC 10646 are also
    >replying to these arguments by making the necessary disunifications
    >(for example Coptic/Greek recently, although the characters were not
    >really split).

    If one focuses on the property fields, it will matter little how the
    characters are distributed across character numbers (code points).
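    A small illustration of working from property fields rather than code
    point ranges, using Python's standard `unicodedata` module. The Coptic
    letters mentioned in the quote are a handy case: some remain in the
    Greek and Coptic block (U+0370..U+03FF), others sit in the separate
    Coptic block (U+2C80..U+2CFF), yet a property-based test treats them
    alike. Python exposes no Script property, so in this sketch the
    character name stands in for one; that substitution, and the helper
    name `is_coptic_letter`, are assumptions for illustration only.

```python
import unicodedata

def is_coptic_letter(ch: str) -> bool:
    # Assumption: the character name is used as a stand-in for a Script
    # property, which Python's unicodedata module does not expose.
    name = unicodedata.name(ch, "")
    return name.startswith("COPTIC ") and unicodedata.category(ch).startswith("L")

# U+03E2 (Greek and Coptic block), U+2C80 (Coptic block), Latin a, Greek alpha
text = "\u03E2\u2C80a\u03B1"
print([is_coptic_letter(c) for c in text])  # [True, True, False, False]
```

    The test never mentions a block or a numeric range, so moving
    characters between blocks would not change its result.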

    >If you want another encoding model, I can give a few ideas:
    >- reencoding Korean jamos as simple jamos
    >- adding a model for interlinear annotation that effectively works
    >for Chinese and Hebrew
    >- adding a model for vertical and boustrophedon presentations and
    >their effect on mirrored characters and text layout
    >- making a better layered model for canonical/compatibility equivalences
    >- making a better model for clusters/subclusters (forget the
    >"double" diacritics), including at the syllabic level.
    >- adding a layered model for text in general, that structures it
    >into processable units like paragraphs, sentences and words (also
    >needed to palliate the difficulties caused by East-Asian scripts).
    >...

    I pointed out that if one admits stacked structures, by adding "begin"
    and "end" abstract characters, then some things that are now
    complicated can be made simple. For example, the problem of merging
    text of vertical and horizontal reading directions is mainly a
    stacking problem: the rendering direction is simply chosen based on
    the rendering directions of the scripts at the current level and at
    the level above it in the stack.
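    A minimal sketch of the stacking idea in Python. No "begin"/"end"
    abstract characters exist in Unicode, so they are modelled here as
    hypothetical tokens, each "begin" carrying the writing direction of
    the script it opens ("ltr", "rtl" or "ttb" for top to bottom). The
    resolution rule in `choose` is likewise an assumption: keep a
    script's own direction when it shares the parent's axis, otherwise
    mark the run as set inside the parent's line.

```python
from typing import List, Optional, Tuple

# Hypothetical token stream standing in for the proposed "begin"/"end"
# abstract characters: ("begin", direction), ("text", s), ("end",).

def axis(direction: str) -> str:
    """Map a writing direction to its axis."""
    return "vertical" if direction == "ttb" else "horizontal"

def choose(parent: Optional[str], own: str) -> str:
    """Assumed rule: consult only the current level and the one above it."""
    if parent is None or axis(parent) == axis(own):
        return own                    # same axis: keep the script's own direction
    return own + "-in-" + parent      # axes differ: embed the run in the parent line

def layout(tokens: List[tuple]) -> List[Tuple[str, str]]:
    """Resolve each text run to a rendering direction using a stack."""
    scripts: List[str] = []   # script direction of each open level
    chosen: List[str] = []    # resolved rendering direction of each open level
    runs: List[Tuple[str, str]] = []
    for tok in tokens:
        if tok[0] == "begin":
            parent = scripts[-1] if scripts else None
            scripts.append(tok[1])
            chosen.append(choose(parent, tok[1]))
        elif tok[0] == "end":
            if not scripts:
                raise ValueError("'end' without matching 'begin'")
            scripts.pop()
            chosen.pop()
        else:  # ("text", s)
            if not scripts:
                raise ValueError("text outside any level")
            runs.append((tok[1], chosen[-1]))
    if scripts:
        raise ValueError("'begin' without matching 'end'")
    return runs

tokens = [
    ("begin", "ltr"), ("text", "The word "),
    ("begin", "ttb"), ("text", "漢字"), ("end",),
    ("text", " is set vertically."), ("end",),
]
print(layout(tokens))
# [('The word ', 'ltr'), ('漢字', 'ttb-in-ltr'), (' is set vertically.', 'ltr')]
```

    Because only the top two stack entries are consulted, arbitrarily
    deep nesting costs nothing extra, and an unbalanced "begin" or "end"
    is caught immediately.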

    >Unfortunately, we still have to live with all these limitations.

    The current Unicode set is very laudable in that it provides an
    opportunity to represent all that text in computers. But I doubt that
    anybody will be willing to live forever with limitations that are
    viewed as cumbersome.

    >However all this goes too far away from the objective of ISO/IEC
    >10646, which is to reconcile the many incompatible charsets that
    >have been developed everywhere. This objective is still the most
    >wanted one today, as charset conversion is needed in so many
    >places, and ISO/IEC 10646 plays the role of a "kernel" intermediate
    >representation.
    >
    >For the future, it seems that the definition of "plain-text" is
    >likely to be extended, to cover things that are considered part of
    >"rich text formatting" today. The "limitations" today of Unicode and
    >ISO/IEC 10646 are however easily solved by adding those upper layers
    >of processing on top of it, and for this reason, it is very unlikely
    >that there will be a big revolution in ISO/IEC 10646-2 (and the
    >corresponding Unicode successor)...
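    The "kernel" role described in the quote can be sketched in a few
    lines of Python: converting between two legacy charsets means
    decoding to Unicode code points and re-encoding, so each charset needs
    only one mapping to and from Unicode instead of one mapping per pair
    of charsets. The function name `convert` is illustrative; the codec
    names are Python's built-in ones.

```python
def convert(data: bytes, source: str, target: str) -> bytes:
    """Transcode between two legacy charsets, pivoting through Unicode."""
    text = data.decode(source)   # legacy bytes -> Unicode code points
    return text.encode(target)   # Unicode code points -> other legacy bytes

# "é" is 0xE9 in ISO 8859-1 but 0x8E in Mac OS Roman.
print(convert(b"caf\xe9", "latin-1", "mac_roman"))  # b'caf\x8e'
```

    With the default strict error handling, `encode` raises
    `UnicodeEncodeError` when a character has no mapping in the target
    charset, which is exactly where the pivot exposes a genuine
    incompatibility between the two legacy sets.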

    Some of the things now considered part of computer language formats
    should probably be moved down to the abstract character set level.

    The principle I see here is that data that is, in one way or another,
    semantically atomic should be in the character set. One example is
    the "begin" and "end" characters mentioned above. Another is
    super-/subscripting for use in math.
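    For comparison, this is roughly where superscripting stands at the
    character level today, checked with Python's standard `unicodedata`
    module: a character such as U+00B2 SUPERSCRIPT TWO carries its
    superscript nature only as a `<super>` compatibility tag on its
    decomposition, and compatibility normalization (NFKC) discards it.
    This sketches the current state of affairs, not the proposal itself.

```python
import unicodedata

sq = "x\u00B2"  # "x²" written with U+00B2 SUPERSCRIPT TWO

# The superscript survives only as a "<super>" compatibility tag.
print(unicodedata.decomposition("\u00B2"))     # <super> 0032

# Canonical normalization keeps the character unchanged...
print(unicodedata.normalize("NFC", sq) == sq)  # True
# ...but compatibility normalization turns "x²" into plain "x2".
print(unicodedata.normalize("NFKC", sq))       # x2
```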

    -- 
       Hans Aberg
    


    This archive was generated by hypermail 2.1.5 : Thu May 19 2005 - 07:31:31 CDT