Defined Private Use was: SSP default ignorable characters

From: Ernest Cline (ernestcline@mindspring.com)
Date: Tue Apr 27 2004 - 10:27:58 EDT

    From: Doug Ewell <dewell@adelphia.net>
    >
    > In theory, I would think one should be able to do whatever the heck one
    > wants with PUA code points, so long as it involves "characters" in the
    > Unicode sense of the term, used in some way for representing text.
    >
    > What major software providers can be expected to support is another
    > matter entirely. Very little commercial software supports most of the
    > Unicode character properties at all, other than relatively basic
    > concepts like uppercase and lowercase. Show me a general-purpose,
    > commercial software product that understands numeric properties and
    > interprets them, or groups characters together that have the same
    > General Category, or uses canonical combining classes flexibly to
    > perform normalization. If there are any, there aren't many, and if we
    > can't get that kind of support for assigned characters, it's not likely
    > we'll see it for the PUA.

    First off, the General Category serves mainly to establish the default
    values of other properties that are used by general-purpose
    commercial software, so while it is not used directly, it is used
    indirectly.

    Secondly, since Unicode doesn't provide numeric properties
    for letters that are also used as numbers, such as gamma or
    zayin, any system that used only the Unicode numeric properties
    to establish numeric values would be seriously flawed. Despite
    their normative status, all those properties do is provide
    informational guidance that is insufficient (except in the case
    of decimal digits) for designing any generic number system.
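
    As a rough illustration of the point about numeric properties, here
    is a minimal sketch using Python's standard unicodedata module. The
    helper name numeric_or_none and the LETTER_NUMERALS table are
    assumptions made up for this example; the table stands in for the
    kind of supplementary data a number system would have to supply
    itself, since the standard properties do not cover letter numerals.

```python
import unicodedata

def numeric_or_none(ch):
    """Return the Unicode numeric value of ch as a float, or None if it has none."""
    return unicodedata.numeric(ch, None)

# Hypothetical supplementary table (not part of Unicode): traditional
# numeric values of letters in the Greek and Hebrew numeral systems.
LETTER_NUMERALS = {
    "\u03b3": 3,  # Greek gamma used as the numeral 3
    "\u05d6": 7,  # Hebrew zayin used as the numeral 7
}

for ch in ["7", "\u2166", "\u03b3", "\u05d6"]:
    standard = numeric_or_none(ch)
    supplementary = LETTER_NUMERALS.get(ch)
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: "
          f"standard={standard}, supplementary={supplementary}")
```

    The digit and the Roman numeral report a value from the standard
    data, while gamma and zayin report none, so their values come only
    from the supplementary table.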

    Other properties, such as Cursive Joining, also serve more as a
    guide to minimally acceptable behavior. A good cursive Latin font
    should use cursive joining to better simulate handwriting in print,
    even though Unicode only specifies Cursive Joining for Arabic and
    Syriac.

    For different reasons in each case, establishing Private Use
    characters with those properties is not important in and of itself.

    Others, such as Line Break, Bidi Class, and Casing, are important
    and are used by existing software. Unlike Cursive Joining, they
    cannot at present be handled simply by putting out a Private Use
    font, which is currently the easiest and most portable way to
    implement Private Use characters. It is to support these properties
    that a more precisely defined set of Private Use characters would
    be of use. Other properties, such as General Category, which are
    not used directly, would need to be set only because the values of
    other, directly used properties depend on them.
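
    To make the limitation concrete, here is a minimal sketch of what
    software has to do today if it wants a Private Use character to
    carry a non-default Bidi Class: consult a private table before
    falling back to the standard data. The PRIVATE_BIDI table, its
    values, and the function names are all hypothetical and exist only
    for illustration; the lookups use Python's standard unicodedata
    module.

```python
import unicodedata

# Hypothetical private agreement: Bidi Class overrides for a few
# Private Use code points. These assignments are invented for the example.
PRIVATE_BIDI = {
    0xE000: "R",    # assumed: a right-to-left private letter
    0xE001: "NSM",  # assumed: a private nonspacing mark
}

def is_private_use(ch):
    """True if ch has General Category Co (Private Use)."""
    return unicodedata.category(ch) == "Co"

def bidi_class(ch, overrides=PRIVATE_BIDI):
    """Return the private override for ch if one exists, else the standard Bidi Class."""
    return overrides.get(ord(ch), unicodedata.bidirectional(ch))

for ch in ["A", "\ue000", "\ue001", "\ue002"]:
    print(f"U+{ord(ch):04X} private_use={is_private_use(ch)} "
          f"bidi={bidi_class(ch)}")
```

    A font alone cannot convey such a table, and every application that
    exchanges the text has to share the same private table to agree on
    the result.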


