Re: Saudi-Arabian Copyright sign

From: Asmus Freytag (asmusf@ix.netcom.com)
Date: Tue Sep 21 2004 - 12:02:01 CDT


    At 10:55 PM 9/20/2004, Doug Ewell wrote:
    >Jörg Knappen <knappen at uni dash mainz dot de> wrote:
    >
    > > I see a precedent in Unicode for treating copyright-like signs differently
    > > from simple encircled letters:
    > >
    > > Unicode takes precautions not to encode the same character twice.
    > > Therefore, superscript digits 2 and 3 are absent from the superscript
    > > block U+2070 ff.
    > >
    > > However, the Enclosed Alphanumerics block (U+2460 ff.) includes circled
    > > capital Latin letters C, P, and R in addition to the copyright-like
    > > signs encoded elsewhere.
    >
    >OK, I guess I need some guidance from the Unicode elder statesmen and
    >greater experts.
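Both facts in Jörg's argument can be checked against the Unicode Character Database. The sketch below assumes Python 3 and its standard `unicodedata` module; the helper `describe` is a hypothetical name introduced only for illustration. It shows that superscript two and three live in the Latin-1 range while their slots in the superscript block stay unassigned, and that the circled capitals C, P and R coexist with the separately encoded signs.

```python
import unicodedata

def describe(cp):
    """Hypothetical helper: 'U+XXXX NAME', or a marker when the code point has no name."""
    return f"U+{cp:04X} {unicodedata.name(chr(cp), '<unassigned>')}"

# Superscript two and three were already encoded in the Latin-1 range...
print(describe(0x00B2))
print(describe(0x00B3))
# ...so the corresponding slots in the Superscripts and Subscripts block are empty.
print(describe(0x2072))
print(describe(0x2073))

# Circled capital C, P and R exist alongside the copyright-like signs.
for circled, sign in [(0x24B8, 0x00A9), (0x24C5, 0x2117), (0x24C7, 0x00AE)]:
    print(describe(circled), "|", describe(sign))
```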

    Ken and I have both given quite a bit of guidance on this issue.
    It would have been nice if you had acknowledged the pertinent aspects
    of our postings - so that we don't need to repeat them.

    >I have been under the impression all along that what Jörg calls
    >"copyright-like signs," meaning U+00A9 and U+00AE and U+2117 and
    >possibly others, are encoded as separate entities primarily because
    >they were in pre-existing legacy character sets. Remember that a major
    >goal of Unicode at its inception was to make sure all such character
    >sets were covered.
    >
    >Obviously U+00A9 and U+00AE were in ISO 8859-1, at those same code
    >points. They also appeared in MS-DOS code page 850, which also predated
    >Unicode. I don't know if U+2117 was in any existing standards; I just
    >know it's in my Unicode 1.0 book.
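The legacy-mapping argument can be illustrated with Python's built-in codecs, assuming that its `latin-1` and `cp850` codecs are faithful stand-ins for the ISO 8859-1 and DOS code page 850 tables. The helper `legacy_bytes` is hypothetical. U+00A9 and U+00AE encode to the same byte values as their code points in Latin-1, map to different byte positions in CP850, and U+2117 cannot be encoded in either.

```python
import unicodedata

def legacy_bytes(ch, codec):
    """Hypothetical helper: encoded bytes for ch in a legacy codec, or None if unencodable."""
    try:
        return ch.encode(codec)
    except UnicodeEncodeError:
        return None

for ch in ("\u00a9", "\u00ae", "\u2117"):
    # Compare the character's position in ISO 8859-1 and in DOS code page 850.
    print(unicodedata.name(ch), legacy_bytes(ch, "latin-1"), legacy_bytes(ch, "cp850"))
```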

    There's a difference between immediate rationale and long term policy.
    The initial collection (except for 2117) was indeed dictated by the
    desire for compatibility. 2117 was added explicitly as an analogue;
    although there may be a citation for it in some character set, I don't
    recall that such a citation was used as the primary argument at the time.

    >Jörg's comments imply that these symbols are in Unicode because of a
    >policy or "precedent" for treating such symbols specially, not (or not
    >only) because of the policy of encoding whatever was in the legacy
    >character sets of the time.

    Ken gave a very good answer on why sometimes a combination can't be used
    for a symbol. I won't repeat his arguments here. I just want to note
    that we are still relatively unconstrained in setting the long-term policy
    here. There's tension between those who see the circle used generically,
    and would like to see combining sequences as the standard solution, and
    those who see the combination take on different properties so that in
    the end, separate encoding is warranted (for at least a subset of such
    symbols).

    >Let's suppose we were back in the mid-'90s, and for whatever reason, the
    >circled Latin letters in the U+24xx block were already encoded but the
    >three "copyright-like signs" were not. Suppose they weren't in any
    >legacy character sets either. (Use your imagination.)
    >
    >Now suppose someone proposed that the circled-C copyright symbol
    >(picking the most widely used example) be encoded as a separate entity.
    >Suppose further that someone else pointed out that it could be
    >represented by one of the circled Latin letters in the U+24xx zone (Ⓒ or
    >ⓒ), and a debate ensued over whether those letters were of the correct
    >size.

    Sizing of symbols is very tricky. There are clearly instances where size
    is directly linked to semantics (see n-ary operators, for example). There
    are other instances, such as (R), where the size and position of the symbol
    varies enormously in text. Then there are symbols where the absolute size
    in final form is prescribed by context. The ESTIMATED symbol must be in a
    specific size depending on the value of the *weight* associated with it.

    At some point along this trajectory you leave plain text behind, whether
    you want to or not.

    >Finally, let's suppose that someone else suggested using the combination
    >U+0043 (or U+0063) plus U+20DD, the combining enclosing circle, and that
    >we then had a debate over whether fonts and rendering engines were up to
    >the task.
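The three candidate encodings in this scenario are distinct at the data level, independent of how well fonts render them. A minimal sketch, assuming Python 3's `unicodedata` module (the helper `code_points` is hypothetical), prints each candidate's code points before and after NFKC normalization.

```python
import unicodedata

def code_points(s):
    """Hypothetical helper: list the code points of s in U+XXXX notation."""
    return [f"U+{ord(c):04X}" for c in s]

candidates = {
    "COPYRIGHT SIGN": "\u00a9",
    "circled capital C": "\u24b8",
    "C + COMBINING ENCLOSING CIRCLE": "C\u20dd",
}

for label, text in candidates.items():
    # NFKC applies compatibility decompositions, then recomposes canonically.
    nfkc = unicodedata.normalize("NFKC", text)
    print(f"{label}: {code_points(text)} -> NFKC {code_points(nfkc)}")
```

Only the circled letter carries a compatibility decomposition (`<circle> 0043`), so NFKC folds it to a plain C; the copyright sign and the combining sequence are left unchanged and are not equivalent to each other under any normalization form.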
    >
    >What would UTC and WG2 do? Would they choose to encode COPYRIGHT SIGN
    >on its own, recommend the existing circled Latin letters, or recommend
    >the combining sequence? Why? (Use a separate sheet of paper if
    >necessary.)

    In the scenario you describe, you can assume that users would have used
    the existing code at 24B8, since that's what would have been immediately
    available. Given the existence of that kind of legacy, it's much less
    likely that any alternative would garner official support. Legacy, if
    widespread, is a strong argument.

    However, it does not apply to our case, because the combining sequence
    is not currently (and may never be) a good solution.

    >On Tue, 21 Sep 2004 14:03:04 "Anto'nio Martins-Tuva'lkin" wrote:
    >
    >The Standard notes for U+2139 that it is «intended for use with 20DD».

    I regard this as one of the 'still-born' parts of Unicode. I think
    there are complexities in the use of enclosing marks that people did not
    realize. I see a certain amount of caution in the UTC nowadays
    when it comes to using 20DD.

    Note U+26A0 WARNING SIGN, which is *not* encoded as <0021, 20E4>.
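The point about 26A0 can be confirmed in the character data: the atomic sign has no decomposition, so no normalization form relates it to the exclamation mark plus the enclosing-triangle mark. A sketch assuming Python 3's `unicodedata` module:

```python
import unicodedata

warning = "\u26a0"    # WARNING SIGN
sequence = "!\u20e4"  # EXCLAMATION MARK + COMBINING ENCLOSING UPWARD POINTING TRIANGLE

# The atomic sign has an empty decomposition mapping.
print(unicodedata.name(warning), repr(unicodedata.decomposition(warning)))
# U+20E4 is an enclosing mark (general category Me).
print(unicodedata.name(sequence[1]), unicodedata.category(sequence[1]))
for form in ("NFC", "NFD", "NFKC", "NFKD"):
    # Neither spelling normalizes to the other.
    print(form, unicodedata.normalize(form, sequence) == warning)
```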

    A./



    This archive was generated by hypermail 2.1.5 : Tue Sep 21 2004 - 12:03:03 CDT