Re: String name and Character Name

From: Peter Kirk (peterkirk@qaya.org)
Date: Sat Apr 23 2005 - 08:59:57 CST


    On 22/04/2005 10:09, Otto Stolz wrote:

    > Hello Peter Kirk,
    >
    > you have written:
    >
    >> I don't know why there is a need for a second "unique and immutable
    >> identifier" in addition to the U+xxxx code point identifier.
    >
    >
    > Have you ever read Section C.6 of TUS
    > <http://www.unicode.org/versions/Unicode4.0.0/appC.pdf>?
    >
    No. Well, I had not before. (Jill, thanks for defending me, but in fact
    I have not been on this list for much longer than you, and I was not a
    silent lurker!) But I am aware of its contents. I note:

    > In the ISO/IEC framework, the unique character name is viewed as the
    > major resource for
    > both character semantics and cross-mapping among standards. In the
    > framework of the
    > Unicode Standard, ...

    but the sentence does not continue by pointing out the pitfalls of using
    these unique character names. Nor does it mention that, to quote Asmus,
    "the intended purpose of the nameslist was deliberately *reduced* to
    providing an unique and immutable identifier".

    Elsewhere you wrote:

    > So much for the "obvious places"
    > where another contributor to this thread ostensibly had looked to
    > no avail.
    >
    >> I really don't understand why this thread is getting warm.
    >
    >
    > It's just because some of the contributors to this thread apparently
    > have not bothered to do this sort of basic (and simple!) research
    > before conceiving (and conveying) their ideas.

    If this is intended as a reference to me, please withdraw it. I made no
    claims about where I had looked for information. I was well aware of the
    contents of Appendix C.6, even though I had not read the specific text.
    But if there is a place in TUS where it is made clear that "the
    intended purpose of the nameslist [is only] providing an unique and
    immutable identifier", it is neither of the two places which you quote.

    >> But given that there is such a list, its highly restricted intended
    >> purpose should be made more clear.
    >
    >
    > How could that be made clearer than in TUS, section 16.1?
    >
    > Quote from <http://www.unicode.org/versions/Unicode4.0.0/ch16.pdf>:
    >
    >> The character names in the code charts precisely match the normative
    >> character names in
    >> the Unicode Character Database. Character names are unique and
    >> stable. By convention
    >> they are in uppercase. Because character names are stable, mistaken
    >> names will not be
    >> revised, but may be annotated. For example:
    >> 2118 ℘ SCRIPT CAPITAL P
    >> = Weierstrass elliptic function
    >> • actually this has the form of a lowercase calligraphic p,
    >> despite its name
    >
    >
    Otto, I am aware that you are probably not a mother tongue speaker of
    English, but your written English is so good that I would expect you to
    understand that nowhere in the above quotation is there even the
    slightest suggestion that "the intended purpose of the nameslist [is
    only] providing an unique and immutable identifier", and "it does not
    explicitly include the task of supporting users in identifying
    characters". Elsewhere this section does state:

    > the formal character names may differ in unexpected ways from commonly
    > used names

    but fails to draw the obvious conclusion, the one which according to
    Asmus has been accepted by the UTC: that formal character names should
    not be considered to have any significance except in that they are
    unique and immutable.
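
    The U+2118 example from the quoted passage can be checked directly. The
    following is only a rough sketch, assuming Python's standard unicodedata
    module (whose data comes from a later version of the Unicode Character
    Database than 4.0); it shows the name working as an identifier while
    saying nothing reliable about case:

```python
import unicodedata

ch = "\u2118"  # the character cited in the passage quoted from section 16.1
formal_name = unicodedata.name(ch)

# The formal name contains CAPITAL ...
print(f"U+{ord(ch):04X} {formal_name}")

# ... yet the character has no case mappings in either direction.
print("unchanged by lower():", ch.lower() == ch)
print("unchanged by upper():", ch.upper() == ch)

# As a unique identifier the name still round-trips to the same code point.
print("round-trips by name:", unicodedata.lookup(formal_name) == ch)
```

    So the name identifies the character uniquely, but anyone who reads
    CAPITAL as a statement about the character's properties is misled.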

    On further consideration, I have realised that there is no need to call
    for the list of character names to be formally deprecated, because the
    UTC has already effectively done this by their decision as follows:

    > the intended purpose of the nameslist was deliberately *reduced* to
    > providing an unique and immutable identifier, subject to the rules of
    > Annex L in ISO/IEC 10646 insofar as enforced by WG2.

    For, as Dean pointed out, if

    >"... emphasizing that these are really semi-arbitrary character
    >identifiers and not names per se" sounds awfully close to "deprecation"
    >as names.
    >
    then reducing their purpose "to providing an unique and immutable
    identifier" sounds even closer to "deprecation".

    I am not sure what effect the "subject to the rules of Annex L in
    ISO/IEC 10646 insofar as enforced by WG2" part has in practice, but this
    seems to mark the point at which this goes outside the control of the
    UTC and into that of WG2. And so I am not sure whether I need to suggest
    that WG2 also makes changes.

    But there is a problem in that this decision of the UTC has not been put
    into proper effect even within the text of the Unicode standard itself,
    in which there are a huge number of cases of a Unicode character name
    being given semantic significance. For an example taken almost at
    random, I quote the following from section 16.1, p.415:

    > When a case mapping corresponds solely to a difference based on SMALL
    > versus CAPITAL in the names of the characters, the case mapping is not
    > given in the names list but only in the Unicode Character Database.

    In other words, case mappings depend on character names, in breach of
    the principle that "the intended purpose of the nameslist [is only]
    providing an unique and immutable identifier". I suspect that I could
    find hundreds of breaches of this principle within the text of the
    standard. A good test for the editors would be whether the text remains
    comprehensible if the character name is replaced by a meaningless (but
    unique) string - if not, it is clear that the character name is acting
    as more than "an unique and immutable identifier".
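
    To illustrate what it means for the text to lean on the names, here is
    a minimal sketch, again assuming Python's unicodedata module. The two
    helper functions are my own hypothetical names, not anything defined by
    the standard. The first guesses a lowercase partner purely from the
    formal name; the second compares that guess with the case mapping
    actually recorded in the character data:

```python
import unicodedata

def lowercase_guessed_from_name(ch):
    """Guess a lowercase partner using only the formal name (hypothetical helper)."""
    name = unicodedata.name(ch, "")
    if "CAPITAL" not in name.split():
        return None
    guessed_name = name.replace("CAPITAL", "SMALL")
    try:
        return unicodedata.lookup(guessed_name)
    except KeyError:
        # No character carries the guessed name.
        return None

def name_agrees_with_case_mapping(ch):
    """True only if the name-based guess equals the real lowercase mapping."""
    guess = lowercase_guessed_from_name(ch)
    return guess is not None and guess == ch.lower()

for ch in ["A", "\u00c4", "\u2118"]:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), name_agrees_with_case_mapping(ch))
```

    For ordinary letters such as U+0041 the guess and the data agree, which
    is exactly why the quoted sentence can omit those mappings from the
    names list. For U+2118 they do not agree, so a rule phrased in terms of
    names only works if the names are trusted to carry meaning.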

    > As said before: if you feel like suggesting a better wording
    > then submit via <http://www.unicode.org/reporting.html>.
    >
    I will accept this suggestion because this time you are talking about
    changes which can be made. But I can hardly take it up, because I doubt
    if the editing box on that page will accept the almost complete rewrite
    of the text of the standard which would be required to properly
    implement the restricted purpose of the namelist. And, more seriously,
    large scale editing of this kind, to conform to the decisions of the
    UTC, should be the job of the editors of the standard.

    -- 
    Peter Kirk
    peter@qaya.org (personal)
    peterkirk@qaya.org (work)
    http://www.qaya.org/

