Re: New translation posted

From: Mark Davis (mark.davis@icu-project.org)
Date: Mon Feb 05 2007 - 11:11:03 CST

    Early on in Unicode we considered having more functional separation: one
    could have, for example, decimal-period, abbreviation-period,
    sentence-period. Yes, if all text were written with these, it would be
    easier to parse. But the odds of data actually arriving in such a form are,
    let's say, exceedingly small. Training the world to pick the right one of
    two visually-identical keys on their keyboards is virtually impossible. So,
    software has to be prepared to accept the "wrong" form anyway; the
    distinction just causes more problems than it solves.
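
    A minimal sketch of what "accepting the wrong form" can look like in
    practice, assuming a parser that wants a single apostrophe code point
    internally. The function name fold_apostrophes and the choice of U+2019
    as the target are illustrative assumptions, not part of any Unicode
    specification; the three code points are the ones discussed in the
    quoted thread below.

```python
# Code points that turn up in real text where an apostrophe is meant.
APOSTROPHE_LIKE = {
    "\u0027": "APOSTROPHE",
    "\u02BC": "MODIFIER LETTER APOSTROPHE",
    "\u2019": "RIGHT SINGLE QUOTATION MARK",
}


def fold_apostrophes(text: str, target: str = "\u2019") -> str:
    """Map every apostrophe-like code point to one target (hypothetical helper)."""
    table = {ord(ch): target for ch in APOSTROPHE_LIKE}
    return text.translate(table)


# Three spellings of the same word, differing only in the apostrophe used.
variants = ["don\u0027t", "don\u02BCt", "don\u2019t"]
folded = {fold_apostrophes(v) for v in variants}
print(len(folded))  # all three collapse to a single string
```

    Note that folding like this also rewrites genuine closing quotation
    marks, so it only suits processing steps where that loss is acceptable.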

    Mark

    On 2/5/07, Hans Aberg <haberg@math.su.se> wrote:
    >
    > On 4 Feb 2007, at 00:02, Doug Ewell wrote:
    >
    > >> Well, the apostrophe used in language is not semantically a right
    > >> single quotation mark. There might be some subtle rendering
    > >> differences between a U+2019 and a proper, linguistic apostrophe,
    > >> like in spacing.
    > >>
    > >> And if U+0027 is a multipurpose character, then there is a gap
    > >> in the Unicode character set.
    > >>
    > >> And then: a new character should be added.
    > >
    > > The NamesList file [1], which is a formal part of the Unicode
    > > Character Database, says U+2019 is the preferred character for
    > > apostrophe. It has this annotation under three characters: U+0027,
    > > U+02BC, and U+2019 itself.
    > >
    > > Regardless of whether there is a school of thought that
    > > "apostrophe" and "right single quotation mark" should be different
    > > characters, this is what the Unicode Technical Committee has
    > > decided, and while they may change their minds — in Unicode 1.0 the
    > > preferred apostrophe was U+02BC — I would be amazed if they did so.
    > >
    > > I'm sure Ken Whistler will come along soon with a better-
    > > articulated and more authoritative version of this.
    > >
    > > [1] http://www.unicode.org/Public/UNIDATA/NamesList.txt
    >
    > Though Unicode has decided to recommend the right single quotation
    > mark U+2019 to double as punctuation apostrophe, they are
    > semantically different, and even though it may seem clever with such
    > doubling in a more narrow context, when the context widens, some
    > problems may ensue. Now, in the case of this particular character,
    > the problems may not be very great, but they may still be annoying.
    >
    > For example, parsing text becomes ambiguous, problematic for computer
    > programs. If correct parsing is needed for further processing, there
    > will be annoying failures, and if those should be removed, one will
    > have to set humans together with some computer language extensions,
    > removing those ambiguities by hand, which might have been eliminated
    > in the first place.
    >
    > Hans Aberg



    This archive was generated by hypermail 2.1.5 : Mon Feb 05 2007 - 11:13:22 CST