From: Mark Davis (mark.davis@icu-project.org)
Date: Mon Feb 05 2007 - 11:11:03 CST
Early on in Unicode we considered having more functional separation: one
could have, for example, decimal-period, abbreviation-period,
sentence-period. Yes, if all text were written with these, it would be
easier to parse. But the odds of data actually arriving in such a form are,
let's say, exceedingly small. Training the world to pick the right one of
two visually-identical keys on their keyboards is virtually impossible. So,
software has to be prepared to accept the "wrong" form anyway; the
distinction just causes more problems than it solves.
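As a rough illustration of "accepting the wrong form", here is a minimal Python sketch that folds the three apostrophe candidates mentioned in this thread (U+0027, U+02BC, U+2019) to one form when they sit between two letters. The function name, the choice of U+2019 as the default target, and the letter-on-both-sides heuristic are all assumptions made for this example, not anything Unicode prescribes.

```python
import re

# The three characters discussed in this thread as apostrophe candidates.
APOSTROPHE_LIKE = "\u0027\u2019\u02BC"

# Heuristic (an assumption of this sketch): treat the character as an
# apostrophe only when a letter stands on both sides, as in "don't".
# [^\W\d_] matches any Unicode letter.
_WORD_INTERNAL = re.compile(
    rf"(?<=[^\W\d_])[{APOSTROPHE_LIKE}](?=[^\W\d_])"
)


def fold_apostrophes(text: str, target: str = "\u2019") -> str:
    """Replace word-internal apostrophe-like characters with `target`."""
    return _WORD_INTERNAL.sub(target, text)


if __name__ == "__main__":
    for sample in ("don't", "don\u02BCt", "'quoted'"):
        print(repr(sample), "->", repr(fold_apostrophes(sample)))
```

The heuristic deliberately leaves word-final cases such as "dogs'" alone, because there the character cannot be told apart from a closing quotation mark without more context.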
Mark
On 2/5/07, Hans Aberg <haberg@math.su.se> wrote:
>
> On 4 Feb 2007, at 00:02, Doug Ewell wrote:
>
> >> Well, the apostrophe used in language is not semantically a right
> >> single quotation mark. There might be some subtle rendering
> >> differences between a U+2019 and a proper, linguistic apostrophe,
> >> like in spacing.
> >>
> >> And if U+0027 is a multipurpose character, then there is a gap
> >> in the Unicode character set.
> >>
> >> And then: a new character should be added.
> >
> > The NamesList file [1], which is a formal part of the Unicode
> > Character Database, says U+2019 is the preferred character for
> > apostrophe. It has this annotation under three characters: U+0027,
> > U+02BC, and U+2019 itself.
> >
> > Regardless of whether there is a school of thought that
> > "apostrophe" and "right single quotation mark" should be different
> > characters, this is what the Unicode Technical Committee has
> > decided, and while they may change their minds — in Unicode 1.0 the
> > preferred apostrophe was U+02BC — I would be amazed if they did so.
> >
> > I'm sure Ken Whistler will come along soon with a better-
> > articulated and more authoritative version of this.
> >
> > [1] http://www.unicode.org/Public/UNIDATA/NamesList.txt
>
> Though Unicode has decided to recommend the right single quotation
> mark U+2019 to double as punctuation apostrophe, they are
> semantically different, and even though it may seem clever with such
> doubling in a more narrow context, when the context widens, some
> problems may ensue. Now, in the case of this particular character,
> the problems may not be very great, but it may still be annoying.
>
> For example, parsing text becomes ambiguous, problematic for computer
> programs. If correct parsing is needed for further processing, there
> will be annoying failures, and if those should be removed, one will
> have to set humans together with some computer language extensions,
> removing those ambiguities by hand, which might have been eliminated
> in the first place.
>
> Hans Aberg