Apostrophe, hyphen, and various other punctuation by default continue
a word, but this behavior may be overridden on a per-language basis.
Heuristics or more sophisticated engines may be needed when the
apostrophe is at the end of a word, as in <the peoples' choice>, since
a trailing apostrophe is ambiguous: it could also be a closing single
quotation mark. The modifier letter apostrophe, on the other hand, is
always treated as a letter.
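A minimal Python sketch of the default behavior described above, under simplifying assumptions. The set of mid-word punctuation (ASCII apostrophe, U+2019, hyphen-minus) is a hypothetical choice for illustration. A mid-word character is kept only when letters appear on both sides, and a trailing apostrophe is simply treated as a boundary rather than resolved by any heuristic. This is not the full rule set of the Text Boundaries report.

```python
import unicodedata

# Hypothetical set of mid-word punctuation, following the paragraph above:
# ASCII apostrophe, right single quotation mark, and hyphen-minus. This is a
# simplification for illustration, not the Text Boundaries rule set.
MID_WORD = {"\u0027", "\u2019", "\u002d"}


def is_letter(ch: str) -> bool:
    # General categories Lu, Ll, Lt, Lm and Lo all start with "L"; this
    # includes U+02BC MODIFIER LETTER APOSTROPHE, whose category is Lm.
    return unicodedata.category(ch).startswith("L")


def split_words(text: str) -> list[str]:
    """Return runs of letters, keeping a MID_WORD character only when it
    sits between two letters. A trailing apostrophe ends the word."""
    words = []
    current = []
    for i, ch in enumerate(text):
        if is_letter(ch):
            current.append(ch)
        elif (
            ch in MID_WORD
            and current                      # a letter precedes it
            and i + 1 < len(text)
            and is_letter(text[i + 1])       # a letter follows it
        ):
            current.append(ch)
        else:
            # Any other character (or an unsupported MID_WORD use) is a boundary.
            if current:
                words.append("".join(current))
                current = []
    if current:
        words.append("".join(current))
    return words
```

For example, `split_words("the peoples' choice")` yields `['the', 'peoples', 'choice']`, dropping the ambiguous trailing apostrophe, whereas with U+02BC in its place the character stays inside the word because its general category is Lm.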
This is as good a point as any to draw people's attention to a new
proposed draft TR, Text Boundaries, at
http://www.unicode.org/reports/tr29/. It is in the initial (proposed
draft) stage, so there is an opportunity to give feedback on it. Note: the
grapheme cluster update was moved here from Unicode 3.2 to allow more time
for feedback and tuning.
Mark
—————
Γνῶθι σαυτόν ("Know thyself"), Θαλῆς (Thales)
[For transliteration, see http://oss.software.ibm.com/cgi-bin/icu/tr]
----- Original Message -----
From: "Marco Cimarosti" <marco.cimarosti@essetre.it>
To: "'Kenneth Whistler'" <kenw@sybase.com>; <Peter_Constable@sil.org>
Cc: <unicode@unicode.org>
Sent: Tuesday, March 26, 2002 02:24
Subject: RE: apostrophe vs. modifier letter apostrophe
> Kenneth Whistler wrote:
> > [...]
> > This is just the computer-age version of the age-old question as
> > to why a linguist would want to distinguish anything that functions
> > differently.
> >
> > For years back in the late 70's and early 80's, before I got my
> > first PC, I typed up index slips with a manual typewriter. That
> > manual typewriter had various custom keys welded on, so that I could
> > get schwas, open-o's, lambda's, dead-key commas above, and the like.
> > [...]
>
> I'll stop quoting here because I have already collected enough instances
> of <'s> to make my point...
>
> It seems to me that a word such as "lambda's" is just an English plural
> noun (also spelled "lambdas"), so it should be allowed in identifiers,
> should count as a single unit for word selection, and so on.
>
> Clearly, U+0027 (APOSTROPHE, general category "Po" = other punctuation)
> is not fit for this purpose, because it has the wrong category and
> because it is ambiguously used as a quotation mark.
>
> But U+2019 (RIGHT SINGLE *QUOTATION* MARK, general category "Pf" =
> final *punctuation*) does not seem fit for the purpose either.
>
> So, why does the Unicode book suggest U+2019 as the preferred character
> for the apostrophe? Wouldn't U+02BC (MODIFIER LETTER APOSTROPHE, general
> category "Lm" = modifier letter) be a better choice?
>
> _ Marco
>
>
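To make the category comparison in the message above concrete, here is a short Python sketch using the standard `unicodedata` module to report each candidate's character name and general category. Python's `str.isidentifier()` is used only as one example of an identifier rule (Python's own, based on Unicode identifier properties). Other languages define identifiers differently, so treat that column as illustrative. The helper name `describe` is hypothetical.

```python
import unicodedata

# The three apostrophe-like characters compared in the message above.
CANDIDATES = ["\u0027", "\u2019", "\u02bc"]


def describe(ch: str) -> tuple[str, str, bool]:
    """Hypothetical helper: return (character name, general category,
    whether "lambda<ch>s" is a valid Python identifier)."""
    word = "lambda" + ch + "s"
    return unicodedata.name(ch), unicodedata.category(ch), word.isidentifier()


if __name__ == "__main__":
    for ch in CANDIDATES:
        name, cat, ident = describe(ch)
        print(f"U+{ord(ch):04X} {name:<28} {cat}  identifier={ident}")
```

With a current Unicode database this should show Po for U+0027, Pf for U+2019 and Lm for U+02BC. Only the U+02BC spelling of "lambdaʼs" should be accepted as an identifier, because Lm is among the letter categories that identifier rules admit.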
This archive was generated by hypermail 2.1.2 : Tue Mar 26 2002 - 10:47:11 EST