Re: Another take on the English apostrophe in Unicode

From: Bill Poser <billposer2_at_gmail.com>
Date: Thu, 11 Jun 2015 10:47:39 -0700

To add a factor that I think hasn't been mentioned, there are languages in
which apostrophe is used both as a letter by itself and as part of a
complex letter. Most of the native languages of British Columbia write
glottalized consonants as C+', e.g. <t'> for an ejective alveolar stop, and
many use apostrophe by itself for the glottal stop. (Another common
convention, which produces other difficulties, is to use the number <7> for
glottal stop.)
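
For orthographies like these, it matters whether software treats the character as a letter or as punctuation. As a rough illustration (a sketch only, not anything taken from the blog post), the Python below prints the Unicode general category of the three apostrophe candidates discussed in this thread. It uses only the standard `unicodedata` module; the helper name `describe` is made up for the example.

```python
import unicodedata

# The three code points under discussion in this thread.
CANDIDATES = ["\u0027", "\u2019", "\u02BC"]

def describe(ch):
    """Return (code point label, general category, is-letter flag) for one character.

    `describe` is a hypothetical helper for this example, not a library function.
    """
    category = unicodedata.category(ch)
    # Letter categories all start with "L" (Lu, Ll, Lt, Lm, Lo).
    return (f"U+{ord(ch):04X}", category, category.startswith("L"))

for ch in CANDIDATES:
    print(*describe(ch), unicodedata.name(ch))
```

Only U+02BC comes back with a letter category (Lm); the other two are classed as punctuation (Po and Pf), so any process keyed on general category will treat <t'> differently depending on which code point was typed.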

Bill

On Wed, Jun 10, 2015 at 2:10 PM, Ted Clancy <tclancy_at_mozilla.com> wrote:

> On 4/Jun/2015 14:34 PM, Markus Scherer wrote:
>>
>> Looks all wrong to me.
>>
> Hi, Markus. I'm the guy who wrote the blog post. I'll respond to your
> points below.
>
>
>
>> You can't use simple regular expressions to find word boundaries. That's
>> why we have UAX #29.
>>
>
> And UAX #29 doesn't work for words which begin or end with apostrophes,
> whether represented by U+0027 or U+2019. It erroneously thinks there's a
> word boundary between the apostrophe and the rest of the word.
>
> But UAX #29 *would* work if the apostrophes were represented by U+02BC,
> which is what I'm suggesting.
>
>> Confusion between apostrophe and quoting -- blame the scribe who came up
>> with the ambiguous use, not the people who gave it a number.
>>
> I'm not trying to blame anyone. I'm trying to fix the problem.
>
> I know this problem has a long history.
>
>> English is taught as that squiggle being punctuation, not a letter.
>>
> I think we need to make a distinction between the colloquial usage of the
> word "punctuation" and the Unicode general category "punctuation" which has
> specific technical implications.
>
> I somewhat wish that Unicode had a separate category for "Things that look
> like punctuation but behave like letters", which might clear up this
> taxonomic confusion. (I would throw U+02BE (MODIFIER LETTER RIGHT HALF
> RING) and U+02BF (MODIFIER LETTER LEFT HALF RING), neither of which is
> actually modifiers, into that category too.) But we don't. And the English
> apostrophe behaves like a letter, regardless of what your primary school
> teacher might have told you, so with the options available in Unicode, it
> needs to be classed as a letter.
>
>> "don’t" is a contraction of two words, it is not one word.
>>
> This is utter nonsense. Should my spell-checker recognise "hasn't" as a
> valid word? Or should it consider "hasn't" to be the word "hasn" followed
> by the word "t", and then flag both of them as spelling errors?
>
> Is "fo'c'sle" the three separate words "fo", "c", and "sle"?
>
> The idea that words with apostrophes aren't valid words is a regrettable
> myth that exists in English, which has repeatedly led to the apostrophe
> being an afterthought in computing, leading to situations like this one.
>
>> If anything, Unicode might have made a mistake in encoding two of these
>> that look identical. How are normal users supposed to find both U+2019 and
>> U+02BC on their keyboards, and how are they supposed to deal with incorrect
>> usage?
>>
> Yeah, and there are fonts where I can't tell the difference between
> capital I and lower-case l. But my spell-checker will underline a word
> where I erroneously use an I instead of an l, and I imagine spell-checkers
> of the future could underline a word where I erroneously use a closing
> quote instead of an apostrophe, or vice versa.
>
> There are other possible solutions too, but I don't want to get into a
> discussion about UI design. I'll leave that to UI designers.
>
> - Ted
>
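
The spell-checker point in Ted's reply above can be made concrete with a deliberately naive tokenizer. This is a minimal sketch, assuming a tokenizer that just collects runs of `\w` characters with Python's standard `re` module. It is not an implementation of UAX #29, which has additional rules for word-internal apostrophes that this regex lacks. The function name `words` is made up for the example.

```python
import re

def words(text):
    """Split text into runs of Unicode word characters (letters, digits, underscore).

    A crude stand-in for real word segmentation; `words` is a hypothetical helper.
    """
    return re.findall(r"\w+", text)

# Try the same two words with each of the three apostrophe code points.
for apostrophe in ("\u0027", "\u2019", "\u02BC"):
    sample = f"hasn{apostrophe}t fo{apostrophe}c{apostrophe}sle"
    print(f"U+{ord(apostrophe):04X}", words(sample))
```

With U+0027 or U+2019 the sample falls apart into "hasn", "t", "fo", "c", "sle"; with U+02BC both words survive intact, because that character counts as a letter for `\w`.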
Received on Thu Jun 11 2015 - 12:49:39 CDT
