Another take on the English apostrophe in Unicode
billposer2 at gmail.com
Thu Jun 11 13:46:01 CDT 2015
I agree with the recommendation of U+02BC. However, it is in fact rarely
used because most of the people who write these languages or create
supporting infrastructure are unaware of such issues.
A small point: it isn't always the spacing diacritic that is used. In some
languages, e.g. Halkomelem, people use the spacing apostrophe if they have
to but prefer the non-spacing version.
On Thu, Jun 11, 2015 at 11:39 AM, Philippe Verdy <verdy_p at wanadoo.fr> wrote:
> Also used in the Breton trigraph c’h (considered a single letter of the
> Breton alphabet, but actually entered as two letters with a diacritic-like
> apostrophe in the middle, which in this case is still not a letter of the
> alphabet): the trigraph c’h is distinct from the digraph ch.
> Breton **also** uses a regular apostrophe for elision.
> In fact what you note for the ejective in Native American languages is
> effectively a right-combining diacritic, and still not a letter by itself.
> However, given its position and the fact that it is "spacing", it is the
> spacing form of the apostrophe diacritic that should be used, and that form
> must then be chosen from among:
> * U+00B4 (acute, most often ugly, located too high, and too much
> * U+02B9 (prime, nearly good, but still too high),
> * U+02BC (apostrophe),
> * U+02C8 (vertical high tick, but confusable with the mark of stress in
> IPA before a phonetic syllable), and
> * U+02CA (acute/2nd tone, which for me is not distinct from 00B4, only
> used with sinograms in Mandarin Chinese, with its metrics distinct from
> U+00B4 that match the Latin metrics).
> In my opinion 02BC is the best choice for the diacritic apostrophe.
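
One way to compare the candidates listed above is to look up their names and general categories. The following is only a sketch using Python's standard `unicodedata` module; the `CANDIDATES` list and the `describe` helper are illustrative names, not anything from this thread.

```python
import unicodedata

# Code points discussed above, plus U+0027 and U+2019 for comparison.
CANDIDATES = [0x00B4, 0x02B9, 0x02BC, 0x02C8, 0x02CA, 0x0027, 0x2019]

def describe(cp):
    """Return (U+XXXX label, character name, general category) for a code point."""
    ch = chr(cp)
    return (f"U+{cp:04X}", unicodedata.name(ch), unicodedata.category(ch))

for cp in CANDIDATES:
    label, name, category = describe(cp)
    print(f"{label}  {category}  {name}")
```

Categories beginning with `L` are letters, `Sk` is a modifier symbol, and categories beginning with `P` are punctuation, which is the distinction at issue here.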
> The other character for the **elision** apostrophe is a punctuation mark
> U+2019 (just like the full stop punctuation is also used as an abbreviation
> mark). There's no confusion with its alternate role as a right-side single
> quote because U+2019 is used in languages that normally never use the
> single quotes, but chevrons (or other punctuation signs in East-Asian
> But in English, where single quotes are used for short quotations, there is
> still a problem representing this elision apostrophe when it occurs not
> between two letters, where it also marks the joining of two morphemes (as in
> "don't" or "Peter's"), but at the beginning or end of a word. An elision
> at the end of a word is also a problem when it is the final word of a quoted
> sentence. If you really want to cite a single English word terminated by an
> elision apostrophe, the single quotes won't be usable and you'll use
> chevrons, as in this ‹demo’›, rather than single or double quotes, which are
> difficult to discriminate.
> 2015-06-11 19:47 GMT+02:00 Bill Poser <billposer2 at gmail.com>:
>> To add a factor that I think hasn't been mentioned, there are languages
>> in which apostrophe is used both as a letter by itself and as part of a
>> complex letter. Most of the native languages of British Columbia write
>> glottalized consonants as C+', e.g. <t'> for an ejective alveolar stop, and
>> many use apostrophe by itself for the glottal stop. (Another common
>> convention, which produces other difficulties, is to use the number <7> for
>> glottal stop.)
>> On Wed, Jun 10, 2015 at 2:10 PM, Ted Clancy <tclancy at mozilla.com> wrote:
>>> On 4/Jun/2015 14:34 PM, Markus Scherer wrote:
>>>> Looks all wrong to me.
>>> Hi, Markus. I'm the guy who wrote the blog post. I'll respond to your
>>> points below.
>>>> You can't use simple regular expressions to find word boundaries.
>>>> That's why we have UAX #29.
>>> And UAX #29 doesn't work for words which begin or end with apostrophes,
>>> whether represented by U+0027 or U+2019. It erroneously thinks there's a
>>> word boundary between the apostrophe and the rest of the word.
>>> But UAX #29 *would* work if the apostrophes were represented by U+02BC,
>>> which is what I'm suggesting.
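
The effect can be approximated with a plain `\w+` tokenizer in Python. This is a rough sketch and not an implementation of UAX #29; it only illustrates that an apostrophe with a letter category stays inside the word while the punctuation forms split it off. The `words` helper and the constant names are hypothetical.

```python
import re

ASCII_APOSTROPHE = "\u0027"      # APOSTROPHE
RIGHT_SINGLE_QUOTE = "\u2019"    # RIGHT SINGLE QUOTATION MARK
MODIFIER_APOSTROPHE = "\u02BC"   # MODIFIER LETTER APOSTROPHE

def words(text):
    """Split text into runs of word characters (a crude stand-in for word segmentation)."""
    return re.findall(r"\w+", text)

for mark in (ASCII_APOSTROPHE, RIGHT_SINGLE_QUOTE, MODIFIER_APOSTROPHE):
    # Show how a word-initial apostrophe is treated for each encoding.
    print(f"U+{ord(mark):04X}", words(f"{mark}tis the season"))
```

Only the U+02BC variant keeps the apostrophe attached to "tis" here, because Python's `\w` matches characters that count as letters and U+02BC is one of them. A real segmenter following UAX #29 applies more rules than this pattern does.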
>>>> Confusion between apostrophe and quoting -- blame the scribe who came up
>>>> with the ambiguous use, not the people who gave it a number.
>>> I'm not trying to blame anyone. I'm trying to fix the problem.
>>> I know this problem has a long history.
>>>> English is taught as that squiggle being punctuation, not a letter.
>>> I think we need to make a distinction between the colloquial usage of the
>>> word "punctuation" and the Unicode general category "punctuation", which has
>>> specific technical implications.
>>> I somewhat wish that Unicode had a separate category for "Things that
>>> look like punctuation but behave like letters", which might clear up this
>>> taxonomic confusion. (I would throw U+02BE (MODIFIER LETTER RIGHT HALF
>>> RING) and U+02BF (MODIFIER LETTER LEFT HALF RING), neither of which are
>>> actually modifiers, into that category too.) But we don't. And the English
>>> apostrophe behaves like a letter, regardless of what your primary school
>>> teacher might have told you, so with the options available in Unicode, it
>>> needs to be classed as a letter.
>>>> "don’t" is a contraction of two words, it is not one word.
>>> This is utter nonsense. Should my spell-checker recognise "hasn't" as a
>>> valid word? Or should it consider "hasn't" to be the word "hasn" followed
>>> by the word "t", and then flag both of them as spelling errors?
>>> Is "fo'c'sle" the three separate words "fo", "c", and "sle"?
>>> The idea that words with apostrophes aren't valid words is a regrettable
>>> myth that exists in English, which has repeatedly led to the apostrophe
>>> being an afterthought in computing, leading to situations like this one.
>>>> If anything, Unicode might have made a mistake in encoding two of these
>>>> that look identical. How are normal users supposed to find both U+2019 and
>>>> U+02BC on their keyboards, and how are they supposed to deal with
>>> Yeah, and there are fonts where I can't tell the difference between
>>> capital I and lower-case l. But my spell-checker will underline a word
>>> where I erroneously use an I instead of an l, and I imagine spell-checkers
>>> of the future could underline a word where I erroneously use a closing
>>> quote instead of an apostrophe, or vice versa.
>>> There are other possible solutions too, but I don't want to get into a
>>> discussion about UI design. I'll leave that to UI designers.
>>> - Ted