Another take on the English apostrophe in Unicode
John D. Burger
john at mitre.org
Fri Jun 5 12:29:10 CDT 2015
> On Jun 4, 2015, at 17:34 , Markus Scherer <markus.icu at gmail.com> wrote:
> Looks all wrong to me.
> "don’t" is a contraction of two words, it is not one word.
Yes it is. Is "keyboard" two words? How about "newspaper"?
If "don't" is two words, please tell me what two words make up "won't"? (Hint: neither of them is "will".)
Linguistically, "don't" and friends pass all the diagnostics that indicate they're single words.
- John Burger
> English is taught as that squiggle being punctuation, not a letter. (Unlike, say, the Hawaiʻian ʻOkina.)
> You can't use simple regular expressions to find word boundaries. That's why we have UAX #29.
> Confusion between apostrophe and quoting -- blame the scribe who came up with the ambiguous use, not the people who gave it a number.
> If anything, Unicode might have made a mistake in encoding two of these that look identical. How are normal users supposed to find both U+2019 and U+02BC on their keyboards, and how are they supposed to deal with incorrect usage?
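
To make the quoted point about regular expressions and UAX #29 concrete, here is a minimal Python sketch. It is an illustration only, not the UAX #29 algorithm: the pattern WB_SKETCH and the helper names naive_words and sketch_words are hypothetical, and the sketch approximates just the idea behind rules WB6/WB7 (an apostrophe between two letters does not break a word). It also shows that U+02BC, being a letter (general category Lm), already counts as a word character for a naive \w+ match, while U+0027 and U+2019 do not.

```python
import re

# Naive approach: a "word" is any run of \w characters.
NAIVE = re.compile(r"\w+")

# Assumption for this sketch: only U+0027 and U+2019 are treated as
# word-internal apostrophes. UAX #29 covers more characters and more rules.
APOSTROPHES = "'\u2019"

# Roughly "letters": word characters that are not decimal digits or underscore.
LETTERS = r"[^\W\d_]+"

# Letters, optionally followed by (apostrophe + letters) groups, so an
# apostrophe is kept only when it has letters on both sides.
WB_SKETCH = re.compile(rf"{LETTERS}(?:[{APOSTROPHES}]{LETTERS})*")


def naive_words(text):
    """Split text into words using the naive \\w+ pattern."""
    return NAIVE.findall(text)


def sketch_words(text):
    """Split text into words, keeping letter-apostrophe-letter together."""
    return WB_SKETCH.findall(text)


if __name__ == "__main__":
    print(naive_words("don't"))              # ASCII apostrophe, U+0027
    print(naive_words("don\u2019t"))         # right single quote, U+2019
    print(naive_words("don\u02bct"))         # modifier letter apostrophe, U+02BC
    print(sketch_words("don't won\u2019t"))  # contractions stay whole
    print(sketch_words("'keyboard'"))        # quoting marks are dropped
```

With the naive pattern, "don't" splits into "don" and "t" for both U+0027 and U+2019 but stays whole with U+02BC. The sketch keeps the contractions whole and still treats the marks around 'keyboard' as quotes, because a quote mark at the edge of a word has no letter on one side.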