Re: Another take on the English apostrophe in Unicode

From: Ted Clancy <tclancy_at_mozilla.com>
Date: Wed, 10 Jun 2015 18:51:45 -0400

On 4/Jun/2015 19:01, Leo Broukhis wrote:
>
> Along the same lines, we might need a MODIFIER LETTER HYPHEN, because, for
> example, the word ack-ack isn't decomposable into words, or even morphemes,
> "ack" and "ack".
>
I do think that U+2010 (HYPHEN) is miscategorised. I think it should have
General Category = Pc, not Pd. (That is, hyphens are connectors, not
dashes.) That would make it a "word" character.
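
For anyone who wants to check the current categories, here is a small sketch using Python's standard unicodedata module. The helper name general_category and the choice of sample characters are mine, purely for illustration, and the values printed depend on the Unicode version bundled with your Python.

```python
import unicodedata


def general_category(ch: str) -> str:
    """Return the two-letter General Category of a single character."""
    return unicodedata.category(ch)


# Sample characters chosen for illustration only.
SAMPLES = [
    "\u2010",  # HYPHEN
    "\u002D",  # HYPHEN-MINUS
    "\u005F",  # LOW LINE, a connector punctuation character
    "\u2019",  # RIGHT SINGLE QUOTATION MARK
    "\u02BC",  # MODIFIER LETTER APOSTROPHE
]

for ch in SAMPLES:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {general_category(ch)}")
```

U+2010 and U+002D both report Pd, while U+005F reports Pc, which is the category I am arguing U+2010 belongs in.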

Or, at the very least, U+2010 should have Word Break = MidNumLet (meaning
it can occur in the middle of numbers or letters). UAX #29 says that U+2010
deliberately does *not* have Word Break = MidNumLet, though an
implementation may treat it as if it did. (UAX #29 doesn't give any reasons
for this decision. I can understand why U+002D (HYPHEN-MINUS) doesn't have
Word Break = MidNumLet, due to its history of being used as a dash or minus
sign, but U+2010 should never be used as a dash or minus sign, so I don't
see the problem.)
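
To show what such a tailoring would change, here is a toy sketch. It is not the UAX #29 algorithm: it only imitates the idea behind the "mid" rules, namely that a character from a chosen set does not break a word when it sits between two alphanumerics. The function split_words and the two sets are hypothetical names of mine, and DEFAULT_MID is only a small subset of the real MidNumLet characters.

```python
def split_words(text, mid_chars):
    """Toy segmenter, NOT the full UAX #29 algorithm.

    A word is a run of alphanumerics. A character from mid_chars is kept
    inside the word only when it has an alphanumeric on both sides.
    """
    words = []
    current = ""
    for i, ch in enumerate(text):
        if ch.isalnum():
            current += ch
        elif (ch in mid_chars and current
              and i + 1 < len(text) and text[i + 1].isalnum()):
            # Mid-word character between two alphanumerics: no break.
            current += ch
        else:
            if current:
                words.append(current)
            current = ""
    if current:
        words.append(current)
    return words


# Small subset of the characters that have Word Break = MidNumLet.
DEFAULT_MID = {".", "\u2019"}
# Hypothetical tailoring: treat U+2010 HYPHEN as if it were MidNumLet too.
TAILORED_MID = DEFAULT_MID | {"\u2010"}

print(split_words("ack\u2010ack", DEFAULT_MID))
print(split_words("ack\u2010ack", TAILORED_MID))
```

With the default set, "ack‐ack" (written with U+2010) splits into two words; with the tailored set it stays one word, the same way "can’t" written with U+2019 already does.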

But luckily, the miscategorisation of U+2010 hasn't led to any pressing
practical problems, unlike the misuse of U+2019 for the apostrophe.

- Ted
Received on Wed Jun 10 2015 - 17:53:03 CDT
