Another take on the English apostrophe in Unicode
charupdate at orange.fr
Mon Jun 15 08:19:26 CDT 2015
On Tue Mar 26 2002 - 10:01:43 EST, Mark Davis ☕️ wrote:
> Apostrophe, hyphen, and various other punctuation by default continue
> a word, but this behavior may be overridden on a per-language basis.
> Heuristics or more sophisticated engines may be needed when the
> apostrophe is at the end of a word, as in “the peoples' choice”, since
> it is ambiguous. The modifier letter apostrophe, on the other hand, is
> always treated as a letter.
[I replaced '<' '>' with '“' '”' to prevent confusion with a tag by the user agent.]
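To illustrate the difference Mark describes, here is a minimal sketch using Python's `re` module. It is not the Unicode word-boundary algorithm, only a naive `\w+` tokenizer, and the helper name `words` is made up for this example. It shows that U+02BC is treated as a letter, while U+0027 and U+2019 split the word unless extra rules are added.

```python
import re

def words(text):
    # Naive tokenizer: a "word" is a maximal run of \w characters.
    # This is NOT UAX #29 segmentation, only an illustration.
    return re.findall(r"\w+", text)

samples = {
    "U+0027": "don\u0027t",
    "U+2019": "don\u2019t",
    "U+02BC": "don\u02bct",
}

for label, text in samples.items():
    # U+02BC has general category Lm, so \w matches it and the word
    # stays whole; the two punctuation characters break it in two.
    print(label, words(text))
```

With U+02BC even the trailing case stays attached to its word: `words("the peoples\u02bc choice")` keeps `peoplesʼ` as one token, so no heuristic is needed there.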
On Tue Mar 26 2002 - 11:44:28 EST, Marco Cimarosti wrote:
> Mark Davis wrote:
>> Apostrophe, hyphen, and various other punctuation by default continue
>> a word, but this behavior may be overridden on a per-language basis.
> This may work for things such as finding word boundaries, but not for
> According to the ID_Start and ID_Continue properties in
> , neither
> U+0027 (APOSTROPHE) nor U+2019 (RIGHT SINGLE QUOTATION MARK) are allowed in
> an identifier. And this is not surprising, since they are primarily
> quotation marks.
> On the other hand, U+02BC (MODIFIER LETTER APOSTROPHE) is allowed in any
> position within an identifier. Using U+02BC as the apostrophe, would allow
> to use words such as: , or <'em> in identifiers.
> But this hits against the fact that Unicode's own suggestion is to use
> U+2019 for the apostrophe.
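Marco's point about identifiers can be checked quickly in Python. This is only a sketch: `str.isidentifier()` follows Python's own identifier rules (based on XID_Start and XID_Continue), which I assume agree with ID_Start and ID_Continue for these three characters. The helper name `report` is made up for this example.

```python
import unicodedata

CANDIDATES = {
    "APOSTROPHE": "\u0027",
    "RIGHT SINGLE QUOTATION MARK": "\u2019",
    "MODIFIER LETTER APOSTROPHE": "\u02bc",
}

def report(ch):
    # Return the general category of ch and whether "don" + ch + "t"
    # is accepted as an identifier by Python's rules.
    return unicodedata.category(ch), ("don" + ch + "t").isidentifier()

for name, ch in CANDIDATES.items():
    category, ok = report(ch)
    print(f"U+{ord(ch):04X} {name}: category={category}, identifier={ok}")
```

Only U+02BC, a modifier letter (Lm), passes; the other two are punctuation (Po and Pf) and are rejected.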
On Tue Mar 26 2002 - 12:08:41 EST, Marco Cimarosti wrote:
> But, as you say, the apostrophe is legitimate and sometimes mandatory in the
> orthography of English and many other languages. So, it seems to me that its
> preferred encoding should make it possible to use it in identifiers,
> filenames, URI(')s, and so on.
Don't we fall back to the days when everything was 0x27, and remain stuck with ongoing
confusion, as long as the English apostrophe is conflated with the closing quotation mark?
As you told us, having both U+02BC and U+2019 in use will require some supplementary algorithms.
But as you said in 2002, this is also true when both are conflated in a single character.
I suspect that the cost of using MODIFIER LETTER APOSTROPHE for the English apostrophe (and for
the apostrophe in general) today would mainly be the cost of updating implementations and text files.
If this cost is too high, we would have to assume that text never needs to be quoted, nor converted
between British and US English. I hope people will keep communicating and exchanging.
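To give an idea of what updating text files could look like, here is a rough sketch of a converter. It is a heuristic of my own, not an established algorithm, and the name `to_modifier_apostrophe` is made up. It only rewrites U+0027 or U+2019 when the character sits between two word characters, and it leaves trailing cases such as “the peoples' choice” untouched, since those are exactly the ambiguous ones.

```python
import re

MODIFIER_LETTER_APOSTROPHE = "\u02bc"

# U+0027 or U+2019 with a word character on both sides.
_INNER_APOSTROPHE = re.compile(r"(?<=\w)['\u2019](?=\w)")

def to_modifier_apostrophe(text):
    # Word-internal apostrophes become U+02BC. Anything at a word
    # edge might be a closing quotation mark, so it is left alone.
    return _INNER_APOSTROPHE.sub(MODIFIER_LETTER_APOSTROPHE, text)

print(to_modifier_apostrophe("\u2018don\u2019t\u2019"))
print(to_modifier_apostrophe("the peoples' choice"))
```

Once the word-internal apostrophes are U+02BC, the remaining U+2018 and U+2019 can be swapped for double quotation marks, or back, without touching any apostrophe.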