On Sun, 27 Jan 2019 14:09:31 -0500
James Tauber via Unicode <unicode_at_unicode.org> wrote:
> On Sun, Jan 27, 2019 at 1:22 PM Richard Wordingham via Unicode <
> unicode_at_unicode.org> wrote:
> > However LibreOffice treats "don't" as a single word for U+0027,
> > U+02BC and U+2019, but "dogs'" as a single word only for U+02BC.
> > This complies with TR29. I'm not surprised, as LibreOffice does
> > use or has used ICU.
> This comes back to my original question that started this thread.
Yes. I'm driving home the problem for those who somehow fail to
understand your opening post.
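Here is a minimal sketch of why this happens. It assumes a heavily simplified version of the UAX #29 word-boundary rules and models only runs of letters plus the "letter × mid-punctuation × letter" rules (WB6/WB7). U+0027 (Word_Break=Single_Quote) and U+2019 (MidNumLet) are treated alike, as they are between letters. U+02BC is a letter (General_Category Lm, Word_Break ALetter). The name `words` is hypothetical. A real segmenter, such as ICU's BreakIterator, works from the full Word_Break property data.

```python
import unicodedata

# Only the two apostrophe-like characters discussed in this thread.
# U+0027 is Word_Break=Single_Quote and U+2019 is MidNumLet; between two
# letters both behave the same way, so this sketch lumps them together.
MID = {"\u0027", "\u2019"}


def is_letter(ch: str) -> bool:
    # Letters (including U+02BC, category Lm) and combining marks
    # count as word-forming characters in this simplified model.
    return unicodedata.category(ch)[0] in ("L", "M")


def words(text: str) -> list[str]:
    """Split text into words using a toy subset of the UAX #29 rules."""
    text = unicodedata.normalize("NFC", text)
    result = []
    i, n = 0, len(text)
    while i < n:
        if not is_letter(text[i]):
            i += 1  # spaces and punctuation outside words are skipped
            continue
        start = i
        while i < n:
            if is_letter(text[i]):
                i += 1
            elif text[i] in MID and i + 1 < n and is_letter(text[i + 1]):
                i += 2  # WB6/WB7: letter x MID x letter stays joined
            else:
                break  # a MID not followed by a letter ends the word
        result.append(text[start:i])
    return result


for apostrophe in ("\u0027", "\u2019", "\u02bc"):
    print(f"U+{ord(apostrophe):04X}", words(f"don{apostrophe}t"), words(f"dogs{apostrophe} bones"))
```

All three spellings keep "don't" whole, but only the U+02BC spelling keeps the apostrophe on "dogs". Applied to Smyth's γένοιτ’ ἄν, the same rules return γένοιτ without its U+2019.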
> Here's a concrete example from Smyth's Grammar:
>
> γένοιτ’ ἄν
>
> Double-clicking on the first word should select the U+2019 as well.
> Interestingly on macOS Mojave it does in Pages[1] but not in Notes,
> the Terminal or here in Gmail on Chrome.
>
> To be clear: when I say "should" I mean that that is the expectation
> classicists have and the failure to meet it is why some of them
> insist on using U+02BC.
>
> I'm happy if the answer is "use U+2019 and go get your text
> segmentation implementations fixed"[2] but am looking for
> confirmation of that.
The problem with that approach is that it assumes one can have a
language-sensitive implementation, and that that will suffice.
Smyth’s grammar gives the concrete example, “γένοιτ’ ἄν”. It contains
the word ‘ἄν’.
Should double-clicking the first Greek word in the paragraph above
select it? That's not going to work if the paragraph above is
considered to be in English. And what about double-clicking the third
Greek word? What should that select? Or is that paragraph
ungrammatical?
To fix the problem with possessive plural "dogs’" with U+2019, one has
to parse enough of the paragraph to distinguish an apostrophe from a
closing single inverted comma. Moreover, it assumes that end-of-word
apostrophes will not be included in a span bounded by single inverted
commas. I may observe such a rule, but I don't remember being taught
it.
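The following hypothetical heuristic makes that parsing burden concrete. The name `classify_right_quotes` and its rules are illustrative assumptions and do not describe any real library. A U+2019 followed by a letter is treated as an apostrophe. Otherwise it closes the most recent unmatched U+2018 if one exists, and it counts as an apostrophe only if none does. The heuristic handles the simple cases but misreads exactly the case above: an end-of-word apostrophe inside a span bounded by single inverted commas.

```python
def classify_right_quotes(text: str) -> list[tuple[int, str]]:
    """Label each U+2019 in text as 'apostrophe' or 'close-quote'.

    Hypothetical heuristic for illustration only.
    """
    labels = []
    open_quotes = 0  # U+2018 characters not yet matched by a closing U+2019
    for i, ch in enumerate(text):
        if ch == "\u2018":
            open_quotes += 1
        elif ch == "\u2019":
            followed_by_letter = i + 1 < len(text) and text[i + 1].isalpha()
            if followed_by_letter:
                labels.append((i, "apostrophe"))  # word-internal, as in don’t
            elif open_quotes > 0:
                open_quotes -= 1
                labels.append((i, "close-quote"))  # assume it closes the quote
            else:
                labels.append((i, "apostrophe"))  # trailing, as in dogs’
    return labels


# Correct: an unquoted possessive plural.
print(classify_right_quotes("the dogs\u2019 bones"))
# Wrong: inside single inverted commas the possessive is taken as the
# closing quote, and the real closing quote is then taken as an apostrophe.
print(classify_right_quotes("\u2018the dogs\u2019 bones\u2019"))
```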
In Unicode 2.0 the apostrophe was U+02BC; it was changed to U+2019 in
Unicode 2.1. The justification for the change that I could find is in
the Unicore thread (members only) starting at
https://www.unicode.org/mail-arch/unicore-ml/y1997-A/0185.html . The
justification recorded there was merely that:
1) Windows and Mac Latin character sets had equivalents of U+0027, to
which the 'letter apostrophe' was mapped, and U+2019, which was used
for single quotes.
2) The 'punctuation apostrophe' was being mapped to the U+2019 by the
'smart quote' apparatus.
3) For consistency, the 'punctuation apostrophe' should therefore be
encoded by U+2019 instead of U+02BC.
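The "smart quote" mapping in point 2 can be sketched with a hypothetical, deliberately naive converter. Real word processors use more context than this. A straight U+0027 becomes U+2018 after whitespace or an opening bracket, and U+2019 everywhere else. Any apostrophe that follows a letter therefore comes out as U+2019. An elision at the start of a word comes out as a left quote.

```python
OPENERS = "([{\u201c"  # characters after which a quote is taken to open


def smarten(text: str) -> str:
    """Replace straight U+0027 with curly single quotes (naive heuristic)."""
    out = []
    for i, ch in enumerate(text):
        if ch != "'":
            out.append(ch)
            continue
        prev = text[i - 1] if i > 0 else " "
        if prev.isspace() or prev in OPENERS:
            out.append("\u2018")  # LEFT SINGLE QUOTATION MARK
        else:
            out.append("\u2019")  # RIGHT SINGLE QUOTATION MARK, also used as apostrophe
    return "".join(out)


print(smarten("don't"))            # the apostrophe becomes U+2019
print(smarten("the dogs' bones"))  # so does a trailing apostrophe
print(smarten("'tis"))             # an initial elision wrongly becomes U+2018
```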
This argument didn't persuade everyone even then, and it feels even
weaker now.
Perhaps my problem is just that I don't see a sharp difference
between the letter apostrophe and the punctuation apostrophe. For
example, when the pronunciation of English "letter" with a glottal stop
as the intervocalic consonant is represented in writing as something
like "le'er", is it a letter apostrophe because it's a glottal stop, or
a punctuation apostrophe because the 'tt' is dropped?
The issue arises in the orthography of Finnish. The genitive singular
of _keko_ 'a pile' is _keon_ - the 'k' is 'dropped' because of
consonant gradation. However, regularly, the genitive singular of
_raaka_ 'raw' is _raa'an_, where the U+0027 I wrote represents an
apostrophe, which is pronounced as a glottal stop. Is this a letter
apostrophe or a punctuation apostrophe? The 'k' has been dropped by
the same rule, but because of the vowel pattern it is replaced by a
glottal stop and written with an apostrophe. English Wiktionary
chooses U+2019; the Finnish Wiktionary ducks the issue and uses U+0027.
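The orthographic rule in the Finnish example can be sketched as follows. The sketch assumes the caller already knows that the word loses its 'k' in the weak grade, and that the nominative ends in vowel + k + vowel. The function name `genitive_k_loss` is hypothetical. If the vowels on either side of the lost 'k' are identical, an apostrophe marks the syllable boundary; otherwise nothing is written. The apostrophe character is a parameter because, as the Wiktionaries show, writers disagree about which code point to use.

```python
VOWELS = set("aeiouyäö")


def genitive_k_loss(nominative: str, apostrophe: str = "\u2019") -> str:
    """Genitive singular for words whose final 'k' is lost in the weak grade.

    Hypothetical sketch: covers only nominatives ending in vowel + 'k' + vowel
    (keko, raaka, liuku); gradations such as kk ~ k or k ~ j are not handled.
    """
    stem, k, final = nominative[:-2], nominative[-2], nominative[-1]
    if k != "k" or not stem or stem[-1] not in VOWELS or final not in VOWELS:
        raise ValueError(f"{nominative!r} is outside this sketch's scope")
    # Two identical vowels either side of the lost 'k' would otherwise be
    # read as one long vowel, so the syllable boundary gets an apostrophe.
    separator = apostrophe if stem[-1] == final else ""
    return stem + separator + final + "n"


print(genitive_k_loss("keko"))             # keon
print(genitive_k_loss("raaka"))            # raa’an, with U+2019
print(genitive_k_loss("raaka", "\u0027"))  # raa'an, as Finnish Wiktionary writes it
```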
Richard.
Received on Sun Jan 27 2019 - 17:21:57 CST