On Sun, 27 Jan 2019 12:38:39 -0500
"Mark E. Shoulson via Unicode" <unicode_at_unicode.org> wrote:
> On 1/27/19 11:08 AM, Michael Everson via Unicode wrote:
> > It is a letter. In “can’t” the apostrophe isn’t a letter. It’s a
> > mark of elision. I can double-click on the three words in this
> > paragraph which have the apostrophe in them, and they are all
> > whole-word selected.
>
> That doesn't work when I try it: I double-click on the "a" in "can’t"
> and get only the "can" selected.
>
> This does not necessarily prove anything; my software (Thunderbird)
> is arguably doing it wrong.

Except that Unicode-compliant processes aren't required to follow the
scheme of TR29 Unicode Text Segmentation. Even under that scheme, the
whole word is selected only because the U+2019 is followed by a letter.
TR29 prescribes different behaviour for "dogs'" with U+2019 (interpret
as two 'words') and U+02BC (interpret as one word). The GTK-based
email client I'm using has that difference, but also fails with
"don't" unless one uses U+02BC.

LibreOffice, however, treats "don't" as a single word for U+0027,
U+02BC and U+2019, but "dogs'" as a single word only for U+02BC. This
complies with TR29. I'm not surprised, as LibreOffice does use or has
used ICU.
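
To make the distinction concrete, here is a minimal Python sketch of the
behaviour described above. It is not the TR29 algorithm, only a rough
approximation under stated assumptions: U+0027 and U+2019 are treated as
mid-word characters that join a word only when a letter sits on both
sides, while U+02BC counts as a letter because its general category is
Lm. The names MID, is_letter and words are made up for this illustration.

```python
import unicodedata

# Assumption: only these two apostrophes act as "mid-word" characters.
# The real TR29 property tables cover many more characters.
MID = {"\u0027", "\u2019"}


def is_letter(ch):
    # U+02BC MODIFIER LETTER APOSTROPHE has category Lm, so it counts here.
    return unicodedata.category(ch).startswith("L")


def words(text):
    """Split text into rough 'words'; other non-space characters stand alone."""
    result = []
    current = ""
    for i, ch in enumerate(text):
        next_is_letter = i + 1 < len(text) and is_letter(text[i + 1])
        if is_letter(ch):
            current += ch
        elif ch in MID and current and next_is_letter:
            # Apostrophe between two letters: keep the word together.
            current += ch
        else:
            if current:
                result.append(current)
                current = ""
            if not ch.isspace():
                result.append(ch)
    if current:
        result.append(current)
    return result


print(words("don\u2019t"))   # one item: the apostrophe is between letters
print(words("dogs\u2019"))   # two items: nothing follows the U+2019
print(words("dogs\u02bc"))   # one item: U+02BC is itself a letter
```

A real implementation would use the full word-break property data, for
example through ICU, rather than a hand-picked set of characters.
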
Richard.
Received on Sun Jan 27 2019 - 12:19:47 CST