Re: Ancient Greek apostrophe marking elision

From: Asmus Freytag via Unicode <unicode_at_unicode.org>
Date: Sat, 26 Jan 2019 17:11:49 -0800
On 1/26/2019 3:02 AM, Mark Davis ☕️ via Unicode wrote:
breaking selection for "d'Artagnan" or "can't" into two is overly fussy.

True, and that is not what U+2019 does; it does not break medially.


Not everyone seems to have got the word . . . but that's not Unicode's fault. But it shows that picking specific character codes from among a set that are identical except for (invisible) properties could be a losing game if widely deployed software can't be relied on to honor such finesse.

A./

PS: btw, the Root Zone of the DNS will not support U+02BC as a "letter". The "invisible" distinction in property is irrelevant when it comes to identifiers that are identified visually by users, and further, we don't really want to encourage people to use it to register words intended to contain apostrophes. Since we can't have ordinary apostrophes or U+2019, we can't have U+02BC looking like it might be one of the others.

To make matters worse, users of languages that "should" use U+02BC aren't actually consistent; much data uses U+2019 or U+0027. Ordinary users can't tell the difference (and spell checkers do not seem to be successful in enforcing the practice).
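
Given that real data mixes the three code points, one way to compare or search such text is to fold the look-alikes to a single character first. The following is only a minimal sketch: the helper name fold_apostrophes and the choice of U+2019 as the folding target are assumptions made for illustration, and folding like this is suitable for matching, not for rewriting stored text.

```python
# Hypothetical helper: fold apostrophe look-alikes for comparison only.
# The target (U+2019) is an assumption; any single code point would do.
APOSTROPHE_FOLD = {
    0x0027: 0x2019,  # APOSTROPHE
    0x02BC: 0x2019,  # MODIFIER LETTER APOSTROPHE
}


def fold_apostrophes(text: str) -> str:
    """Return text with U+0027 and U+02BC replaced by U+2019."""
    return text.translate(APOSTROPHE_FOLD)


variants = ["δ\u0027 αρχαια", "δ\u2019 αρχαια", "δ\u02BC αρχαια"]
folded = {fold_apostrophes(v) for v in variants}
print(len(folded))  # all three variants compare equal after folding
```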


Mark


On Fri, Jan 25, 2019 at 11:07 PM Asmus Freytag via Unicode <unicode@unicode.org> wrote:
On 1/25/2019 9:39 AM, James Tauber via Unicode wrote:
Thank you, although the word break does still affect things like double-clicking to select.

And people do seem to want to use U+02BC for this reason (and I'm trying to articulate why that isn't what U+02BC is meant for).

For normal editing operations, breaking selection for "d'Artagnan" or "can't" into two is overly fussy.

No wonder people get frustrated.

A./

James

On Fri, Jan 25, 2019 at 12:34 PM Mark Davis ☕️ <mark@macchiato.com> wrote:
U+2019 is normally the character used, except where the ’ is considered a letter. When it is between letters it doesn't cause a word break, but because it is also a right single quote, at the end of words there is a break. Thus in a phrase like «tryin’ to go» there is a word break after the n, because one can't tell whether the ’ is an apostrophe or a closing quote.

So something like "δ’ αρχαια" (picking a phrase at random) would have a word break after the delta. 

Word break: 
δ αρχαια 

However, there is no line break between them (which is the more important operation in normal usage). Probably not worth tailoring the word break.

Line break:
δ’ αρχαια 
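
The "invisible" difference between the candidate characters can be inspected directly. The sketch below uses only Python's standard unicodedata module to print the name and General Category of each one; it does not run the UAX #29 algorithm itself, and the helper name describe is made up for this illustration.

```python
import unicodedata

# The three characters discussed in this thread.
CANDIDATES = ["\u0027", "\u2019", "\u02BC"]


def describe(ch: str) -> tuple:
    """Return (code point, character name, General Category) for ch."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))


for ch in CANDIDATES:
    print(describe(ch))
```

U+02BC is a letter (category Lm), while U+0027 and U+2019 are punctuation (Po and Pf), which is why property-driven software treats them differently even though they look alike.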

Mark


On Fri, Jan 25, 2019 at 1:10 PM James Tauber via Unicode <unicode@unicode.org> wrote:
There seems to be some debate amongst digital classicists over whether to use U+2019 or U+02BC to represent the apostrophe in Ancient Greek when marking elision (e.g. δ’ for δέ preceding a word starting with a vowel).

It seems to me that U+2019 is the technically correct choice per the Unicode Standard but it is not without at least one problem: default word breaking rules.

I'm trying to provide guidelines for digital classicists in this regard.

Is it correct to say the following:

1) U+2019 is the correct character to use for the apostrophe in Ancient Greek when marking elision. 
2) U+02BC is a misuse of a modifier letter for this purpose.
3) However, use of U+2019 (unlike U+02BC) means the default Word Boundary Rules in UAX#29 will (incorrectly) exclude the apostrophe from the word token
4) And use of U+02BC (unlike U+2019) means the Grapheme Cluster Boundary Rules in UAX#29 will (incorrectly) include the apostrophe as part of a grapheme cluster with the previous letter
5) The correct solution is to tailor the Word Boundary Rules in the case of Ancient Greek to treat U+2019 as not breaking a word (which shouldn't have the same ambiguity problems with the single quotation mark as in English as it should not be used as a quotation mark in Ancient Greek)
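
The tailoring proposed in point 5 can be roughly sketched with regular expressions. This is not an implementation of UAX #29: the two patterns below are simplified stand-ins written for this illustration (they assume precomposed letters), and the names DEFAULT_LIKE, TAILORED, and tokens are hypothetical. The first pattern keeps U+2019 inside a word only when letters follow it; the second also keeps a trailing U+2019, as an Ancient Greek tailoring might.

```python
import re

# Simplified stand-in for the default behaviour described in this thread:
# U+2019 between word characters does not break, a trailing one is excluded.
DEFAULT_LIKE = re.compile(r"\w+(?:\u2019\w+)*")

# Simplified stand-in for a tailoring: a trailing U+2019 stays with the word.
TAILORED = re.compile(r"\w+(?:\u2019\w+)*\u2019?")


def tokens(text: str, tailored: bool = False) -> list:
    """Split text into word tokens using one of the two sketch patterns."""
    pattern = TAILORED if tailored else DEFAULT_LIKE
    return pattern.findall(text)


phrase = "δ\u2019 αρχαια"
print(tokens(phrase))                 # apostrophe left out of the first token
print(tokens(phrase, tailored=True))  # apostrophe kept with the delta
```

Because the tailored pattern would also swallow a closing single quote after a word, a sketch like this only makes sense for text where U+2019 is not used as a quotation mark, which is the assumption stated in point 5.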

Many thanks in advance.

James


--
James Tauber
Greek Linguistics: https://jktauber.com/
Digital Tolkien: https://digitaltolkien.com/

Twitter: @jtauber



Received on Sat Jan 26 2019 - 19:11:59 CST
