RE: Missing character: Combining Up Tack Above

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Fri Mar 30 2007 - 17:29:48 CST

    On behalf of Kenneth Whistler
    > You either claim:
    >
    > A. That isn't it (your straw position here), so a separate
    > mark needs to be encoded.
    >
    > or
    >
    > B. That is it.
    >
    > In case A, you end up introducing another problematical confusable
    > issue. By claiming functional distinction for two marks that would
    > be visually virtually indistinguishable, you end up with the same
    > kinds of confusion that occurs anytime visually indistinguishable
    > characters are claimed to be distinct: ordinary users will have
    > trouble determining which to use when, and you will end up with
    > data corruption as a result.
    >
    > In case B, you end up with the possibility (or likelihood) that
    > presentation of marks in combination won't result in the exact
    > shapes expected, and the need to specify rules for glyphic
    > combination in particular contexts.
    >
    > Case A is more difficult to justify paradigmatically. You end
    > up with a mark that looks like X but only occurs in context Y,
    > and another mark that looks like X but only occurs in context Z,
    > when contexts Y and Z don't overlap. In particular, you have
    > a vertical tick that is applied to base vowels (U+030D) and
    > another vertical tick that is applied to macrons (U+XXXX).

    I'm not sure that encoding such an entity would create a new confusable. In fact, even if the source really was a decorated macron, the character above the macron is not really a combining tick above, because such a tick is normally detached and not linked to the rest of the grapheme cluster.

    The tick above and the other previously suggested alternatives share one property: these combining marks all have combining class Above.
    Extract from "DerivedCombiningClass.txt":

    # Canonical_Combining_Class=Above

    0300..0314 ; 230 # Mn [21] COMBINING GRAVE ACCENT..COMBINING REVERSED COMMA ABOVE
    033D..0344 ; 230 # Mn [8] COMBINING X ABOVE..COMBINING GREEK DIALYTIKA TONOS
    0346 ; 230 # Mn COMBINING BRIDGE ABOVE
    034A..034C ; 230 # Mn [3] COMBINING NOT TILDE ABOVE..COMBINING ALMOST EQUAL TO ABOVE
    0350..0352 ; 230 # Mn [3] COMBINING RIGHT ARROWHEAD ABOVE..COMBINING FERMATA
    0357 ; 230 # Mn COMBINING RIGHT HALF RING ABOVE
    035B ; 230 # Mn COMBINING ZIGZAG ABOVE
    0363..036F ; 230 # Mn [13] COMBINING LATIN SMALL LETTER A..COMBINING LATIN SMALL LETTER X
    0483..0486 ; 230 # Mn [4] COMBINING CYRILLIC TITLO..COMBINING CYRILLIC PSILI PNEUMATA

    This means that the suggested tick above is normally presented as "Above", not "Attached_Above", so I see no reason why it would attach to, or overlap, a macron encoded with it. But if we encode the combining up tack above, it would clearly be a single mark, with combining class "Above".
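
    As a rough illustration of the class distinction, here is a minimal Python sketch that reads the canonical combining class of a few existing marks from the Unicode database bundled with the interpreter. The choice of sample marks and the helper name combining_class are mine, for illustration only; the numeric values 214, 220 and 230 are the standard values for Attached_Above, Below and Above.

```python
import unicodedata

# Canonical combining class values as defined by the Unicode Standard.
CCC_ATTACHED_ABOVE = 214
CCC_BELOW = 220
CCC_ABOVE = 230

def combining_class(ch: str) -> int:
    """Return the canonical combining class of a single character."""
    return unicodedata.combining(ch)

# Sample marks chosen for illustration (not taken from the sample document).
marks = {
    "COMBINING MACRON": "\u0304",
    "COMBINING VERTICAL LINE ABOVE": "\u030D",
    "COMBINING UP TACK BELOW": "\u031D",
}

for name, ch in marks.items():
    ccc = combining_class(ch)
    attached = "attached" if ccc == CCC_ATTACHED_ABOVE else "not attached"
    print(f"U+{ord(ch):04X} {name}: ccc={ccc} ({attached})")
```

    Both the macron and the vertical line above report class 230, so nothing in their properties asks a renderer to join them.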

    When encoding the text as <letter, combining macron above, combining ??? above>, nothing indicates that the two diacritics form a ligature, even if this is what happens in the sample document. And if the text is encoded with a new "combining up tack above", then all association of this diacritic with a "decorated" macron is lost (there's no relation between a macron and a tack, and the tick is not the correct character to represent such a decoration).
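
    To make the two-mark encoding concrete, here is a small Python sketch. It assumes, purely for illustration, that the unknown "???" mark is U+030D COMBINING VERTICAL LINE ABOVE; the variable names are mine. Because both marks have the same combining class, normalization keeps them in the order typed, so the encoded text records only a sequence of two independent marks, not a ligature.

```python
import unicodedata

BASE = "a"
MACRON = "\u0304"         # COMBINING MACRON
VERTICAL_LINE = "\u030D"  # COMBINING VERTICAL LINE ABOVE (stand-in for "???")

macron_first = BASE + MACRON + VERTICAL_LINE
tick_first = BASE + VERTICAL_LINE + MACRON

def codepoints(s: str) -> str:
    """Format a string as a space-separated list of U+XXXX code points."""
    return " ".join(f"U+{ord(c):04X}" for c in s)

# Canonical decomposition: marks of equal combining class are never reordered.
nfd_macron_first = unicodedata.normalize("NFD", macron_first)
nfd_tick_first = unicodedata.normalize("NFD", tick_first)

# Canonical composition: only the base letter and the macron can combine.
nfc_macron_first = unicodedata.normalize("NFC", macron_first)

print("NFD macron first:", codepoints(nfd_macron_first))
print("NFD tick first:  ", codepoints(nfd_tick_first))
print("NFC macron first:", codepoints(nfc_macron_first))
print("canonically equivalent:", nfd_macron_first == nfd_tick_first)
```

    The two orders are not canonically equivalent, so a convention for the order of the marks would also have to be agreed on if the two-mark encoding were chosen.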

    That's why I said I was not sure about the actual identity of the second diacritic, and why its representation causes a problem, not visually, but in its nature.

    One good point for encoding a new entity is that there's already a similar diacritic encoded for the tack below, and that Unicode has already encoded similar pairs of diacritics in their "above" and "below" combining forms. The real question is then: does the suggested form merit general interest?
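
    The above/below pairing can be checked mechanically. The following Python sketch takes a few "... BELOW" combining marks and asks the Unicode database bundled with the interpreter whether a character with the matching "... ABOVE" name exists. The helper above_counterpart and the sample list are hypothetical, and the answer for the up tack depends on which Unicode version the Python runtime ships with.

```python
import unicodedata
from typing import Optional

def above_counterpart(below_char: str) -> Optional[str]:
    """Return the character named '... ABOVE' matching a '... BELOW' mark.

    Returns None when the input is not named '... BELOW' or when the
    counterpart is absent from this runtime's Unicode database.
    """
    name = unicodedata.name(below_char, "")
    if not name.endswith(" BELOW"):
        return None
    candidate = name[: -len("BELOW")] + "ABOVE"
    try:
        return unicodedata.lookup(candidate)
    except KeyError:
        return None

# Sample marks chosen for illustration.
samples = ["\u0325", "\u0323", "\u031D"]  # ring below, dot below, up tack below

print("Unicode database version:", unicodedata.unidata_version)
for ch in samples:
    other = above_counterpart(ch)
    found = f"U+{ord(other):04X}" if other is not None else "none in this database"
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} -> {found}")
```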

    If yes, then using it will probably be just as correct as the other suggested encoding with a macron and another diacritic, because the book author's intention is not clearly demonstrated, and the author also allowed the printer to use this kind of character punching in his publication.

    The handwritten manuscript would help in seeing whether that was the case, but I fear that such a source is not available. There may exist some formal description in the text of the book, but this single page says nothing about the diacritics, which are not even named but merely shown graphically, and poorly. Even the exact phonetic values that these diacritics are supposed to represent are not explained clearly, with only 1 or 2 examples per decorated letter, and the various uses of this combination give no clear hint about the intended meaning, because it varies too much depending on the base letter.

    My reading of the sampled page gives another hint about how the diacritics were chosen: they were not clearly designed with an English meaning, but were chosen because they relate to the phonetic rules of the same diacritics as used in other languages (see for example "ä", with diaeresis, which is linked to the German "ä", or "ç", with cedilla, which is linked to the French or Portuguese "ç"). But is there any hint suggesting that the "tack" is related to a similar phoneme in another language? I see no good relation with the example shown for the "ā" (letter a decorated with a simple macron).

    So we have several problems with the sample document:
            * at the semantic level: naming and author's intention, which is not explained except that the mark would represent a phonetic distinction, and that seems to assume that American English and British English use a uniform standard phonetics; all we can say is that the author was in fact in Boston, USA, and that this best represents the phonetics of Boston (I will leave the other assumption aside), and that the phonetics should be resolved for that area, if we can find out the cultural background of this author (notably his other native languages, if other than English).
            * at the text encoding level: one or two diacritics?
            * at the graphic level (because the sample is not clear, notably when the "tack" appears above "i"): is it really a tack, or something else like a macron decorated with a vertical tick, or a macron decorated with a comma or hook above?
            * character properties: if two combining marks are intended, attached or detached?
