Re: Exemplifying apostrophes

From: Kenneth Whistler (kenw@sybase.com)
Date: Mon May 19 2008 - 18:42:20 CDT

    > But even the mail from this list seems not to be sent with a
    > Unicode encoding; another mystery.

    Not really a mystery at all. The encodings depend on the various
    and sundry mail clients used by the people sending mail *to*
    the list.

    > > if
    > > you mean normalization in the sense of transforming to a Normalization
    > > Form
    >
    > Sorry, I meant normalizing according to a set of Unicode-inspired
    > orthographic norms.

    While Unicode may inspire people who work on orthographic norms,
    it is important to note that Unicode itself (and the Unicode Consortium)
    is *not* about orthographic norms. The Unicode Standard identifies
    characters, but it does not attempt to tell people which characters
    they *must* use for particular orthographies. That is up to folks
    who are concerned with orthographic norms.

    I realize that the use of apostrophes is a particularly fraught
    question, but this is a result of the peculiar character encoding
    history of the much-overloaded ASCII 0x27 apostrophe, and the
    subsequent history of introduction of directional single quote
    marks into character encodings and then U+02BC MODIFIER LETTER
    APOSTROPHE into Unicode.

    But there is no one "right answer" -- no matter how much people
    might want one -- for which of the alternates should be used
    under all circumstances.

    > I appreciate your caution. On the other hand, not touching it is a decision,
    > too. If different sources represent the same lexeme with different apostrophes
    > and we refrain from touching them, then we’re asserting (in our project) that
    > these lexemes are distinct, and this interferes with our discovery of
    > translation paths through the lexeme.

    In which case you should probably be doing some version of
    "apostrophe folding" for the purposes of your lexemic analysis.

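    A minimal sketch of what such folding might look like, in Python. The
    set of characters folded and the choice of U+02BC as the fold target
    are assumptions made for illustration; neither is prescribed by the
    Unicode Standard, and the function names are hypothetical. The folded
    form is meant only as a comparison key for analysis, not as a
    replacement for the source text.

```python
# Hypothetical apostrophe folding for lexeme comparison.
# Which characters to fold, and what to fold them to, is a project decision.
APOSTROPHE_LIKE = (
    "\u0027",  # APOSTROPHE
    "\u2019",  # RIGHT SINGLE QUOTATION MARK
    "\u02BC",  # MODIFIER LETTER APOSTROPHE
)
FOLD_TARGET = "\u02BC"  # assumed target; any single stable choice would do

# str.translate expects a mapping from code point (int) to replacement.
_FOLD_TABLE = {ord(ch): FOLD_TARGET for ch in APOSTROPHE_LIKE}


def fold_apostrophes(text: str) -> str:
    """Return a comparison key with apostrophe-like characters unified."""
    return text.translate(_FOLD_TABLE)


def same_lexeme_key(a: str, b: str) -> bool:
    """True if two strings differ at most in their choice of apostrophe."""
    return fold_apostrophes(a) == fold_apostrophes(b)
```

    With this, same_lexeme_key("п'ять", "п\u2019ять") is true, while the
    original strings from each source remain untouched.
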
    > Apparently, though it was by some (e.g., James Kass, who argued--against the
    > view of Asmus Freytag--that Web pages are more, not less, subject to an
    > expectation of standard conformity than are paper-printed works, and finished
    > with: “Web pages on the Unicode site should be exemplary”). For my purposes,
    > it would certainly help if they were exemplary, and it casts doubt on the
    > claim of practicality of the standard when the standardizing authority doesn’t
    > comply.

    Doesn't "comply" with what? As I noted above, the Unicode Standard is
    not about specifying orthographic norms. And the standardizing authority
    for HTML is W3C, not the Unicode Consortium.

    >
    > > There is this problem in the Ukrainian language, where the apostrophe means hard sign.
    > > How to reproduce it in the original Cyrillic script? It would not be a "diacritic"
    > > character as apostrophe, but it is really the original Cyrillic character at
    > > the moment (the Ukrainian National Library takes it as an apostrophe, U+0027).
    >
    > > Same as in the Latin script: U+2019
    > > http://www.unics.uni-hanover.de/nhtcapri/cyrillic-script.html5
    >
    > Why? This seems to conflict with the standard as I understand it. I believe
    > it’s a letter with a phonological value, not a punctuation mark, so I
    > understand the standard to state that the correct character is 02BC (MODIFIER
    > LETTER APOSTROPHE). I believe that this is argued for at
    > http://linux.org.ua/cgi-bin/yabb/YaBB.pl?num=1189996822/75
    > in message 87. If I’m incorrect, I’d appreciate an explanation. Thanks.

    Followed in message 89 by a quotation from Unicode 4.1.0 about
    the distinctions (or non-distinctions) between U+0027, U+02BC,
    and U+2019 -- and we are back chasing our tails again. It really
    is the same set of arguments for every orthography that uses
    a raised comma-shaped "apostrophe" in one or more contexts. Does
    it systematically distinguish between letter and punctuation uses,
    and if so, in what contexts?
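
    For reference, the three characters can be inspected with Python's
    standard unicodedata module. This is only a sketch: it shows the
    character properties (punctuation versus modifier letter) and says
    nothing about which character a given orthography should use. The
    helper name describe is hypothetical.

```python
import unicodedata

# The three code points discussed in this thread.
CANDIDATES = ["\u0027", "\u02BC", "\u2019"]


def describe(ch: str) -> tuple:
    """Return (code point label, character name, General_Category)."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))


for ch in CANDIDATES:
    label, name, category = describe(ch)
    # Categories starting with "L" are letters; those starting with "P"
    # are punctuation.
    kind = "letter" if category.startswith("L") else "punctuation"
    print(label, name, category, kind)
```

    Only U+02BC has a letter category (Lm); U+0027 (Po) and U+2019 (Pf)
    are both punctuation.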

    --Ken



    This archive was generated by hypermail 2.1.5 : Mon May 19 2008 - 18:45:42 CDT