Re: Exemplar Characters

From: Mark Davis (mark.davis@icu-project.org)
Date: Mon Nov 14 2005 - 21:21:45 CST

    > For example, Breton has {ch} and {c'h} but no isolated {c}. ...

    CLDR uses the typographically correct (curly) apostrophe. It also contains
    mapping information, so anyone who wants to use or allow fallback
    characters such as the ASCII apostrophe can do so. So {c’h} would be
    used. (I assume that is the punctuation character, not the modifier letter.)
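
    As a rough illustration of the fallback idea, here is a minimal Python
    sketch. The FALLBACKS table and the UNITS list are hypothetical and are
    not taken from CLDR data; they only show how an ASCII apostrophe could be
    mapped to U+2019 before text is matched against multi-character units
    such as {c’h}.

```python
# Hypothetical fallback table (an assumption, not CLDR data): maps a commonly
# typed substitute to the character used in the exemplar set.
FALLBACKS = {"\u0027": "\u2019"}  # ASCII apostrophe -> RIGHT SINGLE QUOTATION MARK

# Hypothetical multi-character exemplar units for Breton, longest first.
UNITS = ["c\u2019h", "ch"]


def apply_fallbacks(text):
    """Replace each fallback character with its preferred form."""
    return "".join(FALLBACKS.get(ch, ch) for ch in text)


def tokenize(text):
    """Split text into exemplar units, trying the listed units in order."""
    text = apply_fallbacks(text.lower())
    tokens = []
    i = 0
    while i < len(text):
        for unit in UNITS:
            if text.startswith(unit, i):
                tokens.append(unit)
                i += len(unit)
                break
        else:
            # No multi-character unit starts here: emit a single character.
            tokens.append(text[i])
            i += 1
    return tokens
```

    With this sketch, tokenize("c'hoar") and tokenize("c’hoar") give the same
    result, so the exemplar set itself only needs to list {c’h}.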

    > often with an apostrophe or single quote followed by Epsilon...

    That would also be covered by fallbacks rather than encoded in the CLDR data.
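
    A fallback of this kind might look like the following Python sketch. The
    GREEK_FALLBACKS table is hypothetical, not CLDR data. Normalization alone
    does not help here: U+0384 GREEK TONOS has only a compatibility
    decomposition (space plus combining acute), so the sequence needs an
    explicit mapping.

```python
import unicodedata

EPSILON_TONOS = "\u0388"  # GREEK CAPITAL LETTER EPSILON WITH TONOS

# Hypothetical fallback sequences (an assumption, not CLDR data). Mapping an
# apostrophe followed by Epsilon unconditionally would be wrong wherever the
# apostrophe is a real elision mark, so a real table would need more context.
GREEK_FALLBACKS = {
    "\u0384\u0395": EPSILON_TONOS,  # spacing tonos + Epsilon
    "\u2019\u0395": EPSILON_TONOS,  # right single quotation mark + Epsilon
    "\u0027\u0395": EPSILON_TONOS,  # ASCII apostrophe + Epsilon
}


def fold_greek(text):
    """Normalize to NFC, then replace known fallback sequences."""
    text = unicodedata.normalize("NFC", text)
    for sequence, target in GREEK_FALLBACKS.items():
        text = text.replace(sequence, target)
    return text
```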

    > There are similar examples in other languages, like {’n} or {'n} for
    > which there also exists a combined character in Unicode...

    This case I'm not so sure about. While there is a combined character in
    Unicode, I don't know that we should recommend its usage over the sequence.
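
    For reference, the combined character in question is presumably U+0149
    LATIN SMALL LETTER N PRECEDED BY APOSTROPHE (an assumption; the quoted
    message does not name it). The short Python check below shows how it
    behaves under normalization: NFC leaves it alone, and its compatibility
    decomposition uses the modifier letter U+02BC rather than the punctuation
    apostrophe U+2019, so neither {’n} nor {'n} is equivalent to it under any
    normalization form.

```python
import unicodedata

combined = "\u0149"  # LATIN SMALL LETTER N PRECEDED BY APOSTROPHE

# NFC keeps the combined character as-is (it has no canonical decomposition).
nfc = unicodedata.normalize("NFC", combined)

# NFKC applies the compatibility decomposition: U+02BC followed by "n".
nfkc = unicodedata.normalize("NFKC", combined)


def names(text):
    """Return the Unicode character names for each code point in text."""
    return [unicodedata.name(ch) for ch in text]


print(names(nfc))
print(names(nfkc))
```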

    Philippe Verdy wrote:

    > From: "Mark Davis" <mark.davis@icu-project.org>
    >
    >> We use NFC for the exemplar character set. Any significant character
    >> sequence can be included as well. For example, one can have [a-h {ch}
    >> i-z], which indicates that {ch} is treated as a unit.
    >
    >
    > What about apostrophes? They are present in some digraphs and
    > trigraphs for some languages.
    >
    > For example, Breton has {ch} and {c'h} but no isolated {c}. One
    > problem is: how can we represent the various ways to encode the
    > apostrophe? {c'h} is frequent for Breton (using the ASCII single
    > quote), but the correct code should be {c’h} (using the upper-comma
    > apostrophe).
    >
    > How can we say that these two should be treated equivalently? Maybe
    > like this: {c['’]h}?
    >
    > Although the correct form uses the apostrophe, the ASCII quote is MUCH
    > more frequent. (This is also true for French and English, but there
    > the apostrophe plays another role and does not create unbreakable
    > digraphs/trigraphs as in Breton, where the three-character sequence
    > is a SINGLE letter...)
    >
    > There are similar examples in other languages, like {’n} or {'n} for
    > which there also exists a combined character in Unicode...
    >
    > Also in Greek, {’Ε} is interpreted as capital Epsilon with tonos, which
    > can also be found encoded as a spacing tonos character before the
    > capital letter {΄Ε}, or often with an apostrophe or single quote
    > followed by Epsilon...
    >
    > Philippe.