Re: detecting case context

From: Markus Scherer (markus.icu@gmail.com)
Date: Fri Mar 25 2005 - 13:35:50 CST


    On Thu, 24 Mar 2005 15:32:09 +0100, Theo Veenker <Theo.Veenker@let.uu.nl> wrote:
    > The descriptions for Final_Sigma and Before_Dot are clear to me. For
    > After_Soft_Dotted, More_Above and After_I, I don't see how the descriptions
    > and the regexps represent *exactly* the same thing. For these I don't
    > see the \p{cc=0} parts reflected in the descriptions. Also isn't the
    > After_I regexp missing a "*"?

    You mention another typo below. Looking at the current version, I
    don't see the typos, and I see phrases like "with no intervening
    character of type 0" corresponding to the \p{cc=0} parts.
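    For readers comparing the wording with the regexps: the \p{cc=0} terms act
    as a stop condition. A scan away from C gives up as soon as it meets a
    starter (combining class 0), or, for most of the conditions, another
    class-230 (Above) mark. The Python below is a minimal sketch of that
    reading for After_Soft_Dotted, After_I, More_Above and Before_Dot. It
    assumes the context regexps as published in the Unicode 4.1 casing
    context table. It is not ICU's code. SOFT_DOTTED_SUBSET is a hypothetical,
    deliberately partial stand-in, because Python's unicodedata module does
    not expose the Soft_Dotted property.

    ```python
    import unicodedata

    # Assumption: a hand-picked PARTIAL subset of Soft_Dotted characters, for
    # illustration only; the full list is in the UCD file PropList.txt.
    SOFT_DOTTED_SUBSET = {"i", "j", "\u012F"}  # i, j, i with ogonek

    COMBINING_DOT_ABOVE = "\u0307"
    ABOVE = 230  # canonical combining class "Above"


    def _ccc(ch: str) -> int:
        """Canonical combining class of ch (0 for starters)."""
        return unicodedata.combining(ch)


    def after_soft_dotted(s: str, i: int) -> bool:
        """Before C: [{Soft_Dotted}] ([^{cc=230}{cc=0}])*"""
        for j in range(i - 1, -1, -1):
            if s[j] in SOFT_DOTTED_SUBSET:
                return True
            if _ccc(s[j]) in (0, ABOVE):
                return False  # blocked by a starter or another Above mark
        return False


    def after_i(s: str, i: int) -> bool:
        """Before C: [I] ([^{cc=230}{cc=0}])*"""
        for j in range(i - 1, -1, -1):
            if s[j] == "I":
                return True
            if _ccc(s[j]) in (0, ABOVE):
                return False
        return False


    def more_above(s: str, i: int) -> bool:
        """After C: [^{cc=0}]* [{cc=230}]"""
        for j in range(i + 1, len(s)):
            cc = _ccc(s[j])
            if cc == ABOVE:
                return True
            if cc == 0:
                return False  # the combining sequence ended first
        return False


    def before_dot(s: str, i: int) -> bool:
        """After C: ([^{cc=230}{cc=0}])* U+0307"""
        for j in range(i + 1, len(s)):
            if s[j] == COMBINING_DOT_ABOVE:
                return True
            if _ccc(s[j]) in (0, ABOVE):
                return False
        return False
    ```

    For example, more_above("i\u0328\u0301", 0) is True because the ogonek
    (class 202) does not stop the scan before the acute accent (class 230).
    after_soft_dotted("i\u0301\u0307", 2) is False because the intervening
    acute accent is itself an Above mark.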
    http://www.unicode.org/versions/Unicode4.1.0/

    > The functions below represent what I make of the descriptions and the
    > regexps. Are they correct?

    I just took a very brief look at some of them, and they look ok. Feel
    free to compare with my implementation in ICU. In the current version,
    it's in ucase.c, roughly the second half of the file.
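    ucase.c is the reference to compare against. As a lighter illustration of
    the one condition in this thread that does not involve combining classes,
    here is a minimal sketch of Final_Sigma. It follows the regexp form
    "Before C: cased (case-ignorable)*; After C: not ((case-ignorable)* cased)".
    Both properties are APPROXIMATED here from General_Category plus a few
    hypothetical extra apostrophes. It is not the exact Unicode definition and
    not ICU's implementation.

    ```python
    import unicodedata

    # Approximation (assumption): "cased" ~ Lu/Ll/Lt, and "case-ignorable" ~
    # Mn/Me/Cf/Lm/Sk plus two apostrophes. The real definitions differ.
    _CASED_CATS = {"Lu", "Ll", "Lt"}
    _IGNORABLE_CATS = {"Mn", "Me", "Cf", "Lm", "Sk"}
    _IGNORABLE_EXTRA = {"'", "\u2019"}


    def is_cased(ch: str) -> bool:
        return unicodedata.category(ch) in _CASED_CATS


    def is_case_ignorable(ch: str) -> bool:
        return (unicodedata.category(ch) in _IGNORABLE_CATS
                or ch in _IGNORABLE_EXTRA)


    def final_sigma(s: str, i: int) -> bool:
        """True if the sigma at s[i] is in Final_Sigma context."""
        # Before C: skip case-ignorables, then require a cased character.
        j = i - 1
        while j >= 0 and is_case_ignorable(s[j]):
            j -= 1
        if j < 0 or not is_cased(s[j]):
            return False
        # After C: skip case-ignorables; a following cased char disqualifies.
        k = i + 1
        while k < len(s) and is_case_ignorable(s[k]):
            k += 1
        return not (k < len(s) and is_cased(s[k]))
    ```

    With these helpers, the sigma in "\u039F\u0394\u039F\u03A3" (ΟΔΟΣ) at
    index 3 is final. The same sigma followed by a capital alpha is not final.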

    Our site recently moved; WebCVS currently has this at
    http://dev.icu-project.org/cgi-bin/viewcvs.cgi/*checkout*/icu/source/common/ucase.c
    but WebCVS may move once more. You can also just download ICU 3.2 or
    use anonymous CVS. See http://www.ibm.com/software/globalization/icu/

    In older ICU releases, very similar code was in uchar.c. The code
    comments quote an older Unicode version, but the conditions have not
    substantially changed since then.

    markus


