CJK test data

From: Erik.Ostermueller@alltel.com
Date: Thu Feb 06 2003 - 12:40:59 EST


    I'm starting to put together some CJK test data
    as described below.

    Before I dive in, I was curious if any of this
    work is already available on the web.
    If not, would others be interested in seeing this
    once it is complete?

    ###############################################################
    CJK Test data.
    This is just a start!

        Need to produce a set of CJK data geared towards testing
        string manipulation support in any software system,
        regardless of platform, programming language or even API.

        All data need English translations and instructions for
        entering the data using an IME on a QWERTY keyboard.

        Need tests to prove that a system SUPPORTS GB 18030
        Need tests to prove that a system SUPPORTS GB 13000
        Need tests to prove that a system DOES NOT support GB 18030
        Need tests to prove that a system DOES NOT support GB 13000
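
        One possible shape for the support / non-support checks above,
        sketched in Python. It relies on Python's built-in "gb18030" and
        "gbk" codecs. Using "gbk" as a stand-in for the GB 13000
        repertoire is an assumption of this sketch, and the helper name
        can_round_trip() is made up for illustration.

```python
def can_round_trip(text, encoding):
    """Return True if text survives encode -> decode in the given encoding."""
    try:
        return text.encode(encoding).decode(encoding) == text
    except UnicodeError:
        # Raised when the encoding has no representation for a character.
        return False


# Illustrative probe characters, not a complete test set.
BMP_HAN = "\u4e2d"                # common ideograph in the basic plane
SUPPLEMENTARY_HAN = "\U00020000"  # CJK Extension B, outside the basic plane

if __name__ == "__main__":
    for label, ch in (("BMP", BMP_HAN), ("supplementary", SUPPLEMENTARY_HAN)):
        for enc in ("gbk", "gb18030"):  # "gbk" is an assumed GB 13000 stand-in
            print(label, enc, can_round_trip(ch, enc))
```

        A system that round-trips the supplementary-plane character is
        handling more than the older repertoire, so a probe like this can
        serve as both the positive and the negative test.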

        Tests: need two sets of data, one for GB 13000, one for GB 18030
          1) Sorting Test
            a) include a list of unordered strings.
            b) follow that with the same list, ordered properly.
      
          2) Text searching
            - Need single-character search and multiple-character search.
              Must include the 'key' that we're looking for and
              strings that do and do not contain that key.

          3) Character classification
              We need data to test some subset of the predicate functions:
              isSpace(), isAlpha() and the other is*() functions.
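
        To make the three test types above concrete, here is a rough
        Python sketch of how such data might be exercised. The function
        names and the sample strings are invented for illustration, and
        plain code point order is assumed for sorting. A real test set
        would need the locale's collation (pinyin, stroke count, etc.).

```python
import unicodedata


def check_sort(unordered, expected, key=None):
    """Sorting test: sort the unordered list and compare with the ordered one."""
    # key=None means plain code point order (an assumption of this sketch).
    return sorted(unordered, key=key) == list(expected)


def check_search(key, containing, not_containing):
    """Search test: key must occur in every 'containing' string and in no other."""
    return (all(key in s for s in containing)
            and not any(key in s for s in not_containing))


def classify(ch):
    """Classification test: report predicate results for one character."""
    return {
        "is_space": ch.isspace(),
        "is_alpha": ch.isalpha(),
        "category": unicodedata.category(ch),
    }


if __name__ == "__main__":
    # U+4E2D "middle", U+4E00 "one", U+4EBA "person"
    print(check_sort(["\u4e2d", "\u4e00", "\u4eba"],
                     ["\u4e00", "\u4e2d", "\u4eba"]))
    # Key "China"; "I love China" contains it, "Japan" does not.
    print(check_search("\u4e2d\u56fd",
                       ["\u6211\u7231\u4e2d\u56fd"], ["\u65e5\u672c"]))
    print(classify("\u3000"))  # ideographic space
```

        Keeping the data (lists, keys, expected results) separate from
        the checking functions is what would let the same data be reused
        on other platforms and APIs.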


