Searching for CJK etc characters on electronic media

From: Smith, Mike (SMITHM@fish.govt.nz)
Date: Sun Dec 15 2002 - 14:54:58 EST


    Hello

    I frequently need to search computer storage media for words in
    languages such as Chinese, Japanese, Korean and Russian. The tools I
    currently use display the stored values mainly as ASCII or hex, and
    the search tool cannot display Unicode values (or glyphs).

    When I reorder the bytes of a UTF-16 value to get the hex string to
    search for (i.e. swap the byte order), I get a large number of false
    positive hits. Further, when I look at the surrounding hex values to
    ascertain the context of a keyword 'hit', I find it extremely
    difficult to deduce a meaningful context.
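
    To make the byte-order step above concrete, here is a minimal
    sketch in Python. It is only an illustration, not a description of
    any existing search tool: the names keyword_patterns and find_hits
    are hypothetical, and it assumes the media image fits in memory and
    that the text is stored as UTF-16 (either byte order) or UTF-8.
    Decoding the bytes around each hit in the same encoding gives
    readable context instead of raw hex.

```python
# Sketch only: helper names are hypothetical, and the list of candidate
# encodings is an assumption (legacy encodings such as Shift_JIS, GB2312,
# EUC-KR or KOI8-R could be added where relevant).
ENCODINGS = ("utf-16-le", "utf-16-be", "utf-8")


def keyword_patterns(keyword):
    """Return the byte sequence of the keyword in each candidate encoding."""
    return {enc: keyword.encode(enc) for enc in ENCODINGS}


def find_hits(data, keyword, context_chars=8):
    """Find every occurrence of keyword in data (a bytes object).

    Returns a sorted list of (offset, encoding, context) tuples, where
    context is the surrounding text decoded with the same encoding.
    """
    hits = []
    for enc, pattern in keyword_patterns(keyword).items():
        # UTF-16 code units are 2 bytes wide; step back in whole units so
        # the decoded context stays aligned with the hit.
        width = 2 if enc.startswith("utf-16") else 1
        start = 0
        while True:
            pos = data.find(pattern, start)
            if pos == -1:
                break
            start = pos + 1
            lo = pos - min(pos // width, context_chars) * width
            hi = min(len(data), pos + len(pattern) + context_chars * width)
            # errors="replace" keeps going when the context runs into
            # bytes that are not valid text in this encoding.
            context = data[lo:hi].decode(enc, errors="replace")
            hits.append((pos, enc, context))
    return sorted(hits)


if __name__ == "__main__":
    sample = "abc \u6771\u4eac def".encode("utf-16-le")
    for offset, enc, context in find_hits(sample, "\u6771\u4eac"):
        print(offset, enc, repr(context))
```

    A hit whose decoded context is mostly replacement characters or
    unrelated symbols is likely a false positive; a hit surrounded by
    readable text in the same script is more likely genuine.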

    Can anyone suggest an approach and/or tools for reading and
    searching computer media that contain CJK and similar characters?

    Thanks in advance

    Mike Smith


