RE: Algorithm

From: Hart, Edwin F. (Edwin.Hart@jhuapl.edu)
Date: Fri Mar 26 1999 - 14:35:22 EST


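Look for the Byte Order Mark (BOM, U+FEFF) or the byte-swapped BOM as the
first 2 bytes of a file with Unicode text. The bytes FE FF indicate
big-endian text; the byte-swapped form, FF FE, indicates little-endian text.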

Note, however, that Unicode files are not required to have the BOM at the
beginning.
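
A minimal sketch of that check, in Java since that is what the original
poster is using. The names BomSniffer, Guess and guess are made up for
illustration. The fallback scan for files without a BOM is an assumption
added here, not part of the advice above: it calls the data plain ASCII only
if every byte is in the range 0x01-0x7F, and otherwise gives up rather than
guessing.

```java
// Hypothetical helper, not from any library: sniff the first two bytes
// for a BOM, then fall back to a crude "is it 7-bit ASCII" scan.
class BomSniffer {

    enum Guess { UTF16_BE, UTF16_LE, ASCII, UNKNOWN }

    static Guess guess(byte[] data) {
        if (data.length >= 2) {
            // Mask to get unsigned values; Java bytes are signed.
            int b0 = data[0] & 0xFF;
            int b1 = data[1] & 0xFF;
            if (b0 == 0xFE && b1 == 0xFF) {
                return Guess.UTF16_BE;   // BOM in big-endian order
            }
            if (b0 == 0xFF && b1 == 0xFE) {
                return Guess.UTF16_LE;   // byte-swapped BOM
            }
        }
        // Assumption: no BOM. Treat the data as plain ASCII only if every
        // byte is 0x01..0x7F. A NUL byte or a byte with the high bit set
        // means we cannot tell, since the BOM is optional.
        for (byte b : data) {
            if (b <= 0) {
                return Guess.UNKNOWN;
            }
        }
        return Guess.ASCII;
    }
}
```

The bytes could come from something like
Files.readAllBytes(Paths.get("input.txt")), where the file name is only a
placeholder. An UNKNOWN result means the encoding has to be determined some
other way.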

Ed Hart

Edwin F. Hart
Applied Physics Laboratory
11100 Johns Hopkins Road
Laurel, MD 20723-6099
+1-240-228-6926 (from Washington, DC area)
+1-443-778-6926 (from Baltimore area)
+1-240-228-1093 (fax)
edwin.hart@jhuapl.edu <mailto:edwin.hart@jhuapl.edu>

        -----Original Message-----
        From: Gnanesh Gujulva
        Sent: 26 March, 1999 12:54
        To: Unicode List
        Subject: FW: Algorithm

        I am working on a Java application which should handle both ASCII
        text files and Unicode files. Is there a generalised algorithm to
        detect the type of character set being used? I need to detect
        whether the character set is plain ASCII or Unicode.

        Regards
        Gnanesh


