Re: Identifying file encoding scheme

From: John Cowan (cowan@locke.ccil.org)
Date: Wed Sep 08 1999 - 17:17:57 EDT


Montgomery Securities scripsit:

> How does a software package written on an operating system that supports
> ASCII as well as Unicode (Windows NT) identify the encoding scheme that a
> text file on disk uses? Is there any special marking at the front of a
> Unicode file that helps distinguish it from an 8 bit file?

The specific answer for Notepad on Windows NT 4.0 is that if the first
two bytes are FF FE (the byte order mark U+FEFF in little-endian order),
then the file is assumed to be little-endian Unicode. Otherwise, it is
assumed to be in the current system code page, typically CP1252.

It is possible for a Latin-1 file to begin with y-diaeresis (FF)
followed by thorn (FE) by sheer bad luck, but it is most unlikely.
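
For illustration, the check described above can be sketched in a few
lines of Python. This is only a sketch of the rule as stated, not
Notepad's actual code: the names guess_encoding and decode_text are
hypothetical, and the fallback is hard-coded to CP1252 as an assumption
rather than read from the system.

```python
# Hypothetical sketch of the "FF FE means little-endian Unicode" rule.
BOM_UTF16_LE = b"\xff\xfe"  # U+FEFF serialized in little-endian order


def guess_encoding(data: bytes, fallback: str = "cp1252") -> str:
    """Return 'utf-16-le' if data starts with FF FE, else the fallback."""
    if data[:2] == BOM_UTF16_LE:
        return "utf-16-le"
    # Assumption: the caller supplies the system code page; CP1252 is
    # only the typical value.
    return fallback


def decode_text(data: bytes, fallback: str = "cp1252") -> str:
    """Decode data using the guessed encoding, dropping the BOM if present."""
    encoding = guess_encoding(data, fallback)
    if encoding == "utf-16-le":
        data = data[2:]  # the two marker bytes are not part of the text
    return data.decode(encoding)


# A file written as little-endian Unicode with the marker in front.
unicode_file = BOM_UTF16_LE + "hi".encode("utf-16-le")
# An ordinary 8-bit file.
ansi_file = b"caf\xe9"
# The unlucky case: Latin-1 text starting with y-diaeresis, then thorn.
unlucky_file = "\u00ff\u00feabcd".encode("latin-1")

print(guess_encoding(unicode_file))  # marker found
print(guess_encoding(ansi_file))     # falls back to the code page
print(guess_encoding(unlucky_file))  # misidentified as Unicode
```

The last case shows the false positive: the Latin-1 bytes for
y-diaeresis and thorn are exactly FF FE, so the rule cannot tell that
file apart from a Unicode one.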

-- 
John Cowan                                   cowan@ccil.org
       I am a member of a civilization. --David Brin


