From: Nelson H. F. Beebe (beebe@math.utah.edu)
Date: Sat May 10 2003 - 08:30:21 EDT
Ben Dougall <bend@freenet.co.uk> asks on 10 May 2003 10:56 about the
problem of recognizing the encoding of untagged plain text.
This note is to point out that considerable work has already been done
on this in the MULE (multi-lingual emacs) extensions to the GNU Emacs
text editor; they have been available in the separate leim-x.y.z
distribution since 17-Sep-1997, for emacs-20.1 and later. Simply put,
if, in an otherwise-empty directory, you do
    tar xfz emacs-x.y.z.tar.gz
    tar xfz leim-x.y.z.tar.gz
    cd emacs-x.y.z
    ./configure && make all check install
you'll get an emacs with MULE support.
Visiting a text file causes emacs to apply heuristics to the visited
text to guess an encoding. Should it guess incorrectly, the human
user can then change the encoding with a few keystrokes or menu
selections.
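For readers who want a feel for what such guessing involves, here is a
minimal Python sketch of one simple heuristic, trial decoding. It is
not the algorithm that emacs/MULE uses (study the emacs sources for
that); the function name guess_encoding and the candidate list are
hypothetical, chosen only for illustration.

```python
# Hypothetical illustration only: NOT the emacs/MULE detection algorithm.
# Strategy: try candidate encodings in priority order and return the
# first one that decodes the bytes without error.

CANDIDATES = ("ascii", "utf-8", "latin-1")  # assumed priority order


def guess_encoding(data: bytes, candidates=CANDIDATES):
    """Return the first candidate that decodes `data`, or None."""
    for name in candidates:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue  # not valid in this encoding; try the next one
        return name
    return None


if __name__ == "__main__":
    samples = [
        b"plain text",
        "na\u00efve".encode("utf-8"),
        "na\u00efve".encode("latin-1"),
    ]
    for sample in samples:
        print(sample, guess_encoding(sample))
```

Since latin-1 accepts every byte sequence, it acts as a catch-all and
must come last; any other 8-bit encoding would be reported as latin-1
as well. That ambiguity is why a guess can be wrong, and why the user
needs a way to override it.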
All of the source code is available for study.
GNU emacs builds on virtually any UNIX platform, except Mac OS X:
Apple distributes a version without X11 support, but seems not to have
returned their changes to the emacs developers. There are also
versions for several flavors of MS/Windows, both native and under
Unix-like environments that run on top of that system: see
http://www.math.utah.edu/~beebe/gnu-on-windows.html
for details.
-------------------------------------------------------------------------------
- Nelson H. F. Beebe                    Tel: +1 801 581 5254                  -
- Center for Scientific Computing       FAX: +1 801 581 4148                  -
- University of Utah                    Internet e-mail: beebe@math.utah.edu  -
- Department of Mathematics, 110 LCB        beebe@acm.org  beebe@computer.org -
- 155 S 1400 E RM 233                       beebe@ieee.org                    -
- Salt Lake City, UT 84112-0090, USA    URL: http://www.math.utah.edu/~beebe  -
-------------------------------------------------------------------------------