From: Ben Dougall (bend@freenet.co.uk)
Date: Sat May 10 2003 - 09:10:25 EDT
> >It would appear to be a three step process:
> >
> >(1) First, detect ...
> >(2) Second, compare ...
> >(3) Third, ... test
>
> (4) Give the user a chance to correct your program's guess -- some
> users actually know!
This is all very useful information, including the details, and so is the
Emacs-related info (I'll definitely follow that up). Thanks very
much.
What should the default be, though? That is, after encoding detection,
after fuzzy logic and after whatever other tricks, but before giving the
user a chance to change it themselves: I still don't know. So how should
that particular decision be made, given that the user's main language is known?
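To pin down where that default sits, here is a minimal sketch of the decision order as I understand it. It is not taken from any of the replies. It assumes the "detect" step amounts to a strict ASCII check and then a strict UTF-8 check, and that the user's correction, when given, always wins. The function name `decode_with_default` is hypothetical.

```python
from typing import Optional, Tuple


def decode_with_default(data: bytes, default_encoding: str,
                        user_choice: Optional[str] = None) -> Tuple[str, str]:
    """Return (decoded_text, encoding_used).

    Assumed order of precedence (illustrative only):
      1. an explicit user correction always wins;
      2. pure ASCII or valid UTF-8 is detected strictly;
      3. otherwise the default encoding is used.
    """
    if user_choice is not None:
        # The user has corrected the program's guess.
        return data.decode(user_choice, errors="replace"), user_choice
    try:
        # ASCII is a subset of UTF-8; it is checked first only so that
        # plain-ASCII input gets reported as such.
        return data.decode("ascii"), "ascii"
    except UnicodeDecodeError:
        pass
    try:
        # Strict UTF-8: legacy 8-bit text with non-ASCII bytes rarely
        # happens to be valid UTF-8, so success is a useful signal.
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        pass
    # Detection gave no answer: this is the point where the default applies.
    return data.decode(default_encoding, errors="replace"), default_encoding
```

With this ordering the open question reduces to a single value: what to pass as `default_encoding`.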
If the user's main language is any Latin-based one, 8-bit extended
ASCII (e.g. ISO-8859-1 or Windows-1252) would be the obvious choice.
But what if the user's main language uses a script other than Latin?
Would falling back to a character set other than extended ASCII be in
order in those cases? If so, which basic character sets are there
besides ASCII? I'm guessing there won't be many of them (viewing ASCII
as the one for Latin-based scripts). Or should it not fall back to an
alternative to extended ASCII at all, and just use 8-bit extended ASCII
as the default regardless of the language setting?
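The two options in that question can be put side by side in a small sketch. It assumes the user's main language is available as a language tag such as `ru` or `pt-BR`. `LEGACY_DEFAULTS`, `FALLBACK` and `default_encoding_for` are hypothetical names. The table is only illustrative and is not a recommendation: the codec names are real Python codecs, but which languages to list and which encoding each one gets would be a policy decision.

```python
# Hypothetical table: primary language subtag -> legacy default encoding.
# Illustrative only; neither complete nor authoritative.
LEGACY_DEFAULTS = {
    "en": "cp1252",     # Western European Latin (Windows-1252)
    "fr": "cp1252",
    "de": "cp1252",
    "pl": "cp1250",     # Central European Latin
    "ru": "cp1251",     # Cyrillic
    "el": "cp1253",     # Greek
    "ja": "shift_jis",  # Japanese: multi-byte, not a single-byte "extended ASCII"
}

# Used for any language not in the table. If the table were empty, every
# user would get this value: the "same default regardless of language" option.
FALLBACK = "cp1252"


def default_encoding_for(language: str) -> str:
    """Map a tag like 'ru', 'pt-BR' or 'ja_JP' to a default encoding."""
    primary = language.split("-")[0].split("_")[0].lower()
    return LEGACY_DEFAULTS.get(primary, FALLBACK)
```

For example, `default_encoding_for("ru")` gives `"cp1251"`, while a language missing from the table, such as `"pt-BR"`, drops through to `FALLBACK`. A language-dependent default and a single global default differ only in how much of this table gets filled in.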
This archive was generated by hypermail 2.1.5 : Sat May 10 2003 - 10:39:10 EDT