Re: Cp1256 (Windows Arabic) Characters not supported by UTF8

From: Tom Emerson (tree@basistech.com)
Date: Wed Aug 10 2005 - 14:21:53 CDT

Next message: Philippe Verdy: "Re: Cp1256 (Windows Arabic) Characters not supported by UTF8"

Previous message: eflarup@yahoo.com: "Re: Cp1256 (Windows Arabic) Characters not supported by UTF8"
In reply to: Philippe Verdy: "Re: Cp1256 (Windows Arabic) Characters not supported by UTF8"
Next in thread: Philippe Verdy: "Re: Cp1256 (Windows Arabic) Characters not supported by UTF8"
Reply: Philippe Verdy: "Re: Cp1256 (Windows Arabic) Characters not supported by UTF8"

Philippe Verdy writes:
> This is absolutely not needed for a charset detector (i.e. the detection of
> the encoding used to serialize the text). HTML escapes are perfectly valid
> in HTML, and even if they refer to non Latin-1 characters, this does not
> change the fact that the page remains encoded in ISO-8859-1.
>
> You don't need to take HTML escapes into account with regard to which
> encoding is used, because these escapes are independent of the actual
> encoding used.

Agreed. But if you are interested in the language of the page as well
as the encoding, which some applications do care about, then you have
to take those escapes into account. And, as I said, building a model
that accounts for language as well as encoding can help differentiate
the various Latin-n encodings.
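
To make the distinction concrete, here is a rough Python sketch (not
taken from any real detector). It assumes a page whose bytes are plain
ASCII, and therefore valid ISO-8859-1, but whose visible text is written
entirely with numeric character references. The helper names
text_for_language_id and arabic_letter_ratio are made up for this
example, and the tag stripping is deliberately naive.

```python
import html
import re

# Naive tag stripper; a real application would use an HTML parser.
TAG_RE = re.compile(r"<[^>]+>")


def text_for_language_id(raw: bytes, encoding: str = "iso-8859-1") -> str:
    """Decode the bytes, drop tags, and expand HTML escapes.

    Hypothetical helper: the encoding is assumed to be known already
    (for example from a charset detector run on the raw bytes).
    """
    markup = raw.decode(encoding)
    return html.unescape(TAG_RE.sub(" ", markup))


def arabic_letter_ratio(text: str) -> float:
    """Fraction of letters that fall in the basic Arabic block (U+0600..U+06FF)."""
    letters = [c for c in text if c.isalpha()]
    if not letters:
        return 0.0
    arabic = sum(1 for c in letters if "\u0600" <= c <= "\u06ff")
    return arabic / len(letters)


# The bytes on the wire are pure ASCII, so an encoding detector has
# nothing Arabic to look at.
page = b"<p>&#1605;&#1585;&#1581;&#1576;&#1575;</p>"

# Only after the escapes are expanded does the language become visible.
visible = text_for_language_id(page)
ratio = arabic_letter_ratio(visible)
```

An encoding detector can stop at the raw bytes in page; a language
identifier needs the string in visible, which is why the escapes matter
for the second task and not for the first.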

-- 
Tom Emerson                                          Basis Technology Corp.
Software Architect                                 http://www.basistech.com
 "You can't fake quality any more than you can fake a good meal." (W.S.B.)


This archive was generated by hypermail 2.1.5 : Wed Aug 10 2005 - 14:22:41 CDT