From: Jeroen Ruigrok van der Werven (asmodai@in-nomine.org)
Date: Fri Jan 29 2010 - 02:57:45 CST
On [20100129 06:53], Simon Montagu (smontagu@smontagu.org) wrote:
>What exactly is this counting? Encodings declared internally in
>web-pages? Encodings declared in HTTP headers? Encodings determined by
>auto-detection? Some combination of the above?
The article states: "This graph is from Google internal data, based on our
indexing of web pages, and thus may vary somewhat from what other search
engines find."
As we all know, there are a lot of pages that declare the wrong encoding,
either in the preamble or in the headers. So I guess Google uses some
simple algorithm that looks at what the page says and what the server says,
checks whether that matches what it actually encounters on the page itself,
and adjusts the encoding as necessary.
It would not make much sense to store mojibake.
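To make that guess concrete, here is a minimal sketch in Python of the kind of
check I have in mind. It is purely illustrative and says nothing about what
Google actually does: the function name guess_encoding, its parameters, and the
order of the checks are all my own assumptions. It trusts a clean strict UTF-8
decode of non-ASCII content over any declaration, then falls back to the header
and in-page declarations, and only accepts a label if the bytes really decode
with it.

```python
import codecs
from typing import Optional


def _normalize(label: Optional[str]) -> Optional[str]:
    """Map a declared charset label to Python's codec name, or None if unknown."""
    if not label:
        return None
    try:
        return codecs.lookup(label.strip()).name
    except LookupError:
        return None


def _decodes(body: bytes, encoding: str) -> bool:
    """Return True if the bytes decode strictly (no errors) with this encoding."""
    try:
        body.decode(encoding)
        return True
    except (UnicodeDecodeError, LookupError):
        return False


def guess_encoding(body: bytes,
                   header_charset: Optional[str] = None,
                   meta_charset: Optional[str] = None) -> str:
    """Hypothetical heuristic: reconcile declared encodings with the actual bytes."""
    # Non-ASCII bytes that form valid UTF-8 are rarely an accident, so a clean
    # strict decode wins over whatever the page or server claims.
    has_high_bytes = any(b >= 0x80 for b in body)
    if has_high_bytes and _decodes(body, "utf-8"):
        return "utf-8"

    # Otherwise take the first declaration (server first, then page) that the
    # bytes are actually consistent with.
    for label in (header_charset, meta_charset):
        name = _normalize(label)
        if name and _decodes(body, name):
            return name

    # Nothing usable was declared: plain ASCII is valid UTF-8; anything else
    # falls back to Latin-1, which can decode any byte sequence.
    return "utf-8" if _decodes(body, "utf-8") else "iso8859-1"
```

For example, a page whose bytes are UTF-8 but whose header claims ISO-8859-1
comes out as "utf-8", while Latin-1 bytes mislabelled as UTF-8 in the header
fall through to the in-page declaration. Single-byte encodings accept almost any
input, so a real system would presumably need statistical detection on top of
this.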
--
Jeroen Ruigrok van der Werven <asmodai(-at-)in-nomine.org> / asmodai
イェルーン ラウフロック ヴァン デル ウェルヴェン
http://www.in-nomine.org/ | http://www.rangaku.org/ | GPG: 2EAC625B
Though this be madness, yet there is a method in it...