Re: FYI: Google blog on Unicode

From: Mark Davis ☕ (mark@macchiato.com)
Date: Fri Jan 29 2010 - 11:34:47 CST

Next message: Mark Davis ☕: "Re: FYI: Google blog on Unicode"

Previous message: Jonathan Rosenne: "RE: FYI: Google blog on Unicode"
In reply to: karl williamson: "Re: FYI: Google blog on Unicode"
Next in thread: Mark Davis ☕: "Re: FYI: Google blog on Unicode"

We separate out pure ASCII pages because ASCII is a subset of most
encodings on the web (Latin 1, etc.), so those pages are valid in all
of them. That way the graph is a pure partition.

If we didn't, UTF-8 pages would amount to > 65%, but then
Latin1/cp1252 would be > 40%, SJIS > 20%, etc., and the shares would
add up to more than 100%, which would be misleading.
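
To make the partition point concrete, here is a minimal sketch. It is
not the detection algorithm behind the graph (that is not described in
this thread); it only checks byte-level validity, and the names
bucket, is_valid and SAMPLE_PAGES are made up for illustration. It
shows that a pure ASCII page decodes under every listed encoding, so
counting "pages valid as X" counts the same page several times, while
a separate ASCII bucket counts each page exactly once.

```python
from collections import Counter


def bucket(page: bytes) -> str:
    """Assign a page to exactly one bucket (hypothetical, validity-only)."""
    if page.isascii():  # bytes.isascii() needs Python 3.7+
        return "ascii"
    try:
        page.decode("utf-8")
    except UnicodeDecodeError:
        # Everything else is lumped together; a real detector would
        # tell the legacy encodings apart.
        return "other"
    return "utf-8"


def is_valid(page: bytes, encoding: str) -> bool:
    """True if the bytes decode without error under the given encoding."""
    try:
        page.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


# Illustrative sample, not real web data.
SAMPLE_PAGES = [
    b"hello",                                   # pure ASCII
    "caf\u00e9".encode("utf-8"),                # non-ASCII UTF-8
    "caf\u00e9".encode("cp1252"),               # Latin1/cp1252
    "\u65e5\u672c\u8a9e".encode("shift_jis"),   # SJIS
]

ENCODINGS = ("utf-8", "cp1252", "shift_jis")

# Partition: every page lands in exactly one bucket.
partition = Counter(bucket(p) for p in SAMPLE_PAGES)

# Overlapping count: a page is counted once per encoding it is valid in.
overlapping = {
    enc: sum(is_valid(p, enc) for p in SAMPLE_PAGES) for enc in ENCODINGS
}

print(partition)
print(overlapping)
```

The partition counts sum to the number of pages, while the overlapping
counts sum to more than that. Validity alone also over-counts
non-ASCII pages (many byte sequences happen to be legal in cp1252),
which is one reason a real detector has to do more than this.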

Mark

On Fri, Jan 29, 2010 at 08:44, karl williamson <public@khwilliamson.com> wrote:
> Mark Davis ☕ wrote:
>>
>> FYI, they managed to use the larger image before most people saw it.
>>
>> Mark
>>
>>
>>
>> On Fri, Jan 29, 2010 at 07:06, Mark Davis ☕ <mark@macchiato.com> wrote:
>>>
>>> It is encodings determined by a detection algorithm. The declarations
>>> for encodings (and language) are far too unreliable to be depended on.
>>> The detection algorithm itself is fairly complex, but quite fast and
>>> compact.
>>>
>>> Mark
>>>
>>>
>>>
>>> On Thu, Jan 28, 2010 at 21:38, Simon Montagu <smontagu@smontagu.org>
>>> wrote:
>>>>
>>>> On 28/01/2010 10:50, Mark Davis ☕ wrote:
>>>>>
>>>>> There's a blog on Unicode that people may find interesting:
>>>>> http://googleblog.blogspot.com/2010/01/unicode-nearing-50-of-web.html
>>>>>
>>>>> (The graph on Unicode is too small; until they get that fixed, I have
>>>>> the large one on http://www.macchiato.com/)
>>>>>
>>>>> Mark
>>>>
>>>> What exactly is this counting? Encodings declared internally in
>>>> web-pages?
>>>> Encodings declared in HTTP headers? Encodings determined by
>>>> auto-detection?
>>>> Some combination of the above?
>>>>
>>>> --
>>>> Simon Montagu
>>>> Mozilla internationalization
>>>> סיימון מונטגיו
>>>>
>>>>
>>
>>
>
> Since ASCII is a proper subset of utf8, this means effectively that 2/3 of
> the web is using utf8; up from about 57% in 2001. So the sum of the two has
> a much shallower slope.
>
> Since the two are distinguished, I'm guessing that many more web pages have
> at least one non-ascii character on them than there used to be??
>
>


This archive was generated by hypermail 2.1.5 : Fri Jan 29 2010 - 11:37:18 CST