Re: FYI: Google blog on Unicode

From: karl williamson (public@khwilliamson.com)
Date: Fri Jan 29 2010 - 10:44:45 CST


    Mark Davis ☕ wrote:
    > FYI, they managed to use the larger image before most people saw it.
    >
    > Mark
    >
    >
    >
    > On Fri, Jan 29, 2010 at 07:06, Mark Davis ☕ <mark@macchiato.com> wrote:
    >> It is encodings determined by a detection algorithm. The declarations
    >> for encodings (and language) are far too unreliable to be depended on.
    >> The detection algorithm itself is fairly complex, but quite fast and
    >> compact.
    >>
    >> Mark
    >>
    >>
    >>
    >> On Thu, Jan 28, 2010 at 21:38, Simon Montagu <smontagu@smontagu.org> wrote:
    >>> On 28/01/2010 10:50, Mark Davis ☕ wrote:
    >>>> There's a blog on Unicode that people may find interesting:
    >>>> http://googleblog.blogspot.com/2010/01/unicode-nearing-50-of-web.html
    >>>>
    >>>> (The graph on Unicode is too small; until they get that fixed, I have
    >>>> the large one on http://www.macchiato.com/)
    >>>>
    >>>> Mark
    >>> What exactly is this counting? Encodings declared internally in web-pages?
    >>> Encodings declared in HTTP headers? Encodings determined by auto-detection?
    >>> Some combination of the above?
    >>>
    >>> --
    >>> Simon Montagu
    >>> Mozilla internationalization
    >>> סיימון מונטגיו
    >>>
    >>>
    >
    >

    Since ASCII is a proper subset of UTF-8, this effectively means that 2/3
    of the web is using UTF-8, up from about 57% in 2001. So the sum of the
    two has a much shallower slope.

    Since the two are distinguished, I'm guessing that many more web pages
    have at least one non-ASCII character on them than there used to be?
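
    To make the subset point concrete, here is a minimal Python sketch. It
    is only an illustration, not the detection algorithm Mark describes: the
    classify() helper and the sample pages are hypothetical, and a real
    detector would have to tell legacy encodings apart rather than lump them
    together as "other". A page is labelled "ascii" if every byte is below
    0x80, "utf-8" if it has non-ASCII bytes that still form valid UTF-8, and
    "other" otherwise.

```python
def classify(raw: bytes) -> str:
    """Label raw page bytes as 'ascii', 'utf-8', or 'other' (hypothetical helper)."""
    try:
        raw.decode("ascii")        # succeeds only if every byte is < 0x80
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        raw.decode("utf-8")        # has non-ASCII bytes; are they well-formed UTF-8?
        return "utf-8"
    except UnicodeDecodeError:
        return "other"             # some legacy encoding; not identified here


# Made-up sample "pages" for illustration only.
pages = [
    b"Hello, world",               # pure ASCII
    "caf\u00e9".encode("utf-8"),   # one non-ASCII character, encoded as UTF-8
    "caf\u00e9".encode("latin-1"), # same text in Latin-1; not valid UTF-8
]
labels = [classify(p) for p in pages]

# An ASCII-only page decodes to the same text under either codec,
# which is what "ASCII is a subset of UTF-8" means in practice.
same_text = pages[0].decode("ascii") == pages[0].decode("utf-8")
```

    Under this scheme a single non-ASCII character is enough to move a page
    from the "ascii" bucket to the "utf-8" bucket, which is why the two
    curves can shift even when the bytes on the wire are almost unchanged.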


