Re: Wanted: An Internet Unicode Meter

From: Don Osborn (dzo@bisharat.net)
Date: Thu Jul 27 2006 - 13:08:44 CDT

    Thank you Debbie. I also found another document on LOP from a conference
    last year at http://www2005.org/cdrom/docs/p990.pdf . It has some additional
    specs.

    ----- Original Message -----
    From: Debbie Garside
    ...

    > Having had some contact with Professor Mikami, here is a brief outline, to
    > the best of my knowledge: the Language Observatory project (Nagaoka
    > University, Japan) employs a UBI Crawler to trawl the web, gathering
    > information on languages, scripts, encodings etc. The project aims to
    > analyse this data, providing stats on the coverage of the 300 languages
    > used in the Declaration of Human Rights, amongst other things.
    >
    > http://www.elda.org/en/proj/scalla/SCALLA2004/mikami.pdf
    >
    > Amharic is one of those languages.
    >
    > Regards
    >
    > Debbie Garside
    >
    >> -----Original Message-----
    >> From: unicode-bounce@unicode.org
    >> [mailto:unicode-bounce@unicode.org] On Behalf Of Don Osborn
    >> Sent: 26 July 2006 21:37
    >> To: unicode@unicode.org; Daniel Yacob
    >> Cc: Tunde Adegbola; a12n-forum@bisharat.net
    >> Subject: Re: Wanted: An Internet Unicode Meter
    >>
    >> Hi Daniel, There was a workshop in Bamako last month
    >> sponsored by the African Academy of Languages, the Language
    >> Observatory, and the Japan Science and Technology Agency (see
    >> http://gii2.nagaokaut.ac.jp/giiblog/blog/lopdiary.php?itemid=715 )
    >> which dealt with surveying African languages on the web.
    >> One of the things they did, according to Tunde Adegbola (who
    >> was there and you will recall from the Casablanca workshop
    >> last year) was to introduce something called the Language
    >> Identification Module (LIM).
    >>
    >> This might answer the first question. I'll cc Tunde and I
    >> need to write Yoshiki Mikami and Shigeaki Kodama about it
    >> anyway. I'll also cc A12n-forum where this subject came up
    >> before - in the interests of broadening the info & dialogue
    >> on what sounds like a project of wider interest that has had
    >> relatively little attention.
    >>
    >> All the best.
    >>
    >> Don
    >>
    >> Don Osborn
    >> Bisharat.net
    >> PanAfrican Localisation Project
    >>
    >>
    >> ----- Original Message -----
    >> From: "Daniel Yacob" <unicode@geez.org>
    >> To: <unicode@unicode.org>
    >> Sent: Wednesday, July 26, 2006 1:01 PM
    >> Subject: Wanted: An Internet Unicode Meter
    >>
    >>
    >> > Greets,
    >> >
    >> > I was asked twice within a week recently how many Amharic documents
    >> > were on the internet and I could only guess at a figure. So it
    >> > dawned on me that it would be a nice service if search engine
    >> > companies could provide some statistics based on language (if
    >> > identified) and script. Perhaps these stats are available and
    >> > I just wasn't able to find them?
    >> >
    >> > Going a step further, stats on a per character basis, or even a
    >> > property basis would be useful and not just academically interesting.
    >> > The practical application that comes to mind would be as a survey
    >> > of Unicode usage. Under-utilized blocks, even dead zones, could be
    >> > identified which would indicate where community outreach was needed.
    >> >
    >> > I think this would be in the Unicode Consortium's best interest to
    >> > be aware of these stats (as well as related stats such as Unicode
    >> > use vs other encoding systems and growth over time) to then know
    >> > where to focus efforts in promoting adoption of the standard.
    >> >
    >> > So if the Unicode Consortium could work on a character meter with a
    >> > major indexing/searching service, such as Google for example, that
    >> > would be dandy. Do we know anyone at that intersection? ;-)
    >> >
    >> > cheers,
    >> >
    >> > /Daniel
    >
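As a rough illustration of the per-character and per-property statistics Daniel asks about in the quoted message, here is a minimal Python sketch. It is not how the Language Observatory or any search engine does it. The names `BLOCKS`, `block_of` and `meter` are made up for this example, and the block table is only a small hand-picked subset. A real meter would presumably load the full block list from the Unicode Character Database and run over crawled pages rather than a sample string.

```python
import unicodedata
from collections import Counter

# Illustrative subset of Unicode blocks (hypothetical table for this sketch);
# a real meter would load the complete block list instead.
BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x1200, 0x137F, "Ethiopic"),
    (0x1380, 0x139F, "Ethiopic Supplement"),
    (0x2D80, 0x2DDF, "Ethiopic Extended"),
]

def block_of(ch):
    """Return the block name for one character, or "Other" if not in BLOCKS."""
    cp = ord(ch)
    for start, end, name in BLOCKS:
        if start <= cp <= end:
            return name
    return "Other"

def meter(text):
    """Tally non-whitespace characters by block and by general category."""
    by_block = Counter()
    by_category = Counter()
    for ch in text:
        if ch.isspace():
            continue
        by_block[block_of(ch)] += 1
        by_category[unicodedata.category(ch)] += 1
    return by_block, by_category

# Three Ethiopic syllables followed by a Latin word.
sample = "\u1230\u120b\u121d hello"
blocks, categories = meter(sample)
print(blocks)
print(categories)
```

Summing such counters over a large crawl would give the kind of block-level usage picture described above. Blocks that never show up in the totals would be the candidates for "dead zones".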



    This archive was generated by hypermail 2.1.5 : Thu Jul 27 2006 - 14:01:19 CDT