Re: Most commonly used characters not in BMP

From: Mark Davis ☕
Date: Mon Jun 14 2010 - 20:15:00 CDT


    From a sampling of the web (about 0.7M documents), the most common supplementary
    characters are, curiously, private use. Top is [?] U+FEB85. For Han, the top
    few are: 𣿡, 𠀤, 𩇫, 𥑬, 𤥂, 𡛺, 𤎌, 𠜎,... There are also, oddly, some
    Gothic and Shavian characters.

    However, the data gets pretty noisy; it would take a bigger sample to get
    more reliable data.
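
    For anyone who wants to repeat this kind of tally on their own corpus, here
    is a minimal sketch. It is not the tooling behind the numbers above; the
    function names (count_supplementary, plane_name) and the tiny sample list
    are hypothetical, made up for illustration. It only assumes that "supplementary"
    means any code point above U+FFFF and that planes 15 and 16 are private use.

```python
from collections import Counter


def plane_name(cp):
    """Return a rough label for the plane a supplementary code point sits in."""
    plane = cp >> 16  # each plane spans 0x10000 code points
    if plane == 1:
        return "SMP"
    if plane == 2:
        return "SIP"
    if plane in (15, 16):
        return "private use"
    return f"plane {plane}"


def count_supplementary(texts):
    """Count every character outside the BMP across an iterable of strings."""
    counts = Counter()
    for text in texts:
        # Python 3 strings iterate by code point, so no surrogate handling is needed.
        for ch in text:
            if ord(ch) > 0xFFFF:
                counts[ch] += 1
    return counts


# Hypothetical stand-in for a document sample.
sample = [
    "\U00023D13\U00023D13 abc",        # a SIP ideograph, twice
    "\U000FEB85 \U00010330",           # a plane-15 private-use code point and a Gothic letter
    "plain BMP text \u6f22\u5b57",     # BMP only, contributes nothing
]

counts = count_supplementary(sample)
for ch, n in counts.most_common():
    print(f"U+{ord(ch):04X}  {plane_name(ord(ch)):<12} {n}")
```

    With a small sample most characters show up only a handful of times, so the
    ranking below the top few entries should not be taken seriously.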


    — Il meglio è l’inimico del bene —

    On Mon, Jun 14, 2010 at 09:10, John H. Jenkins wrote:

    > Some characters in the SIP are more common in Chinese written in the HK SAR
    > than any character in Extension A, either because they are Hong Kong
    > toponyms (or the like), or are Cantonese-specific. (My own analysis of text
    > on the Chinese Wikipediæ is that the most common are U+23D13, U+282E2,
    > U+28B4E, and U+2A568, which occur seven times each.)
    > I imagine that the best data would come from Google.
    > And there are some Web sites out there in Deseret and Shavian, as well.
    > (If nothing else, both Deseret and Shavian versions of xkcd are available.
    > I'm not aware of any Linear B translations.)
    > On 2010/6/14, at 8:48 AM, Frédéric Grosshans wrote:
    > > Is there any data on the most commonly used characters which are not in
    > > the BMP?
    > >
    > > I have the impression that SMP characters are mainly used by scholars
    > > (historic scripts and math symbols). However, I have no idea whether the
    > > SIP characters are mainly historical, or whether they include not-so-rare
    > > characters needed for names and/or Chinese dialects.
    > >
    > > Frédéric Grosshans
    > >
    > >
