From: Uriah Eisenstein (uriaheisenstein@gmail.com)
Date: Tue Jul 06 2010 - 18:29:36 CDT
Regarding characters in the SIP, maybe the Unihan IICore field could be
useful? There are 62 Extension B characters which are listed as IICore. Of
course, these may just be characters which *should* be supported by
implementations, given that quite a lot of software has problems with
supplementary characters in general...
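
A rough sketch of how one might pull that count out of the Unihan data, assuming the usual tab-separated "U+XXXX<TAB>field<TAB>value" line format. The function name and the sample lines are made up for illustration (the sample values are placeholders, not real Unihan entries), and which Unihan file carries kIICore, or whether the field is present at all, may depend on the Unihan version:

```python
import re

# CJK Unified Ideographs Extension B block: U+20000..U+2A6DF.
EXT_B = range(0x20000, 0x2A6E0)

# Unihan data lines are assumed to look like: U+XXXX<TAB>kField<TAB>value
LINE_RE = re.compile(r"^U\+([0-9A-F]{4,6})\t(\w+)\t(.*)$")


def iicore_ext_b(lines):
    """Return sorted Extension B code points that have a kIICore entry."""
    found = set()
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and '#' comment lines.
        if not line or line.startswith("#"):
            continue
        m = LINE_RE.match(line)
        if m and m.group(2) == "kIICore":
            cp = int(m.group(1), 16)
            if cp in EXT_B:
                found.add(cp)
    return sorted(found)


# Invented sample input; values are placeholders, not real Unihan data.
SAMPLE = [
    "# sample comment line",
    "U+4E00\tkIICore\tX",         # BMP, ignored
    "U+20021\tkIICore\tX",        # Extension B, counted
    "U+20021\tkTotalStrokes\t6",  # other field, ignored
    "U+2A6D6\tkIICore\tX",        # Extension B, counted
    "U+2A700\tkIICore\tX",        # Extension C, ignored
]

if __name__ == "__main__":
    hits = iicore_ext_b(SAMPLE)
    print(len(hits), [f"U+{cp:X}" for cp in hits])
```

An open file object can be passed in place of SAMPLE, since the function only iterates over lines; the length of the returned list is then the number of Extension B characters marked IICore in that file.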
Uriah
On Tue, Jun 15, 2010 at 3:15 AM, Mark Davis ☕ <mark@macchiato.com> wrote:
> From a sampling of the web (about .7M docs), the most common supplementary
> characters are, curiously, private use. Top is [?] U+FEB85. For Han, the
> top few are: 𣿡, 𠀤, 𩇫, 𥑬, 𤥂, 𡛺, 𤎌, 𠜎,... There are also, oddly,
> some Gothic and Shavian characters.
>
> However, the data gets pretty noisy; it would take a bigger sample to get
> more reliable data.
>
> Mark
>
> — Il meglio è l’inimico del bene —
>
>
> On Mon, Jun 14, 2010 at 09:10, John H. Jenkins <jenkins@apple.com> wrote:
>
>> Some characters in the SIP are more common in Chinese written in the HK
>> SAR than any character in Extension A, either because they are Hong Kong
>> toponyms (or the like), or are Cantonese-specific. (My own analysis of text
>> on the Chinese Wikipediæ is that the most common are U+23D13, U+282E2,
>> U+28B4E, and U+2A568, which occur seven times each.)
>>
>> I imagine that the best data would come from Google.
>>
>> And there are some Web sites out there in Deseret and Shavian, as well.
>> (If nothing else, both Deseret and Shavian versions of xkcd are available.
>> I'm not aware of any Linear B translations.)
>>
>> On 2010/6/14, at 上午8:48, Frédéric Grosshans wrote:
>>
>> > Is there any data on the most commonly used characters which are not in
>> > the BMP?
>> >
>> > I have the impression that SMP characters are mainly used by scholars
>> > (historic scripts and math symbols). However, I have no idea whether the
>> > SIP characters are mainly historical, or if they include not-so-rare
>> > characters needed for names and/or Chinese dialects.
>> >
>> > Frédéric Grosshans
This archive was generated by hypermail 2.1.5 : Tue Jul 06 2010 - 18:36:15 CDT