michaelanortonster at gmail.com
Fri Mar 27 15:16:06 CDT 2015
(I know this is way too simplistic a response, but it is kind of like giving
everyone an invisible cloak and an invisible dagger and not telling them
what a cloak and dagger are for [cutting butter & keeping warm].)
On Fri, Mar 27, 2015 at 3:57 PM, Michael Norton <
michaelanortonster at gmail.com> wrote:
> Why wouldn't Unicode itself have it?
> On Fri, Mar 27, 2015 at 1:07 PM, Ken Whistler <kenwhistler at att.net> wrote:
>> Search engine companies (and in particular, Google) have such
>> information squirreled away in their index databases, at least as
>> far as usage stats for Unicode characters on the web go -- but it
>> is proprietary information, and they generally don't publish
>> information about such statistics.
>> Perhaps there are researchers out there who have set web crawlers
>> on a mission to generate such web statistics for publication, and maybe
>> somebody on this list knows of such research -- but it would be
>> virtually impossible to generate such information for the much
>> wider collection of documents and data that are not easily accessible
>> for web indexing (behind password walls, in PDF document archives,
>> in proprietary databases, ...). As an example of why this is a problem,
>> consider the fact that there are *peta*bytes of information picked up
>> and stored in databases from scanners and other devices used at
>> tens of millions of retail points of sale. Such data, by its nature,
>> would tend to skew heavily towards use of ASCII a-z and digits 0-9 in
>> its character data. How would you end up weighting such (mostly
>> publicly inaccessible) data in trying to count up for overall statistics
>> on character use?
>> There are more traditional usage count studies that focus on
>> counts of character frequency within single language orthographies
>> in single scripts (e.g., letter frequencies for French text), but I don't
>> think that is what you were asking about.
>> Here is some discussion of a similar question posted on Stack Overflow:
>> On 3/27/2015 9:31 AM, Michael Norton wrote:
>>> Hello and thank you for an incredible service (just joining the list).
>>> Is there a list of usage statistics per character of the Unicode set
>>> available somewhere?
> Michael A. Norton, B.A. Cinema, M.P.A.
> My Cinema Home: http://www.NortonsNook.com
> "All great actors are mere mathematical masters of speech and the human