Re: Emoji and Annotation data

From: Takao Fujiwara <tfujiwar_at_redhat.com>
Date: Mon, 27 Jun 2016 14:34:59 +0900

Hi,

For example, in http://unicode.org/emoji/charts/emoji-list.html,
  "😀" has the annotations "face" and "grin".

That data is available only in the HTML files.

Fujiwara

On 06/27/16 14:16, Peter Edberg-san wrote:
> Fujiwara-san,
> If you follow the information indicated by UTR 51 (as Mark had suggested), you will see that:
>
> 1. The annotations data is available in CLDR here, in English:
> http://unicode.org/cldr/trac/browser/tags/latest/common/annotations/en.xml
> (or in many other languages, such as Japanese:)
> http://unicode.org/cldr/trac/browser/tags/latest/common/annotations/ja.xml
>
> The description of the format for those xml files is here:
> http://www.unicode.org/reports/tr35/tr35-general.html#Annotations
>
> 2. Other emoji data files are here:
> http://www.unicode.org/Public/emoji/latest/
>
> These data files are what drive the generation of the charts.
>
> Best regards,
> Peter Edberg
>
>
>
>> On Jun 26, 2016, at 9:09 PM, Takao Fujiwara <tfujiwar_at_redhat.com> wrote:
>>
>> On 06/25/16 01:04, Mark Davis ☕️-san wrote:
>>> You should never be scraping /any/ Unicode HTML files. They are not made for that, and there is no guarantee of stability.
>>
>> I cannot find a license or description for the HTML files.
>>
>>>
>>> The emoji files are built from data which is described in http://www.unicode.org/reports/tr51/
>>> (plus CLDR annotations and collation)
>>
>> OK, I need data that pairs the emoji characters with their annotations.
>> It would be great if that data could be provided in a form other than the HTML files.
>>
>> Thanks,
>> Fujiwara
>>
>>>
>>> Mark
>>> //////
>>>
>>> On Fri, Jun 24, 2016 at 7:21 AM, Takao Fujiwara <tfujiwar_at_redhat.com> wrote:
>>>
>>> Hi,
>>>
>>> I'm working on IBus - the input method framework for Linux.
>>> I parse http://unicode.org/emoji/charts/emoji-list.html and build a dictionary that maps the annotations to the emoji characters.
>>> Since the file is large and is updated often, I'm wondering how to maintain it.
>>>
>>> I copied the file as http://ibus.github.io/files/ibus/emoji-list.html for the build at the moment.
>>>
>>> I have questions:
>>> - Does unicode.org provide a tarball of stable HTML files, or other data?
>>> - What is the license of the HTML files?
>>>
>>> Do you have any ideas?
>>>
>>> Thanks,
>>> Fujiwara
>>>
>>>
>>
>
>
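
The approach suggested in the thread (read the CLDR annotation XML instead of scraping the HTML charts) could be sketched roughly as follows. This is only an illustration under stated assumptions: the inline SAMPLE and the function name build_annotation_index are hypothetical, and the element layout (an "annotation" element with a "cp" attribute, keywords separated by "|", and a type="tts" entry for the spoken name) is assumed from later CLDR releases. The release current at the time of this thread may use a different layout, so check the TR35 description linked above before relying on it.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample in the assumed CLDR annotation layout.
# A real build would read common/annotations/en.xml from a CLDR release.
SAMPLE = """<ldml>
  <annotations>
    <annotation cp="😀">face | grin</annotation>
    <annotation cp="😀" type="tts">grinning face</annotation>
    <annotation cp="😺">cat | face | grin</annotation>
  </annotations>
</ldml>"""


def build_annotation_index(xml_text):
    """Map each annotation keyword to the list of emoji that carry it."""
    root = ET.fromstring(xml_text)
    index = defaultdict(list)
    for node in root.iter("annotation"):
        # Skip the text-to-speech name; only the keyword entries are indexed.
        if node.get("type") == "tts":
            continue
        emoji = node.get("cp")
        for keyword in (node.text or "").split("|"):
            keyword = keyword.strip()
            if keyword and emoji not in index[keyword]:
                index[keyword].append(emoji)
    return dict(index)


index = build_annotation_index(SAMPLE)
print(index["grin"])
```

An input method could then look up a typed keyword in the resulting dictionary to get candidate emoji, without depending on the layout of the HTML charts.
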
Received on Mon Jun 27 2016 - 00:35:33 CDT
