Re: Normalization rate on the Web

From: Martin J. Dürst <duerst_at_it.aoyama.ac.jp>
Date: Tue, 22 Jan 2013 15:48:11 +0900

On 2013/01/22 1:12, Denis Jacquerye wrote:
> Does anybody have any idea of how much of the Web is normalized in NFC
> or NFD? Or how much not normalized?

I have never measured this. But at one time, only NFD (and NFKD) were
defined. The Unicode Consortium, with input from the W3C, then defined
NFC (and NFKC) to be much closer to the encodings actually used on the Web.

So in some sense, Web content is (mostly) NFC *by design*.
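
For a smart guess on a sample of text, something like the following
Python sketch might be a starting point. It is only an illustration:
the function names and the sample strings are made up, no real Web data
is involved, and it relies only on the standard unicodedata module.

```python
import unicodedata
from collections import Counter


def classify_normalization(text: str) -> str:
    """Report which of NFC/NFD a string already satisfies.

    A string is "in" a form if normalizing it to that form changes nothing.
    Pure ASCII text is unchanged by both forms, so it is reported as "both".
    """
    is_nfc = unicodedata.normalize("NFC", text) == text
    is_nfd = unicodedata.normalize("NFD", text) == text
    if is_nfc and is_nfd:
        return "both"
    if is_nfc:
        return "NFC"
    if is_nfd:
        return "NFD"
    return "neither"


def tally(samples):
    """Count how many sample strings fall into each category."""
    return Counter(classify_normalization(s) for s in samples)


if __name__ == "__main__":
    # Hypothetical samples; real input would be decoded page text.
    samples = [
        "plain ascii",       # unaffected by either form
        "caf\u00e9",         # precomposed e-acute
        "cafe\u0301",        # e followed by combining acute
        "\u00e9 e\u0301",    # mixed, so neither form holds
    ]
    for category, count in sorted(tally(samples).items()):
        print(category, count)
```

Counting "both" separately matters, since a large share of Web text is
plain ASCII and would otherwise inflate either figure.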

Regards, Martin.

> How would one find out or try to make a smart guess?
>
> I know a lot of library catalogue data is in NFD or somewhat
> decomposed. Is there any other field that heavily uses decomposition?
>
> --
> Denis Moyogo Jacquerye
> African Network for Localisation http://www.africanlocalisation.net/
> Nkótá ya Kongó míbalé --- http://info-langues-congo.1sd.org/
> DejaVu fonts --- http://www.dejavu-fonts.org/
>
>
>
Received on Tue Jan 22 2013 - 00:54:38 CST
