Hi Denis,
A few thoughts ... library data may be NFC or NFD, but is more likely to
conform to the MARC character repertoire, so it isn't exactly NFD.
Vietnamese data is either 1) NFC or 2) neither NFC nor NFD.
It would be rare to find Vietnamese data in NFD.
For a range of African languages, mainly ones using diacritics and diacritic
stacking, it may be 1) NFC, 2) NFD or 3) neither NFC nor NFD, depending on
the input framework used.
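
As a rough way to check which of these cases a given string falls into, here is a minimal sketch using Python's standard unicodedata module. The function name classify and the sample strings are my own illustrations, not taken from any particular corpus or tool. It simply compares the text with its NFC and NFD forms.

```python
import unicodedata

def classify(text: str) -> str:
    """Report whether text is in NFC, NFD, both, or neither.

    Hypothetical helper for illustration: a string is considered to be
    in a normalization form if normalizing it leaves it unchanged.
    """
    is_nfc = unicodedata.normalize("NFC", text) == text
    is_nfd = unicodedata.normalize("NFD", text) == text
    if is_nfc and is_nfd:
        return "both"      # e.g. plain ASCII, nothing to compose or decompose
    if is_nfc:
        return "NFC"
    if is_nfd:
        return "NFD"
    return "neither"

# Illustrative samples (assumptions, not real corpus data):
samples = {
    "ascii": "abc",
    "precomposed e-acute": "\u00e9",           # U+00E9
    "decomposed e-acute": "e\u0301",           # e + combining acute
    # Vietnamese-style mix: precomposed e-circumflex (U+00EA)
    # followed by combining dot below (U+0323)
    "mixed e-circumflex + dot below": "\u00ea\u0323",
}

for label, s in samples.items():
    print(label, "->", classify(s))
```

Run over a sample of pages, a count of each result would give a first guess at the proportions, with the caveat that pure ASCII text lands in "both" and says nothing either way.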
On Jan 22, 2013 3:26 AM, "Denis Jacquerye" <moyogo_at_gmail.com> wrote:
> Does anybody have any idea of how much of the Web is normalized in NFC
> or NFD? Or how much not normalized?
>
> How would one find out or try to make a smart guess?
>
> I know a lot of library catalogue data is in NFD or somewhat
> decomposed. Is there any other field that heavily uses decomposition?
>
> --
> Denis Moyogo Jacquerye
> African Network for Localisation http://www.africanlocalisation.net/
> Nkótá ya Kongó míbalé --- http://info-langues-congo.1sd.org/
> DejaVu fonts --- http://www.dejavu-fonts.org/
>
>
>
Received on Mon Jan 21 2013 - 18:49:25 CST