From: Doug Ewell (doug@ewellic.org)
Date: Sun Feb 22 2009 - 20:13:21 CST
Mark Davis wrote:
> an illustrative sample simulating documents would be
>
> simulating content:
>
> 999,800 characters (82% being ASCII, then Cyrillic, Han, Arab, other
> Latin, ...) not needing normalization, and
>
> 200 characters needing normalization,
If you did happen to run into some data that started out in NFD -- say,
generated on a Mac -- you'd have a lot more than 0.02% of content
characters needing normalization.
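
For scale, 200 characters out of 1,000,000 is 0.02%. A rough way to check how far NFD-origin text departs from that figure is sketched below in Python, using the standard unicodedata module. The function name fraction_needing_nfc and the sample string are made up for illustration, and the counting rule is an assumption rather than anything Mark specified: a character counts as "needing normalization" if it sits in a run (a starter plus any following combining marks) that NFC would change. That rule misses starter-plus-starter compositions such as Hangul jamo, so treat the result as an approximation.

```python
import unicodedata


def fraction_needing_nfc(text: str) -> float:
    """Rough share of characters that sit in runs NFC would change."""
    if not text:
        return 0.0
    runs, current = [], ""
    for ch in text:
        # Start a new run at each starter (canonical combining class 0).
        if current and unicodedata.combining(ch) == 0:
            runs.append(current)
            current = ""
        current += ch
    runs.append(current)
    # Count every character belonging to a run that is not already in NFC.
    changed = sum(len(r) for r in runs if unicodedata.normalize("NFC", r) != r)
    return changed / len(text)


# Hypothetical sample text; escapes keep the source independent of file encoding.
sample = "R\u00e9sum\u00e9 for Zo\u00eb, na\u00efve caf\u00e9"
as_nfc = unicodedata.normalize("NFC", sample)
as_nfd = unicodedata.normalize("NFD", sample)  # what NFD-producing software might emit

baseline = 200 / 1_000_000  # the 0.02% figure from the quoted sample
print(f"baseline: {baseline:.4%}")
print(f"NFC text: {fraction_needing_nfc(as_nfc):.4%}")
print(f"NFD text: {fraction_needing_nfc(as_nfd):.4%}")
```

On accented Latin text like this, the NFD variant should come out orders of magnitude above the 0.02% baseline, while the NFC variant reports zero.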
--
Doug Ewell * Thornton, Colorado, USA * RFC 4645 * UTN #14
http://www.ewellic.org
http://www1.ietf.org/html.charters/ltru-charter.html
http://www.alvestrand.no/mailman/listinfo/ietf-languages