From: Mark Davis ☕ (mark@macchiato.com)
Date: Wed Oct 06 2010 - 10:49:14 CDT
ICU has a canonical iterator (the CanonicalIterator class), one that provides
all the strings canonically equivalent to a given input, that is, all the
strings that produce the same result under toNFC(...).
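
To make the idea concrete, here is a minimal brute-force sketch of such an enumeration. It uses Python's standard unicodedata module and is not ICU's implementation or algorithm. The helper names (build_reverse_map, spellings, canonical_equivalents) are made up for illustration. It tries every permutation of the decomposed characters, so it is only practical for very short strings.

```python
import itertools
import unicodedata
from collections import defaultdict


def build_reverse_map():
    """Map each full canonical decomposition (an NFD string) to the
    single code points that decompose to it."""
    reverse = defaultdict(list)
    for cp in range(0x110000):
        ch = chr(cp)
        nfd = unicodedata.normalize("NFD", ch)
        if nfd != ch:
            reverse[nfd].append(ch)
    return reverse


# Built once; scanning all code points takes a moment.
REVERSE = build_reverse_map()


def spellings(chars):
    """Yield every way to write the sequence `chars` as a concatenation of
    chunks, where a chunk is either one character kept as-is or a
    contiguous run replaced by a code point that decomposes to that run."""
    if not chars:
        yield ""
        return
    for end in range(1, len(chars) + 1):
        chunk = "".join(chars[:end])
        heads = list(REVERSE.get(chunk, []))
        if end == 1:
            heads.append(chunk)  # keep the character unchanged
        for head in heads:
            for tail in spellings(chars[end:]):
                yield head + tail


def canonical_equivalents(s):
    """Return the set of candidate strings whose NFD form equals NFD(s).

    Brute force: every ordering of the decomposed characters is tried,
    and each candidate is kept only if it normalizes back to the target.
    """
    target = unicodedata.normalize("NFD", s)
    found = set()
    for perm in set(itertools.permutations(target)):
        for candidate in spellings(perm):
            if unicodedata.normalize("NFD", candidate) == target:
                found.add(candidate)
    return found


if __name__ == "__main__":
    # Print the code points of each string equivalent to U+00C5.
    for t in sorted(canonical_equivalents("\u00c5")):
        print(" ".join(f"U+{ord(c):04X}" for c in t))
```

Because every candidate is checked against the NFD form of the input before it is kept, nothing non-equivalent can slip into the result. The cost is factorial in the length of the decomposed string, which is why a real implementation such as ICU's works segment by segment instead.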
Mark
*— Il meglio è l’inimico del bene —*
On Mon, Oct 4, 2010 at 20:59, Bjoern Hoehrmann <derhoermi@gmx.net> wrote:
> Hi,
>
> Every now and then I need a tool that takes a Unicode string and gives
> me all the strings that are not identical but equivalent under one of
> the four normalization forms defined in UAX #15. Now I do have a couple
> of hacks that get me by, but is there any tool or paper that has a more
> complete solution? Last year I worked a bit in the general direction
> (http://lists.w3.org/Archives/Public/www-archive/2009Feb/0071.html), but
> I ran out of time after proving that, for each of the normal forms, the
> set of strings in that form is a regular language, and writing a
> denormalizer was not the goal anyway.
>
> Thanks,
> --
> Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
> Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
> 25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/
>
>