Re: Unicode forms for internal storage

From: Elliotte Rusty Harold (elharo@metalab.unc.edu)
Date: Tue Jan 20 2004 - 14:45:12 EST


    At 9:52 AM -0800 1/20/04, Markus Scherer wrote:
    >You need not invent something new: Just use a simplified SCSU
    >encoder, and either a regular SCSU decoder or one that only supports
    >the features which your custom encoder uses.

    Thanks. It looks like exactly what I need.
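    Following Markus's suggestion, here is a minimal sketch of a simplified SCSU
    (UTS #6) encoder, paired with a decoder that understands only the features
    that encoder emits. This is not Markus's encoder or ICU's converter, and the
    class and method names (SimpleScsu, encode, decode) are hypothetical. The
    sketch assumes the encoder never leaves single-byte mode and never moves
    dynamic window 0 from its default offset U+0080. Under that assumption,
    ASCII and Latin-1 cost one byte per character. The C0 controls that SCSU
    reserves as tag bytes are quoted with SQ0 (two bytes). Everything else is
    quoted one UTF-16 code unit at a time with SQU (three bytes).

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical minimal SCSU encoder/decoder using only a small subset of UTS #6. */
final class SimpleScsu {
    // SCSU tag bytes used by this subset (values from UTS #6).
    private static final int SQ0 = 0x01; // quote next byte via window 0
    private static final int SQU = 0x0E; // quote next two bytes as one UTF-16 code unit

    private SimpleScsu() {}

    /**
     * True for values that single-byte mode passes through literally:
     * NUL, TAB, LF, CR and U+0020..U+007F as ASCII, and U+0080..U+00FF
     * through dynamic window 0 at its default offset U+0080.
     */
    private static boolean isPassThrough(int c) {
        return c == 0x00 || c == 0x09 || c == 0x0A || c == 0x0D || (c >= 0x20 && c <= 0xFF);
    }

    public static byte[] encode(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (isPassThrough(c)) {
                out.write(c);          // one byte: ASCII or Latin-1
            } else if (c < 0x20) {
                out.write(SQ0);        // other C0 controls collide with tag bytes, so quote them
                out.write(c);
            } else {
                out.write(SQU);        // any other code unit (including surrogate halves), big-endian
                out.write(c >>> 8);
                out.write(c & 0xFF);
            }
        }
        return out.toByteArray();
    }

    /** Fails if fewer than n bytes remain at position i. */
    private static void need(byte[] in, int i, int n) {
        if (i + n > in.length) {
            throw new IllegalArgumentException("truncated SCSU input");
        }
    }

    /** Decodes only what encode() produces; other tags are rejected. */
    public static String decode(byte[] in) {
        StringBuilder sb = new StringBuilder(in.length);
        int i = 0;
        while (i < in.length) {
            int b = in[i++] & 0xFF;
            if (b == SQ0) {
                need(in, i, 1);
                // With default windows, SQ0 + byte x always yields the character x.
                sb.append((char) (in[i++] & 0xFF));
            } else if (b == SQU) {
                need(in, i, 2);
                int hi = in[i++] & 0xFF;
                int lo = in[i++] & 0xFF;
                sb.append((char) ((hi << 8) | lo));
            } else if (isPassThrough(b)) {
                sb.append((char) b);
            } else {
                throw new IllegalArgumentException("unsupported SCSU tag 0x" + Integer.toHexString(b));
            }
        }
        return sb.toString();
    }
}
```

    For example, SimpleScsu.encode("caf\u00e9") yields the four bytes
    63 61 66 E9, identical to Latin-1. A CJK character such as U+4E2D costs
    three bytes (0E 4E 2D), which is no better than UTF-8. A fuller encoder
    like Markus's gains its extra compression from features this sketch
    omits. It can switch to Unicode mode (SCU) for runs of such text, or
    define a dynamic window (SDn) for small alphabets such as Greek or
    Cyrillic.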

    >For a tiny SCSU encoder (main function 75 lines of commented C) that
    >also compresses a little better than what you describe see
    >http://www.mindspring.com/~markus.scherer/unicode/tr6/
    >
    >You could scale that encoder up or down to your liking.
    >
    >For a full SCSU converter you could use ICU, for example.
    >http://oss.software.ibm.com/icu/

    Hmm, I'm already carrying around part of ICU4J to perform
    normalization. I'll have to check and see if I've got the SCSU
    support compiled into my version of the ICU jar.

    >You could also use BOCU-1.

    Reading the BOCU tech note, it looks like SCSU performs better. The
    main benefit of BOCU comes when you're transmitting this encoding over
    the wire, which I am definitely not doing. But SCSU looks like a really
    nice option. Thanks.

    -- 
       Elliotte Rusty Harold
       elharo@metalab.unc.edu
       Effective XML (Addison-Wesley, 2003)
       http://www.cafeconleche.org/books/effectivexml
       http://www.amazon.com/exec/obidos/ISBN%3D0321150406/ref%3Dnosim/cafeaulaitA
    

