UnicodeData.txt is invalid, flawed, broken, corrupt and wrong

From: Theodore H. Smith (delete@elfdata.com)
Date: Sat Jun 11 2005 - 14:27:59 CDT

    No one from the Unicode Consortium replied to me last time, so I'll
    try again.

    Why does the entry for KELVIN SIGN (U+212A, the symbol for the
    kelvin, a unit of temperature) have a decomposition, listed as a
    canonical decomposition, to the plain ASCII "K" (U+004B)?

    As far as I can tell, this should be a compatibility decomposition.

    How does this cause me problems? I've written a parser for
    UnicodeData.txt that extracts the data for decomposition, and for
    composition as well.
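
    To make the parsing step concrete, here is a minimal sketch in
    Python (not my actual parser). It assumes the semicolon-separated
    layout of UnicodeData.txt, with the code point in field 0 and the
    decomposition mapping in field 5, and it treats a leading <tag> in
    that field as marking a compatibility mapping. The function name
    parse_decompositions and the three-record sample are only for
    illustration.

```python
# Three sample records in UnicodeData.txt layout. Only fields 0 and 5 are used.
SAMPLE = """\
00B5;MICRO SIGN;Ll;0;L;<compat> 03BC;;;;N;;;039C;;039C
00C5;LATIN CAPITAL LETTER A WITH RING ABOVE;Lu;0;L;0041 030A;;;;N;LATIN CAPITAL LETTER A RING;;;00E5;
212A;KELVIN SIGN;Lu;0;L;004B;;;;N;DEGREES KELVIN;;;006B;
"""


def parse_decompositions(text):
    """Return {code point: (kind, [code points])} for records with a mapping.

    kind is "canonical" when the field has no <tag>, otherwise the tag name.
    """
    table = {}
    for line in text.splitlines():
        fields = line.split(";")
        if len(fields) < 6 or not fields[5]:
            continue  # no decomposition mapping on this record
        parts = fields[5].split()
        kind = "canonical"
        if parts[0].startswith("<"):
            kind = parts[0].strip("<>")  # e.g. "compat"
            parts = parts[1:]
        table[int(fields[0], 16)] = (kind, [int(p, 16) for p in parts])
    return table


decomps = parse_decompositions(SAMPLE)
for cp, (kind, mapping) in sorted(decomps.items()):
    print(f"U+{cp:04X} {kind}: " + " ".join(f"U+{m:04X}" for m in mapping))
```

    With these records, the KELVIN SIGN mapping carries no tag, so the
    sketch files it under "canonical", while MICRO SIGN comes out as
    "compat".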

    Because Kelvin canonically decomposes to K, it follows that K
    canonically composes to Kelvin! :o(

    So my composer will change a word like "Kitchen" into
    "(Kelvin)itchen", which is just totally wrong. All because
    UnicodeData.txt is broken.
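
    The effect can be reproduced with a small Python sketch. It assumes
    a composer that simply inverts every canonical mapping, which is
    the behaviour described above. The names CANONICAL,
    naive_compose_table and naive_compose are hypothetical, and the
    two mappings are the canonical entries for U+212A and U+00C5.

```python
# Canonical mappings as a parser might extract them: code point -> sequence.
CANONICAL = {
    0x212A: (0x004B,),          # KELVIN SIGN -> "K"
    0x00C5: (0x0041, 0x030A),   # A WITH RING ABOVE -> "A" + COMBINING RING ABOVE
}


def naive_compose_table(canonical):
    """Invert every canonical mapping, single-code-point ones included."""
    return {seq: cp for cp, seq in canonical.items()}


def naive_compose(text, table):
    """Greedily replace the longest matching sequence at each position."""
    longest = max(len(seq) for seq in table)
    out = []
    i = 0
    while i < len(text):
        for n in range(longest, 0, -1):
            key = tuple(ord(c) for c in text[i:i + n])
            if len(key) == n and key in table:
                out.append(chr(table[key]))
                i += n
                break
        else:
            # Nothing matched here, so copy the character unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


table = naive_compose_table(CANONICAL)
result = naive_compose("Kitchen", table)
print([f"U+{ord(c):04X}" for c in result])
```

    In this sketch the first code point of the result is U+212A
    instead of U+004B. Inverting only the two-code-point mappings
    would leave "Kitchen" alone, but whether that is the intended
    reading of the data is exactly what I am asking.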

    That is what I think. But I might be wrong.

    Can someone from Unicode.org please confirm or deny all of this? That
    will put my mind at rest, because I need the official answer.

    --
    http://elfdata.com/plugin/ Industrial strength string processing,  
    made easy.
    "All things are logical. Putting free-will in the slot for premises in
    a logical system, makes all of life both understandable, and free."
    

