Re: "scripting units" vs "scripting bits"

From: Jon Hanna (jon@hackcraft.net)
Date: Thu Feb 25 2010 - 05:26:17 CST


    spir wrote:
    > Hello,
    >
    >
    > I read somewhere, and some time ago(*), that the Unicode concept of character matches the common sense of "character" in computing. I find this assertion rather amazing,

    It's not just true, it's tautologous: Unicode is a standard for
    dealing with the concept of "character" in computing, so it
    inherently matches that concept by definition. It is true, though,
    that some things that have been done with characters in computing
    are not allowed in Unicode, so Unicode's model is a proper subset
    of how "character" can be understood in computing, excluding those
    disallowed techniques.

    > For instance, in Unicode, the unit 'â' may be formed out of the bits 'a' and the composing variant of '^'.

    It cannot, however, be formed by 'a' + backspace + '^', which is
    how â was produced on some ASCII-based systems. There are many
    ways one could conceivably make â, but only two of them are
    allowed in Unicode: the precomposed character U+00E2, and the
    decomposed sequence U+0061 followed by the combining mark U+0302.
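
    To make that concrete, here is a minimal Python sketch (using the
    standard unicodedata module) showing that the two sanctioned
    spellings are canonically equivalent, while the old overstrike
    trick is just an inert control character:

        import unicodedata

        # The two Unicode-sanctioned spellings of 'â':
        precomposed = "\u00E2"   # U+00E2 LATIN SMALL LETTER A WITH CIRCUMFLEX
        decomposed  = "a\u0302"  # U+0061 + U+0302 COMBINING CIRCUMFLEX ACCENT

        # Canonically equivalent: normalizing either form to NFC (or
        # NFD) makes them compare equal.
        assert unicodedata.normalize("NFC", decomposed) == precomposed
        assert unicodedata.normalize("NFD", precomposed) == decomposed

        # The ASCII-era overstrike has no Unicode meaning: backspace
        # (U+0008) is a control character, not a combining mechanism.
        overstruck = "a\u0008^"
        assert unicodedata.normalize("NFC", overstruck) != precomposed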

    > It seems to me in legacy characters sets scripting bits simply do not exist, but I may be wrong on this.

    You are. Windows-1258 has combining characters (â itself is
    precomposed in it, but it carries five combining marks: the grave,
    acute, hook above, tilde and dot below used as Vietnamese tone
    marks).
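
    As a quick check (a Python sketch; Python's codec library includes
    cp1258), decoding Windows-1258 bytes shows both behaviours at
    once: 0xE2 is the precomposed â, while 0xEC is a genuine combining
    mark:

        import unicodedata

        # 'a', then 0xEC (the code page's COMBINING ACUTE ACCENT),
        # then 0xE2 (its precomposed 'â').
        text = bytes([0x61, 0xEC, 0xE2]).decode("cp1258")
        for ch in text:
            print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
        # U+0061 LATIN SMALL LETTER A
        # U+0301 COMBINING ACUTE ACCENT
        # U+00E2 LATIN SMALL LETTER A WITH CIRCUMFLEX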


