Re: UTF-c

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Sat Feb 26 2011 - 13:09:27 CST


    I did not describe multiple bases there. BASE is a single integer variable.

    There's no "BASE*" defined there for 1-byte encoding in the range 0x00..0x7F.

    Using other bases is possible as an extension (I described this
    later, when introducing BASE2 as a possible extension for the 2-byte
    encoding).

    > When a byte starting 11 is used in isolation, why is it represented as 11.yyxxxx please?
    >
    > Is it because there are four possible values of BASE, namely BASE[0], BASE[1], BASE[2] and BASE[3]?
    >
    > If BASE has a non-negative value less than 0x80, could that value of BASE be used to signal accessing a decoding tree so that the most common codepoints in the text from beyond the range U+0000 to U+007F could be represented using a single byte starting with 11? The contents of the decoding tree could be dynamically altered using switching codes.
    >
    > If the idea of four values for BASE, in BASE[0], BASE[1], BASE[2] and BASE[3] is used, then access to a decoding tree would be possible simultaneously with one-byte access to a contiguous block of other Unicode characters if so desired, though if BASE[0], BASE[1], BASE[2] and BASE[3] are used the range of possible values of BASE would need to be 17 bits.
    >
    > For example, at some particular time in some particular application of the format, BASE[0] might have a value of 0x00 and BASE[1] might have a value of 0x100.
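
    To make the difference between the two readings concrete, here is a
    minimal sketch in Python. It is not the UTF-c definition: the function
    names are hypothetical, and the assumption that the six low bits of a
    lone 11.yyxxxx byte are added to BASE as an offset is mine, made only
    for illustration. The second function shows the four-base reading
    proposed in the quoted question, which is not what was described.

```python
def split_lead_byte(b):
    """Split a byte of the form 11yyxxxx into its (yy, xxxx) bit fields."""
    if b & 0xC0 != 0xC0:
        raise ValueError("not a byte starting with bits 11")
    return (b >> 4) & 0x3, b & 0xF


def decode_single_base(b, base):
    # Single-BASE reading: one integer BASE, and (ASSUMPTION) the six
    # low bits yyxxxx together form an offset from it.
    yy, xxxx = split_lead_byte(b)
    return base + ((yy << 4) | xxxx)


def decode_four_bases(b, bases):
    # Four-base reading from the quoted question (hypothetical):
    # yy selects one of BASE[0]..BASE[3], xxxx is a 4-bit offset.
    yy, xxxx = split_lead_byte(b)
    return bases[yy] + xxxx
```

    With the byte 0xD5 (bits 11.010101), the first function adds the
    offset 0x15 to the single BASE, while the second adds 0x5 to BASE[1].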


