Pre-proposal for SCSU updates

From: Doug Ewell (doug@ewellic.org)
Date: Mon Nov 01 2010 - 15:50:51 CST

    I'd like to try to gauge the community's interest, if any, in some
    possible updates to UTS #6 and the SCSU mechanism, as follows:

    (1) Updating the spec to add dynamic-window offsets 0xA8 through 0xBF,
    which are currently reserved, to permit encoding the blocks from U+A000
    through U+ABFF in single-byte mode. This would allow the many small
    scripts assigned to this range, such as Bamum, Syloti Nagri, and
    Phags-pa, to be encoded efficiently using SCSU. Other offsets could be
    added as well, such as for Hangul Jamo Extended-B.

    (2) Updating the spec to assign "reserved" tag bytes 0x0C (single-byte
    mode) and 0xF2 (Unicode mode) as "reset all" commands, similar to 0xFF
    in BOCU-1. This would allow more efficient encoding in some cases, as
    well as providing a possible synchronization mechanism for decoders. As
    an alternative, these unused tag bytes could be released for normal,
    non-reserved use, so they would no longer require escaping.

    (3) Providing an informational section in UTS #6 on "line-safe SCSU," a
    special-purpose SCSU encoding profile in which all state is returned to
    the default at the end of each line, and all lines are terminated with
    CR/LF.
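
    As a rough illustration of item (1), the sketch below maps a window
    offset byte to the starting code point of a dynamic window. The first
    two ranges and the special values 0xF9 through 0xFF follow the existing
    UTS #6 table. The formula for 0xA8 through 0xBF is only an assumption
    (a linear mapping of 24 half-block windows onto U+A000..U+ABFF); the
    proposal above does not spell it out. The names window_offset,
    proposed_offset_byte, and the "extended" flag are hypothetical.

```python
# Sketch only: SCSU dynamic-window offset lookup with a hypothetical
# extension for offset bytes 0xA8..0xBF (assumed linear mapping).

# Special offsets already defined by UTS #6.
SPECIAL_OFFSETS = {
    0xF9: 0x00C0,  # Latin-1 letters + half of Latin Extended-A
    0xFA: 0x0250,  # IPA Extensions
    0xFB: 0x0370,  # Greek
    0xFC: 0x0530,  # Armenian
    0xFD: 0x3040,  # Hiragana
    0xFE: 0x30A0,  # Katakana
    0xFF: 0xFF60,  # Halfwidth Katakana
}


def window_offset(x, extended=False):
    """Return the first code point of the window selected by byte x.

    With extended=False this mirrors the current UTS #6 table.
    With extended=True, bytes 0xA8..0xBF use the ASSUMED mapping
    0xA000 + (x - 0xA8) * 0x80, covering U+A000..U+ABFF.
    """
    if not 0x00 <= x <= 0xFF:
        raise ValueError("offset byte out of range")
    if 0x01 <= x <= 0x67:
        return x * 0x80                    # U+0080..U+3380
    if 0x68 <= x <= 0xA7:
        return x * 0x80 + 0xAC00           # U+E000..U+FF80
    if extended and 0xA8 <= x <= 0xBF:
        return 0xA000 + (x - 0xA8) * 0x80  # assumption, not in UTS #6
    if x in SPECIAL_OFFSETS:
        return SPECIAL_OFFSETS[x]
    raise ValueError("reserved offset byte 0x%02X" % x)


def proposed_offset_byte(cp):
    """Inverse of the assumed mapping: byte whose window contains cp."""
    if not 0xA000 <= cp <= 0xABFF:
        raise ValueError("code point outside U+A000..U+ABFF")
    return 0xA8 + (cp - 0xA000) // 0x80


if __name__ == "__main__":
    for name, cp in [("Bamum", 0xA6A0), ("Syloti Nagri", 0xA800),
                     ("Phags-pa", 0xA840)]:
        b = proposed_offset_byte(cp)
        print("%-12s byte 0x%02X -> window U+%04X"
              % (name, b, window_offset(b, extended=True)))
```

    Under this assumed mapping, Syloti Nagri (U+A800..U+A82F) and Phags-pa
    (U+A840..U+A87F) would share the single window at U+A800, and Bamum
    (U+A6A0..U+A6FF) would fit inside the window at U+A680.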

    I'm aware that many people have been discouraging the use of SCSU
    altogether, on the basis of Web-page security concerns or the reputation
    of SCSU as "difficult to implement." These people will not be affected
    one way or another by any enhancements to SCSU, and I am not focusing on
    them at present.

    --
    Doug Ewell | Thornton, Colorado, USA | http://www.ewellic.org
    RFC 5645, 4645, UTN #14 | ietf-languages @ is dot gd slash 2kf0s
    

