Re: UTF-c

From: William_J_G Overington (wjgo_10009@btinternet.com)
Date: Sat Feb 26 2011 - 03:17:51 CST


    Philippe Verdy <verdy_p@wanadoo.fr> wrote:
     
    > Note that the scalar values range 0xD800..0xDFFF reserved for surrogates code points MUST be excluded to be a conforming UTF (these code points must not be representable, to allow full bidirectional compatibility with UTF-16 ; this is unlike all other codepoints assigned to non-characters which SHOULD still be representable).
     
    How do you arrive at the conclusion about the surrogates, please?
     
    Is it because there are rules somewhere requiring that a surrogate pair copied from a UTF-16 sequence must first be combined into one codepoint, and that codepoint then compressed, rather than each of the two surrogate codes being compressed individually?
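     
    To make the first alternative concrete, here is a minimal sketch of combining a surrogate pair into one codepoint before any compression step. It assumes only the standard UTF-16 pairing arithmetic; the function names are hypothetical and are not taken from any UTF-c or utf-c2 draft.

```python
# Sketch only: these helper names are illustrative, not part of any UTF-c spec.
HIGH_START, HIGH_END = 0xD800, 0xDBFF   # high (leading) surrogates
LOW_START, LOW_END = 0xDC00, 0xDFFF     # low (trailing) surrogates


def is_surrogate(value):
    """Return True if value lies in the surrogate range 0xD800..0xDFFF."""
    return HIGH_START <= value <= LOW_END


def combine_surrogates(high, low):
    """Combine a high/low surrogate pair into one supplementary codepoint."""
    if not HIGH_START <= high <= HIGH_END:
        raise ValueError("not a high surrogate: 0x%04X" % high)
    if not LOW_START <= low <= LOW_END:
        raise ValueError("not a low surrogate: 0x%04X" % low)
    return 0x10000 + ((high - HIGH_START) << 10) + (low - LOW_START)


def utf16_units_to_scalars(units):
    """Turn UTF-16 code units into scalar values, rejecting unpaired surrogates."""
    scalars = []
    i = 0
    while i < len(units):
        unit = units[i]
        next_is_low = i + 1 < len(units) and LOW_START <= units[i + 1] <= LOW_END
        if HIGH_START <= unit <= HIGH_END and next_is_low:
            scalars.append(combine_surrogates(unit, units[i + 1]))
            i += 2
        elif is_surrogate(unit):
            # An isolated surrogate has no scalar value, so it cannot be passed on.
            raise ValueError("unpaired surrogate: 0x%04X" % unit)
        else:
            scalars.append(unit)
            i += 1
    return scalars
```

    With this approach the values handed to the compressor never fall in 0xD800..0xDFFF, which is the property the quoted paragraph asks for. Compressing the two surrogate codes separately would skip `combine_surrogates` altogether, at the cost of making those values representable.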
     
    If so, do those rules necessarily apply to utf-c2? And if they do, would they still apply if the format were given a name that does not include the letters "utf"?
     
    Would compressing the surrogate codes separately make the design of the format simpler?
     
    Could sequences starting with 10000000 and 10000001 be used as switching codes?
     
    William Overington
     
    26 February 2011
     


