Re: Comment on PRI 98: IVD Adobe-Japan1 (pt.2)

From: Doug Ewell (dewell@adelphia.net)
Date: Mon Mar 26 2007 - 07:55:50 CST

Next message: James Tu: "Arabic and Adobe Flash"

Previous message: Philippe Verdy: "RE: Comment on PRI 98: IVD Adobe-Japan1 (pt.2)"
In reply to: Richard Wordingham: "Re: Comment on PRI 98: IVD Adobe-Japan1 (pt.2)"
Next in thread: Richard Wordingham: "Re: Comment on PRI 98: IVD Adobe-Japan1 (pt.2)"
Reply: Richard Wordingham: "Re: Comment on PRI 98: IVD Adobe-Japan1 (pt.2)"

Richard Wordingham <richard dot wordingham at ntlworld dot com> wrote:

> It would be wrong for an application implicitly claiming not to change
> the text to strip variation selectors out of ideographic selectors
> without any by your leave. (By contrast, normalisation does not
> change the text for Unicode-compliant processes - some round-tripping
> is inherently not Unicode-compliant.)

This doesn't sound right to me. Normalization is all about changing one
character or sequence to another. A Unicode-compliant process is not
supposed to assume that two canonically equivalent sequences will be
treated differently, but that is not the same as saying the text has not
changed -- especially if compatibility normalization (NFKC or NFKD) is
involved.
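
To make the distinction concrete, here is a minimal sketch using Python's
standard unicodedata module. The sample strings are my own illustrative
choices, not taken from the thread. It shows that normalization replaces
one code point sequence with another even when the two are canonically
equivalent, and that compatibility normalization changes more than
canonical normalization does.

```python
import unicodedata

# Two canonically equivalent spellings of "e with acute accent"
# (illustrative sample data).
composed = "\u00e9"      # U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # U+0065 followed by U+0301 COMBINING ACUTE ACCENT

# The code point sequences differ, so the stored text differs.
same_code_points = (composed == decomposed)

# Canonical normalization maps each spelling onto the other.
nfc_of_decomposed = unicodedata.normalize("NFC", decomposed)
nfd_of_composed = unicodedata.normalize("NFD", composed)

# Compatibility normalization goes further: the "fi" ligature U+FB01
# survives NFC but is replaced by two letters under NFKC.
ligature = "\ufb01"
nfc_of_ligature = unicodedata.normalize("NFC", ligature)
nfkc_of_ligature = unicodedata.normalize("NFKC", ligature)

print(same_code_points)
print([hex(ord(c)) for c in nfc_of_decomposed])
print([hex(ord(c)) for c in nfd_of_composed])
print([hex(ord(c)) for c in nfkc_of_ligature])
```

A process that treats the two spellings alike is behaving as it should,
but a byte-for-byte comparison of the text before and after
normalization would still report a change.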

> On the other hand, it might not be unreasonable for an application to
> compress such text by transferring the information in the variation
> selectors to a 'higher level protocol'. For a file consisting mostly
> of CJK text, appending U+E0100 to every unified ideograph would bloat
> the UTF-16 storage requirement from typically one code unit per
> character to typically three code units per character! Doug Ewell's
> survey of Unicode compression ( http://www.unicode.org/notes/tn14/ )
> rather suggests that many standard compression techniques would not
> counteract such bloat effectively.

This is true for compression techniques that operate on one code point
at a time, such as SCSU, BOCU, and Huffman coding. It may not be true
for dictionary-based techniques like LZ. The question of how desirable
it is to append a variation selector to every character in the first
place is perhaps more generally interesting.
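
The storage arithmetic in the quoted paragraph can be checked with a
short sketch. The sample string and the helper names below are
hypothetical, chosen only for illustration. zlib's DEFLATE (LZ77 plus
Huffman coding) stands in here for "dictionary-based techniques like
LZ"; it is an assumption of this sketch, not something measured in
UTN #14, and a sample this small says nothing reliable about real
compression ratios.

```python
import zlib

VS17 = "\U000E0100"  # VARIATION SELECTOR-17, a supplementary-plane character


def utf16_code_units(text: str) -> int:
    """Count the 16-bit code units needed to store text as UTF-16."""
    return len(text.encode("utf-16-le")) // 2


def append_selectors(text: str) -> str:
    """Append U+E0100 after every character (hypothetical helper)."""
    return "".join(ch + VS17 for ch in text)


def deflate_size(text: str) -> int:
    """Size in bytes of the UTF-16 form after zlib (DEFLATE) compression."""
    return len(zlib.compress(text.encode("utf-16-le"), 9))


# Illustrative sample: seven BMP unified ideographs.
plain = "東京都千代田区"
tagged = append_selectors(plain)

# Each BMP ideograph takes one code unit; U+E0100 takes a surrogate
# pair, so every tagged ideograph costs three code units.
print(utf16_code_units(plain), utf16_code_units(tagged))

# Compressed sizes, for comparison on longer inputs of your own.
print(deflate_size(plain), deflate_size(tagged))
```

Running the same functions over a large CJK corpus, rather than this toy
string, would be the way to see how far an LZ-style compressor actually
counteracts the bloat.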

--
Doug Ewell  *  Fullerton, California, USA  *  RFC 4645  *  UTN #14
http://users.adelphia.net/~dewell/
http://www1.ietf.org/html.charters/ltru-charter.html
http://www.alvestrand.no/mailman/listinfo/ietf-languages
