From: Doug Ewell (dewell@roadrunner.com)
Date: Thu Sep 06 2007 - 23:20:03 CDT
I'll see if I can find the thread where we talked about that, years ago.
Somebody wanted to build that capability into an extension to UTF-8, so
it could faithfully represent invalid garbage. We were never able to
get him to work through what he wanted to do with the garbage thus
preserved.
--
Doug Ewell · Fullerton, California, USA · RFC 4645 · UTN #14
http://users.adelphia.net/~dewell/
http://www1.ietf.org/html.charters/ltru-charter.html
http://www.alvestrand.no/mailman/listinfo/ietf-languages

----- Original Message -----
From: Mark Davis
To: steve.bush@neosys.com ; ICU support mailing list ; Unicode
Sent: Thursday, September 6, 2007 12:34
Subject: Re: [icu-support] complete binary/utf mapping

Ccing Unicode in case anyone knows.

I don't know of any public ones. Years ago in ICU we tossed around the
idea of having something like that. It was roughly the following:

- Reserve 256 code points for "bytes that couldn't be converted"
- Reserve one code point for a "quote character"
- When converting from a source, say possibly mangled UTF-8, convert
  all valid sequences normally, except that a quote character is
  inserted before any of the 256 items above. Any invalid sequence is
  converted to a sequence of the appropriate ones of the 256 code
  points.
- When converting back, the quote character + following code point is
  converted directly, and any other of the 256 are emitted as bytes.

(The 257 code points could be private use.)

This would round-trip all bytes in a buffer between any single charset
X and Unicode. However, as soon as you get into a situation where you
could be outputting the resulting Unicode to a different charset Y,
then it looked like it started to break down. So it was little more
than lunch conversation.

Mark

On 9/6/07, Steve Bush <Steve.Bush@neosys.com> wrote:

> I read somewhere that there were some proposals to work out a
> lossless scheme for round tripping binary (ie all illegal UTF
> bytes/sequences) to UTF and back again.
>
> Can anyone point me in the direction of these efforts?
>
> Steve Bush
> NEOSYS Dubai.
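
The scheme Mark Davis outlines above (256 reserved code points plus a
quote character) could be sketched roughly as follows. This is only an
illustration under assumptions that are not in the thread: the source
charset is taken to be UTF-8, the private-use range U+E000..U+E0FF and
the quote character U+E100 are arbitrary picks, the function names are
made up, and the quote character is itself quoted when it occurs in
valid input (the message mentions quoting only the 256 byte code
points, but the round trip seems to need both).

```python
# Assumed private-use assignments; the thread does not name any.
BYTE_BASE = 0xE000      # U+E000..U+E0FF stand for raw bytes 0x00..0xFF
QUOTE = chr(0xE100)     # hypothetical "quote character"


def _is_reserved(ch: str) -> bool:
    """True for any of the 257 reserved code points."""
    return BYTE_BASE <= ord(ch) <= BYTE_BASE + 0xFF or ch == QUOTE


def decode_lossless(data: bytes) -> str:
    """Convert possibly mangled UTF-8 to text without losing any byte."""
    out = []
    pos = 0
    while pos < len(data):
        chunk = data[pos:]
        try:
            valid, bad, consumed = chunk.decode("utf-8"), b"", len(chunk)
        except UnicodeDecodeError as err:
            # err.start/err.end are offsets into chunk.
            valid = chunk[:err.start].decode("utf-8")
            bad = chunk[err.start:err.end]
            consumed = err.end
        for ch in valid:
            if _is_reserved(ch):
                out.append(QUOTE)   # mark a genuine reserved code point
            out.append(ch)
        # Each unconvertible byte becomes its own reserved code point.
        out.extend(chr(BYTE_BASE + b) for b in bad)
        pos += consumed
    return "".join(out)


def encode_lossless(text: str) -> bytes:
    """Convert back: quoted code points literally, escapes as raw bytes."""
    out = bytearray()
    chars = iter(text)
    for ch in chars:
        if ch == QUOTE:
            # The code point after a quote is converted directly.
            out += next(chars, QUOTE).encode("utf-8")
        elif BYTE_BASE <= ord(ch) <= BYTE_BASE + 0xFF:
            out.append(ord(ch) - BYTE_BASE)
        else:
            out += ch.encode("utf-8")
    return bytes(out)


mangled = b"ok \xff\xfe " + "\ue001".encode("utf-8") + b"\xc3"
assert encode_lossless(decode_lossless(mangled)) == mangled
```

As the message notes, the escapes only carry meaning relative to the
charset they were decoded from, so this sketch says nothing about what
should happen when the resulting text is written out in a different
charset.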
This archive was generated by hypermail 2.1.5 : Thu Sep 06 2007 - 23:25:36 CDT