Long-term archiving of electronic text documents

From: William_J_G Overington <wjgo_10009_at_btinternet.com>
Date: Mon, 28 Jan 2013 12:30:32 +0000 (GMT)

I was thinking about the problems of long-term archiving of electronic text documents and thought of an idea.

I wonder if I may please mention the idea here, in the hope of a discussion that could help assess whether the idea is worth developing.

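The idea is that there would be an additional Unicode Transformation Format, perhaps UTF-64, in which each character would be expressed using 64 bits. Since a Unicode code point needs at most 21 bits, the remaining bits of each 64-bit unit could carry redundancy, thus providing error checking and correction facilities at a character level.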
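
A minimal sketch of how such a per-character codeword might work, in Python. The layout is an assumption for illustration only, not something specified in this message or in any standard: each 64-bit unit stores three copies of the 21-bit code point, and the decoder takes a bitwise majority vote, so a flipped bit in any single copy is corrected. The function names (encode_char, decode_char, encode_text, decode_text) are hypothetical.

```python
# Hypothetical "UTF-64" sketch: NOT part of the Unicode Standard.
# Assumed layout: three copies of the 21-bit code point in bits 0-20,
# 21-41 and 42-62 of each 64-bit unit; bit 63 is always zero.

MASK21 = (1 << 21) - 1


def encode_char(cp: int) -> int:
    """Pack one Unicode scalar value into a 64-bit codeword (triple redundancy)."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not a Unicode scalar value")
    return cp | (cp << 21) | (cp << 42)


def decode_char(word: int) -> tuple[int, bool]:
    """Return (code point, corrected) using a bitwise majority vote."""
    a = word & MASK21
    b = (word >> 21) & MASK21
    c = (word >> 42) & MASK21
    majority = (a & b) | (a & c) | (b & c)  # each bit: at least 2 of 3 agree
    corrected = not (a == b == c)           # copies disagreed, so a fix was made
    if majority > 0x10FFFF or 0xD800 <= majority <= 0xDFFF:
        raise ValueError("uncorrectable error")
    return majority, corrected


def encode_text(text: str) -> bytes:
    """Encode text as 8 big-endian bytes per character."""
    return b"".join(encode_char(ord(ch)).to_bytes(8, "big") for ch in text)


def decode_text(data: bytes) -> tuple[str, int]:
    """Decode text and report how many characters needed correction."""
    if len(data) % 8:
        raise ValueError("length is not a multiple of 8 bytes")
    chars, fixes = [], 0
    for i in range(0, len(data), 8):
        cp, corrected = decode_char(int.from_bytes(data[i:i + 8], "big"))
        chars.append(chr(cp))
        fixes += corrected
    return "".join(chars), fixes


# Usage: flip one stored bit and recover the original text.
data = bytearray(encode_text("Aé€"))
data[3] ^= 0x04                 # corrupt one bit of the first character
print(decode_text(bytes(data)))  # ('Aé€', 1)
```

Triple repetition is used here only because it is easy to follow. It fails if two copies are corrupted at the same bit position, and a Hamming-style code would need far fewer check bits, at the cost of correcting fewer errors per unit.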

If such a UTF-64 format were established as part of the Unicode Standard, then perhaps in the future, for example, Microsoft WordPad could offer an option to save a text file as UTF-64.

At present, on the Windows XP system that I am using, when saving a text file from within Microsoft WordPad, one of the file type choices is listed as Unicode Text Document, which uses UTF-16.

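A document saved as UTF-64 would take eight bytes per character: four times as many bytes as such a Unicode Text Document for characters in the Basic Multilingual Plane, and twice as many for characters outside it. Yet there would be the error checking and correction facilities at a character level.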
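
The size arithmetic can be checked with a short Python sketch. It assumes the hypothetical UTF-64 uses exactly 8 bytes per character with no header, and it measures UTF-16 without a byte order mark (WordPad's Unicode Text Document adds a 2-byte BOM, which is ignored here). The function name size_comparison is hypothetical.

```python
def size_comparison(text: str) -> tuple[int, int]:
    """Return (UTF-16 bytes, hypothetical UTF-64 bytes) for the given text."""
    utf16 = len(text.encode("utf-16-le"))  # 2 bytes per BMP char, 4 otherwise
    utf64 = 8 * len(text)                  # assumed fixed 8 bytes per code point
    return utf16, utf64


print(size_comparison("Hello"))  # (10, 40): four times larger
print(size_comparison("😀"))     # (4, 8): twice as large (surrogate pair in UTF-16)
```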

Similarly, there could be a type of PDF document in which the text within the document was stored in UTF-64 format.

So I write to put forward the idea and to ask, please, for opinions on whether establishing such a format, whether UTF-64 or some other size, with error checking and correction facilities at a character level, would be useful.

William Overington

28 January 2013
Received on Mon Jan 28 2013 - 06:37:07 CST
