I was thinking about the problems of the long-term archiving of electronic text documents and had an idea.
I would like to mention the idea here, please, in the hope of a discussion, so that an assessment can be made of whether the idea is worth developing.
The idea is that there would be an additional UTF format, perhaps UTF-64, in which each character would be expressed using 64 bits. As a Unicode code point needs at most 21 bits, the remaining bits could carry redundancy, thus providing error checking and correction facilities at a character level.
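To make the idea concrete, here is a minimal sketch in Python of one way 64 bits per character could carry error correction. The layout is purely illustrative and is not part of any standard: it stores three copies of the 21-bit code point, leaves the top bit unused, and recovers the character by a bitwise majority vote. The function names encode_utf64 and decode_utf64 are hypothetical.

```python
MASK21 = (1 << 21) - 1  # a Unicode code point (max U+10FFFF) fits in 21 bits


def encode_utf64(code_point: int) -> int:
    """Pack one code point into a 64-bit word as three 21-bit copies.

    Hypothetical layout: bits 0-20, 21-41 and 42-62 each hold the code
    point; bit 63 is left unused in this sketch.
    """
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    return code_point | (code_point << 21) | (code_point << 42)


def decode_utf64(word: int) -> int:
    """Recover the code point by a bitwise majority vote over the copies."""
    a = word & MASK21
    b = (word >> 21) & MASK21
    c = (word >> 42) & MASK21
    # Each result bit is 1 when at least two of the three copies agree on 1.
    return (a & b) | (a & c) | (b & c)


word = encode_utf64(ord("A"))
damaged = word ^ (1 << 30)  # simulate one flipped bit in storage
print(chr(decode_utf64(damaged)))
```

With this layout any single flipped bit in a word is corrected, and more generally the character survives as long as no bit position is damaged in more than one copy. A real format would more likely use a stronger code than plain repetition; the sketch only shows that the spare bits are enough to do something useful.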
If such a UTF-64 format were established as part of the standard, then maybe in the future, for example, Microsoft WordPad could carry an option to save a text file as UTF-64.
At present, on the Windows XP system that I am using, when saving a text file from within Microsoft WordPad, one of the choices of file type is listed as Unicode Text Document, which uses a UTF-16 format.
A document saved as UTF-64 may well take four times as many bytes as such a Unicode Text Document, yet there would be the error checking and correction facilities at a character level.
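The size comparison can be checked with a short sketch in Python. It assumes, hypothetically, that UTF-64 would use exactly one 64-bit word (8 bytes) per code point, and it ignores any byte order mark; size_ratio is an illustrative name.

```python
def size_ratio(text: str) -> float:
    """Ratio of hypothetical UTF-64 size to UTF-16 size for the same text."""
    utf16_bytes = len(text.encode("utf-16-le"))  # no byte order mark
    utf64_bytes = 8 * len(text)  # assumed: one 8-byte word per code point
    return utf64_bytes / utf16_bytes


print(size_ratio("archive"))     # characters in the Basic Multilingual Plane
print(size_ratio("\U0001F600"))  # a character outside it
```

The factor of four holds for characters in the Basic Multilingual Plane, which UTF-16 stores in 2 bytes each. Characters outside it take 4 bytes in UTF-16 as a surrogate pair, so for those the factor would be two.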
Similarly, there could be a type of PDF document in which the text is stored in UTF-64 format.
So, I write to put forward the idea and to seek opinions, please, on whether establishing such a UTF format, whether UTF-64 or some other size, with error checking and correction facilities at a character level, would be useful.
William Overington
28 January 2013
Received on Mon Jan 28 2013 - 06:37:07 CST