Re: Long-term archiving of electronic text documents from Jim Breen on 2013-01-28 (Unicode Mail List Archive)

From: Jim Breen <jimbreen_at_gmail.com>
Date: Tue, 29 Jan 2013 10:51:48 +1100

William_J_G Overington <wjgo_10009_at_btinternet.com> wrote:

> The idea is that there would be an additional UTF format, perhaps UTF-64,
> so that each character would be expressed in UTF-64 notation using 64 bits,
> thus providing error checking and correction facilities at a character level.

Error detection and correction at the character level is considered
very old-fashioned now. Modern techniques such as Reed-Solomon
codes[1] are much more effective and involve much less overhead
than the 100% in the proposal above. Such techniques are already
used in modern disc storage[2], and when combined with RAID
techniques[3] provide better data protection than character-level
redundancy ever would.
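
To put rough numbers on the overhead comparison, here is a minimal sketch in Python. It is only an illustration: the function names are hypothetical, RS(255,223) is used simply as a commonly cited Reed-Solomon parameter set (not as a claim about what any particular disc format uses), and the XOR parity shown is the simple single-parity scheme of the kind RAID 4/5 relies on, not a Reed-Solomon implementation.

```python
from functools import reduce


def block_code_overhead(n: int, k: int) -> float:
    """Redundancy as a fraction of payload for an (n, k) block code."""
    return (n - k) / k


def rs_correctable_symbols(n: int, k: int) -> int:
    """A Reed-Solomon (n, k) code can correct up to floor((n - k) / 2)
    symbol errors per block."""
    return (n - k) // 2


def xor_parity(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks (single parity block)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def recover_missing(surviving: list[bytes], parity: bytes) -> bytes:
    """Rebuild the one missing data block from the survivors plus parity."""
    return xor_parity(surviving + [parity])


# Assumed example parameters: RS(255,223) adds 32 check symbols per 223 data symbols.
rs_overhead = block_code_overhead(255, 223)
rs_fixable = rs_correctable_symbols(255, 223)

# Three equal-length data blocks protected by one parity block.
blocks = [b"Unic", b"ode ", b"text"]
parity = xor_parity(blocks)
rebuilt = recover_missing([blocks[0], blocks[2]], parity)

print(f"RS(255,223) overhead: {rs_overhead:.1%}, corrects {rs_fixable} symbols/block")
print(f"Rebuilt block: {rebuilt!r}")
```

Under these assumptions the block-level code carries roughly 14% redundancy, against the 100% of doubling every character, and the parity block alone is enough to rebuild one lost data block.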

In any case, I think issues of error detection and correction are
quite outside the scope of Unicode.

Cheers

Jim

[1] http://en.wikipedia.org/wiki/Reed%E2%80%93Solomon_error_correction
[2] http://en.wikipedia.org/wiki/Error_detection_and_correction#Data_storage
[3] http://en.wikipedia.org/wiki/RAID

-- 
Jim Breen
Adjunct Snr Research Fellow, Japanese Studies Centre, Monash University

Received on Mon Jan 28 2013 - 18:00:54 CST