Corrigendum #9

Asmus Freytag asmusf at
Sat Jun 7 23:33:57 CDT 2014

On 6/7/2014 9:19 PM, Karl Williamson wrote:
> On 06/02/2014 11:00 AM, Shawn Steele wrote:
>> To further my understanding, can someone provide examples of how 
>> these are used in actual practice?  I can't think of any offhand and 
>> the closest I get is something like the old escape characters to get a dot 
>> matrix printer to shift modes, or old word processor internal 
>> formatting sequences.
> Here's an example of a possible use.  20 some years ago I wrote a 
> front-end to the Unix diff utility.  Showing the differences between 
> files (usually 2 versions of the same program's code) is an extremely 
> common programming activity.  I do it many times a day.  One reason is 
> to try to find out why a bug has crept in.  In doing so, there are 
> some differences that are not relevant to the task at hand, and their 
> being shown is a significant distraction. For example, in programming, 
> one might have renamed a variable (identifier) because its purpose has 
> changed somewhat and the name should accurately reflect its new 
> function so the reader is not subconsciously misled.  It would be nice 
> to be able to suppress the variable name changes from the difference 
> display. There could be thousands of them.  By changing the name in 
> each file version to the same noncharacter during the diff, these 
> differences won't be displayed, and there would not be any possible 
> conflict with the input files having that noncharacter in them.  (For 
> display, the noncharacter is changed back to the original name in its 
> respective file.)  Further, one might want to ignore the name changes 
> of two variables.  Just use a second noncharacter, and so on, up to 66.
> I wrote this long before noncharacters were available.  What I do 
> instead is scan the files for rarely used characters until I find 
> enough of them that aren't in the files.  For example, U+9F is unlikely to 
> appear.  Scanning the files takes time.  This step could be omitted 
> for noncharacters that are known to be illegal in the input.
This "illegal in the input, so I'm free to assume I can use them for my
purposes" property was definitely the primary(!) design goal discussed when
the set of 32 noncharacters (U+FDD0..U+FDEF) was added to Unicode. Having the
UTC backpedal from that many years after the original design, based on a
single meeting and without public review, is really a breakdown of the process.
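A minimal Python sketch of the diff-masking idea Karl describes above, offered only as an illustration. It assumes a hypothetical input format in which the caller supplies the renamed identifiers as (old_name, new_name) pairs. The function names are made up for this sketch, and Python's difflib.ndiff stands in for the Unix diff utility. Rather than assuming the inputs can never contain noncharacters, the sketch only uses noncharacters absent from both files. That is, in effect, the scanning step he would like to be able to skip.

```python
import difflib
import re

# The 66 Unicode noncharacters: U+FDD0..U+FDEF (32 code points) plus
# U+xFFFE and U+xFFFF in each of the 17 planes (34 code points).
NONCHARACTERS = [chr(cp) for cp in range(0xFDD0, 0xFDF0)] + [
    chr((plane << 16) | low) for plane in range(17) for low in (0xFFFE, 0xFFFF)
]


def mask_renames(old_text, new_text, renames):
    """Map each (old_name, new_name) pair to one shared noncharacter.

    Returns the two masked texts and a per-file map for restoring the
    original names when the diff is displayed.
    """
    # Only use noncharacters absent from both inputs, so the masking
    # cannot collide with text that is already there.
    free = [nc for nc in NONCHARACTERS if nc not in old_text and nc not in new_text]
    if len(renames) > len(free):
        raise ValueError("not enough unused noncharacters for these renames")
    old_restore, new_restore = {}, {}
    for (old_name, new_name), nc in zip(renames, free):
        # \b limits the match to whole identifiers (e.g. not "subtotal").
        old_text = re.sub(rf"\b{re.escape(old_name)}\b", nc, old_text)
        new_text = re.sub(rf"\b{re.escape(new_name)}\b", nc, new_text)
        old_restore[nc] = old_name
        new_restore[nc] = new_name
    return old_text, new_text, old_restore, new_restore


def unmask(text, restore):
    """Put the original names back in place of their noncharacters."""
    for nc, name in restore.items():
        text = text.replace(nc, name)
    return text


def diff_ignoring_renames(old_text, new_text, renames):
    """Line diff in which the listed renames are not reported as changes."""
    old_m, new_m, old_r, new_r = mask_renames(old_text, new_text, renames)
    out = []
    for line in difflib.ndiff(old_m.splitlines(), new_m.splitlines()):
        tag, body = line[:2], line[2:]
        if tag == "? ":
            continue  # skip ndiff's intraline hint lines
        # Added lines come from the new file; removed and unchanged lines
        # are shown with the old file's names.
        restore = new_r if tag == "+ " else old_r
        out.append(tag + unmask(body, restore))
    return out


old = "total = 0\nfor x in xs:\n    total += x\nprint(total)\n"
new = "acc = 0\nfor x in xs:\n    acc += x\nprint(acc, 'done')\n"
for line in diff_ignoring_renames(old, new, [("total", "acc")]):
    print(line)
```

In this example only the print line is reported as changed. The rename of total to acc is suppressed, and each side is displayed with its own file's names. If one of the inputs happened to contain every noncharacter, the sketch raises an error instead of silently merging real text with a placeholder.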

More information about the Unicode mailing list