The Unicode Consortium Discussion Forum

All times are UTC - 6 hours [ DST ]

 Post subject: New/extra character gets introduced on opening file in Unix
Posted: Mon Jun 18, 2012 2:14 pm

Joined: Mon Jun 18, 2012 6:47 am
Posts: 2
I created a text file in Windows using Notepad. It had only one character (¬). The ASCII code of this character is 172. I saved the file in UNICODE encoding. When I copied this file to a UNIX system and opened it in the vi editor, it showed 2 characters in the file (Â¬). The ASCII code of the newly introduced character is 194.

I do not want the new character to appear in Unix. Am I doing something wrong here, or am I supposed to do things differently?

Please advise.

Thanks

Ravi


 Post subject: Re: New/extra character gets introduced on opening file in U
Posted: Mon Jun 18, 2012 5:52 pm
Unicode Guru

Joined: Tue Dec 01, 2009 2:49 pm
Posts: 189
What you are seeing is the UTF-8 form of Unicode, where this character has two bytes.

If your vi is set to work in Latin-1 instead of UTF-8, you would mistakenly see two "characters".

If you need to create files to work with that editor, you need to save them in the Windows ANSI code page (1252), or, alternatively, you need to change your editor to accept UTF-8. I cannot tell you how to do that, because I don't own a Unix system.
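
To make the byte counts concrete, here is a minimal sketch in Python (my choice for illustration only; nothing in this thread uses Python). It assumes the file really was saved as UTF-8 and shows that ¬ (U+00AC) is one byte in code page 1252 but two bytes in UTF-8, and that reading those two bytes as Latin-1 gives two separate "characters".

```python
# Illustration only: how U+00AC (NOT SIGN) is stored in different encodings.
ch = "\u00ac"  # the ¬ character, code point 172

cp1252_bytes = ch.encode("cp1252")  # Windows "ANSI" code page 1252
utf8_bytes = ch.encode("utf-8")     # UTF-8 form of Unicode

print(list(cp1252_bytes))  # [172]
print(list(utf8_bytes))    # [194, 172]

# An editor that assumes Latin-1 treats each byte as its own character,
# so the two UTF-8 bytes are displayed as Â followed by ¬.
misread = utf8_bytes.decode("latin-1")
print([hex(ord(c)) for c in misread])  # ['0xc2', '0xac']
```

The 194 reported in the first post is the lead byte 0xC2 of the two-byte UTF-8 sequence, not a character that was added to the file.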


 Post subject: Re: New/extra character gets introduced on opening file in U
Posted: Mon Jun 18, 2012 6:50 pm

Joined: Mon Feb 01, 2010 6:18 pm
Posts: 79
So basically, vi is interpreting the incoming file as ANSI rather than UTF-8, so it runs an ANSI to UTF-8 conversion on an already UTF-8 stream. As such, the single character ¬, which is encoded in UTF-8 as 0xC2 0xAC, is instead taken as the two characters U+00C2 and U+00AC. Unix and Notepad are known to not play well together (http://blogs.msdn.com/b/michkap/archive ... 57028.aspx). Try the freeware Notepad++ (http://notepad-plus-plus.org/), which allows you to explicitly specify your encoding form, Byte-Order-Mark, and End-of-Line preferences. While changing your vi settings can help with this particular problem, editing plain text in Notepad for Unix consumption is generally to be avoided on spec.
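
The unwanted conversion can be reproduced in a few lines. This is only a sketch: Python is used for illustration, and Latin-1 stands in for "ANSI" here, which is an assumption about how the editor is configured.

```python
# Sketch of the mis-conversion: Latin-1 stands in for "ANSI" (an assumption).
original = "\u00ac".encode("utf-8")   # bytes on disk for a single ¬

# Step 1: the bytes are wrongly decoded as Latin-1, one character per byte.
as_seen = original.decode("latin-1")  # U+00C2 U+00AC
# Step 2: saving that text as UTF-8 encodes each of the two characters again.
resaved = as_seen.encode("utf-8")

print(original.hex(" "))  # c2 ac
print(resaved.hex(" "))   # c3 82 c2 ac
```

After one such round trip the file has grown from two bytes to four, which is why the extra character survives once someone saves the file.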


 Post subject: Re: New/extra character gets introduced on opening file in U
Posted: Tue Jun 19, 2012 8:05 am

Joined: Mon Jun 18, 2012 6:47 am
Posts: 2
Thank you for your input. I do not have control over the editor the team uses.

The character '¬' is used as a separator character in a SQL script file. But the moment someone edits this file in an editor on Unix and saves it back, the additional character gets introduced.

Specifically, the problem occurs in vi and SQL*Plus on Unix.

Is there something I can instruct the Unix users to do that prevents the additional character from getting into the file?


 Post subject: Re: New/extra character gets introduced on opening file in U
Posted: Tue Jun 19, 2012 12:23 pm

Joined: Mon Feb 01, 2010 6:18 pm
Posts: 79
You need to tell them to alter the settings of vi (like Asmus, I don't use it, so I don't know how to do this) so that it reads the input files as Unicode, not as ANSI. It is running a conversion on the files that it shouldn't be doing. The only other solution is to have a sanitizing app that users can run that replaces all occurrences of Â¬ with ¬, and removes any occurrences of ÂÂ, which will happen if a file gets edited by vi, sent back to Notepad, edited there, and sent back to vi, as each Â and ¬ will itself get expanded into two characters (¬ into Â¬ again).
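
A sanitizing step along those lines might look like the sketch below. It is written in Python purely for illustration, and the function name undo_latin1_mojibake is hypothetical. Instead of a literal search and replace, it reverses the mis-decoding itself, assuming the damage came from UTF-8 bytes being read as Latin-1 (a file damaged through code page 1252 instead would need "cp1252" in place of "latin-1").

```python
def undo_latin1_mojibake(text: str, max_rounds: int = 5) -> str:
    """Reverse repeated 'UTF-8 read as Latin-1' damage, e.g. Â¬ back to ¬.

    Hypothetical helper: it treats the whole text as one unit, so it is only
    suitable when the non-ASCII content is all damaged in the same way.
    """
    for _ in range(max_rounds):
        try:
            repaired = text.encode("latin-1").decode("utf-8")
        except (UnicodeEncodeError, UnicodeDecodeError):
            break  # text is not (or is no longer) mis-decoded UTF-8
        if repaired == text:
            break  # pure ASCII: nothing to undo
        text = repaired
    return text


once = "a\u00c2\u00acb"                # a, Â, ¬, b: one bad round trip
twice = "a\u00c3\u0082\u00c2\u00acb"   # the same text after a second round trip

print([hex(ord(c)) for c in undo_latin1_mojibake(once)])   # ['0x61', '0xac', '0x62']
print([hex(ord(c)) for c in undo_latin1_mojibake(twice)])  # ['0x61', '0xac', '0x62']
```

Text that is already correct is left alone, because a lone 0xAC byte is not valid UTF-8 and the loop stops on the first attempt. The caveat is that text which legitimately contains a sequence such as Â followed by ¬ would also be "repaired", so this should only be run on files known to be damaged.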

