Re: CRLF vs. LF (was Re: Unicode and end users)

From: Markus Scherer (markus.scherer@jtcsv.com)
Date: Tue Feb 26 2002 - 17:34:48 EST


Doug Ewell wrote:

> SC UniPad can read and write text files:
> - using LF, CR, CRLF, or LS (U+2028);

Great, and I know about UniPad, but more people have Windows Notepad and other system-level editors.
Why does UniPad not support NL and PS?

> One thing it cannot do is maintain different line separators in a single
> file. It converts them all internally to U+2028 and writes them out
> consistently according to user preference. (I don't know why one would
> want different line separators in a single file, but maybe someone can
> think of a reason.)

I can't - this behavior is fine as far as I am concerned.
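
A rough sketch of the normalize-then-write behavior described above, in Python. This is not UniPad's actual code; the function names are made up for illustration, and treating NL (U+0085) and PS (U+2029) as line breaks is an optional assumption, switched off by default because the quoted list only names LF, CR, CRLF, and LS.

```python
import re

# CRLF is listed first so it counts as one separator, not two.
# These are the four separators named in the quoted text.
_BASIC_BREAKS = re.compile("\r\n|[\n\r\u2028]")

# Assumption for illustration: also accept NL (U+0085) and PS (U+2029).
_EXTENDED_BREAKS = re.compile("\r\n|[\n\r\u0085\u2028\u2029]")

INTERNAL = "\u2028"  # single internal line separator (LS)


def normalize_line_breaks(text, extended=False):
    """Convert every recognized line break to the internal separator."""
    pattern = _EXTENDED_BREAKS if extended else _BASIC_BREAKS
    return pattern.sub(INTERNAL, text)


def write_line_breaks(text, preferred="\n"):
    """Replace the internal separator with the user's preferred one."""
    return text.replace(INTERNAL, preferred)


mixed = "a\r\nb\rc\nd\u2028e"
internal = normalize_line_breaks(mixed)
print(repr(write_line_breaks(internal, "\r\n")))
```

Because everything passes through one internal separator, a file with mixed line breaks always comes out with consistent ones, which is the limitation Doug mentions.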

> Markus, when you say "NL" do you mean U+0085? What text files use this
> convention?

I am aware of plain text files generated on mainframes (EBCDIC-based machines, 390/400/iSeries/zSeries) that use NL (U+0085) instead of LF (U+000A).
As far as I know, this sometimes happens because someone creates plain text files on OS/390 Unix System Services, where the EBCDIC LF/NL codes are swapped, and then converts them to Unicode with a standard (non-swapped) mapping table.
This happens inside XML parsers, for example, because the CCSID does not specify which LF/NL codes are used...
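
To make the swap concrete, here is a small Python sketch. It assumes Python's built-in cp037 codec as a stand-in for whatever EBCDIC code page the file really uses, and it assumes that codec follows the standard mapping (EBCDIC 0x15 to U+0085, 0x25 to U+000A). The helper names are hypothetical.

```python
# Assumption: cp037 stands in for the file's actual EBCDIC code page.
EBCDIC_CODEC = "cp037"

# Translation table that exchanges LF (U+000A) and NL (U+0085).
_SWAP_LF_NL = str.maketrans({"\n": "\u0085", "\u0085": "\n"})


def decode_standard(data: bytes) -> str:
    """Decode with the standard (non-swapped) mapping table."""
    return data.decode(EBCDIC_CODEC)


def decode_swapped(data: bytes) -> str:
    """Decode as a swapped-mapping converter would: LF and NL exchanged."""
    return data.decode(EBCDIC_CODEC).translate(_SWAP_LF_NL)


# "A", EBCDIC 0x15 used as the line end, "B"
sample = bytes([0xC1, 0x15, 0xC2])

print(repr(decode_standard(sample)))  # line end arrives as U+0085
print(repr(decode_swapped(sample)))   # line end arrives as U+000A
```

A file written with 0x15 as its line end only comes out with LF if the converter knows the swapped convention was in use; with the standard table the same byte becomes U+0085.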

Best regards,
markus


