Re: CRLF vs. LF (was Re: Unicode and end users)

From: Doug Ewell (dewell@adelphia.net)
Date: Wed Feb 27 2002 - 01:05:25 EST


Markus Scherer <markus.scherer@jtcsv.com> wrote:

> Why does UniPad not support NL and PS?

I don't work for Sharmahd, so the following is speculation.

Despite what UAX #13 says, I don't know of any editor or other text tool
that handles U+0085 as a newline character. The big debate has always
been between CR, LF, and CRLF. Maybe some of the IBM cross-platform tools
observe NL.
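
For illustration only, here is a minimal Python sketch of what handling U+0085 as a newline could look like, alongside CR, LF, and CRLF. The function name `normalize_nlf` and the choice of LF as the output convention are assumptions made for this example, not the behavior of UniPad or any other tool mentioned here.

```python
import re

# CRLF must come first in the alternation so it is consumed as one newline,
# not as a CR followed by a separate LF.
_NLF = re.compile("\r\n|[\r\n\x85]")


def normalize_nlf(text: str, newline: str = "\n") -> str:
    """Replace CR, LF, CRLF, and NL (U+0085) with a single newline convention.

    Hypothetical helper: the name and the default target are assumptions.
    """
    # A function is used as the replacement so the target string is inserted
    # literally, with no escape processing by the re module.
    return _NLF.sub(lambda match: newline, text)


sample = "one\r\ntwo\rthree\nfour\x85five"
normalized = normalize_nlf(sample)
```

Python's own `str.splitlines()` also treats U+0085 as a line boundary, so `"a\x85b".splitlines()` yields two lines.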

PS is an interesting situation. As UAX #13 points out, paragraph
separation can be implemented in many different ways. I leave a blank
line between the things I call paragraphs, but someone like Jim Agenbroad
might not. And not every blank line I leave is really a paragraph break.

Paragraph breaking implies that line breaking is also performed, and that
the two are different somehow. LS and PS probably should not be treated
as synonyms.

Since UniPad doesn't wrap words automatically, but relies on actual line
separators, the question is not "Why does UniPad not support PS?" but
rather "What does it *mean* for a text editor such as UniPad to support
PS?" Should it render two line breaks, like I do in ASCII with my CRLF
CRLF, or should it perform a single line break (matching Agenbroad's
style)? Should it handle any issues besides rendering in any way?
Remember, it's just an editor, so "paragraph semantics" are not relevant.
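
The two rendering choices can be sketched as a single switch. This is a hypothetical Python illustration, not a description of how UniPad works; the function name `render_separators` and its `blank_line_between_paragraphs` parameter are invented for the example.

```python
LS = "\u2028"  # LINE SEPARATOR
PS = "\u2029"  # PARAGRAPH SEPARATOR


def render_separators(text: str, blank_line_between_paragraphs: bool = True) -> str:
    """Turn LS and PS into plain line breaks for display.

    Hypothetical policy switch:
      True  -> PS becomes two line breaks (a blank line between paragraphs)
      False -> PS becomes one line break (no blank line between paragraphs)
    LS always becomes exactly one line break.
    """
    paragraph_break = "\n\n" if blank_line_between_paragraphs else "\n"
    return text.replace(LS, "\n").replace(PS, paragraph_break)


source = "first line" + LS + "second line" + PS + "next paragraph"
with_blank_line = render_separators(source, blank_line_between_paragraphs=True)
without_blank_line = render_separators(source, blank_line_between_paragraphs=False)
```

Under the second policy LS and PS render identically, which is exactly the case where the distinction between them is lost on screen.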

> I am aware of plain text files generated on mainframes (EBCDIC-based
> machines, 390/400/iSeries/zSeries) that use NL (U+0085) instead of LF
> (U+000a).
> As far as I know, this is sometimes because someone creates plain text
> files on OS/390 Unix System Services, where the EBCDIC LF/NL codes are
> swapped, and then uses a standard (non-swapped) mapping table to convert
> this to Unicode.

Sounds to me like a good old-fashioned, time-honored conversion bug.
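
A rough Python sketch of how that mismatch could arise follows. It assumes Python's `cp037` codec as a stand-in for the "standard (non-swapped)" table; the actual code pages involved on any given system are not stated in this thread, and `decode_swapped` is a hypothetical helper that merely emulates a swapped table.

```python
EBCDIC_NL = b"\x15"  # EBCDIC NL control code, used here as the line terminator


def decode_standard(data: bytes) -> str:
    # Non-swapped mapping: EBCDIC 0x15 -> U+0085, EBCDIC 0x25 -> U+000A.
    return data.decode("cp037")


def decode_swapped(data: bytes) -> str:
    # Hypothetical emulation of a swapped table: decode with the standard
    # mapping, then exchange U+0085 and U+000A in the result.
    return data.decode("cp037").translate({0x85: 0x0A, 0x0A: 0x85})


# Two lines of text terminated the way the quoted message describes.
record = "ONE".encode("cp037") + EBCDIC_NL + "TWO".encode("cp037")

via_standard = decode_standard(record)  # line ends arrive as U+0085
via_swapped = decode_swapped(record)    # line ends arrive as U+000A
```

If the file was written under the swapped convention, only the swapped decoding yields the LF that most tools expect; the standard table leaves U+0085 in the Unicode text.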

-Doug Ewell
 Fullerton, California


