From: Jill Ramonsky (Jill.Ramonsky@aculab.com)
Date: Tue Oct 21 2003 - 06:05:09 CST
Interesting.
I do strongly suspect, however, that at least part of the reason that LS 
(U+2028) and PS (U+2029) didn't take off was that they are more than seven 
bits wide, and hence cannot be transported in plain ASCII text.
I wonder why it was not felt a good idea at the time (the early 1990s) 
to define LS and PS with code points somewhere in the range 
U+0000 to U+001F. I think it would have been fairly easy to find some mostly 
unused ones, for example U+0010 and U+0011. The reason? SMTP traffic is (by 
definition) transmitted across 7-bit-wide channels. HTTP traffic is 
transmitted across 8-bit-wide channels. In the internet world, "newline" 
is CRLF, and everything else has to be converted to it for transmission 
across the internet.
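
To make the conversion point concrete, here is a minimal Python sketch. The function names to_wire_newlines and fits_in_seven_bits are made up for illustration, and the choice to treat NEL and LS as plain line terminators is an assumption, not anything a standard requires.

```python
import re

# Line terminators mentioned in this thread: CRLF, CR, LF, NEL (U+0085), LS (U+2028).
# CRLF comes first so it is matched as one unit rather than as CR followed by LF.
_LINE_BREAK = re.compile("\r\n|\r|\n|\u0085|\u2028")


def to_wire_newlines(text: str) -> str:
    """Replace every recognised line terminator with CRLF (hypothetical helper)."""
    return _LINE_BREAK.sub("\r\n", text)


def fits_in_seven_bits(text: str) -> bool:
    """Return True if every code point is below U+0080, i.e. plain ASCII."""
    return all(ord(ch) < 0x80 for ch in text)
```

With this, fits_in_seven_bits("\u2028") is False because LS encodes to three bytes in UTF-8, all with the high bit set, while a control in the U+0000 to U+001F range such as "\x10" passes.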
Personally, I would have added a THIRD kind of separator, a "soft line 
break". The reason? Some email relays impose a "maximum line length" 
on emails. In these days of MIME types and attachments, we inject CRLF 
into the files to keep such relays happy, but the renderer ignores them 
as "just whitespace". If we had had a "soft line break" character (in 
the range U+0000 to U+001F), we could have retrofitted it into existing 
email protocols. Had we done this, SLB could have been considered "just 
whitespace", while LS and PS would have been non-ignorable in HTML (and 
in fact, equivalent to <br> and <p> respectively).
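
A rough Python sketch of how such a soft line break might behave, purely as a thought experiment. The code point U+0012 is an arbitrary placeholder (no SLB code point is proposed above), the names soft_wrap and render are hypothetical, and the renderer here simply drops SLB so that the round trip is lossless, which is one possible reading of "just whitespace".

```python
SLB = "\x12"  # hypothetical soft-line-break code point, chosen only for this sketch


def soft_wrap(text: str, limit: int = 76) -> str:
    """Insert SLB so that no run between breaks exceeds `limit` characters."""
    wrapped = []
    for line in text.split("\r\n"):
        # Cut each hard line into pieces of at most `limit` characters.
        chunks = [line[i:i + limit] for i in range(0, len(line), limit)] or [""]
        wrapped.append(SLB.join(chunks))
    return "\r\n".join(wrapped)


def render(text: str) -> str:
    """Drop every SLB, recovering the text as it was before soft_wrap."""
    return text.replace(SLB, "")
```

The hard CRLF breaks survive untouched, so only the breaks added for the relay's benefit disappear on rendering.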
I'm not surprised that NEL never caught on though.
Jill
 > -----Original Message-----
 > From: Frank da Cruz [mailto:fdc@columbia.edu]
 > Sent: Monday, October 20, 2003 4:53 PM
 > To: Jill Ramonsky
 > Cc: unicode@unicode.org
 > Subject: Re: Line Separator and Paragraph Separator
 >
 >
 > At some point in the early 1990s, the thinking was that ASCII control
 > characters were included in Unicode only for round-trip compatibility
 > with existing character sets, but their semantics were undefined, and
 > anyway they were not needed since they were from the bygone days of
 > terminals and similar antique contraptions, whereas in modern times all
 > text is "flowed" by "smart rendering engines".
 >
 > Ten years on, the terminal-to-host model is still widely used, as is
 > text with hard line breaks, but to convince the skeptics and
 > ultra-modernists that line breaks were still a useful concept, I
 > mentioned line-oriented programming languages (such as Fortran), and
 > poetry.  Hence the line separator.
 >
 > Later everybody realized you couldn't stamp out ASCII control
 > characters, so we're still using them; LS and PS never caught on as far
 > as I know.  Although obviously, LS would have been an improvement over
 > the existing situation, in which different line separators (CR, LF,
 > CRLF) are used on different platforms, which would otherwise have
 > compatible text record formats, which to this day causes no end of
 > confusion.
 >
 > At some point after Unicode 2.0, the C1 controls were adopted from
 > ISO 6429, in which we have a Next Line control (NEL, U+0085), which
 > might also have served the purpose, but it never caught on either.
 >
 > - Frank
 >