In message <9907051432.AA29431@unicode.org>
Peter_Constable@sil.org wrote:
>
>
> >I find myself dealing with Unicode text created by Windows and Windows
> applications quite frequently now, with line ends marked in little-endian
> fashion as
>
> 0D 00 0A 00
>
> Indeed, this practice has surprised me.
>
> Chris Pratley: can you comment on why Word 97 does this rather than using
> PS?
>
I think I can partially answer this from experience in our own (non-MS)
environment. Our system continues to use its native line-ending type (LF
only) when dealing with Unicode data, for compatibility. In particular, when
converted to UTF-8, which is how Unicode is normally passed around our OS,
the data will have standard-looking line endings; if PS or LS were used,
many non-UTF-8-aware parts of the system would get confused.
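The byte sequences involved can be checked with a short sketch. Python is an assumption here (the post names no language), and the byte-level `split(b"\n")` merely stands in for a hypothetical non-UTF-8-aware tool that looks for 0x0A. It prints each separator's UTF-16LE and UTF-8 encoding, reproducing the `0D 00 0A 00` sequence quoted above, and shows that such a tool finds no line break at all in PS-delimited UTF-8.

```python
# Byte-level view of the line-ending choices discussed above.
LF = "\n"          # U+000A LINE FEED (a C0 control)
CRLF = "\r\n"      # U+000D U+000A, the Windows convention
LS = "\u2028"      # LINE SEPARATOR
PS = "\u2029"      # PARAGRAPH SEPARATOR

def hex_bytes(text: str, encoding: str) -> str:
    """Return the encoded bytes as space-separated uppercase hex."""
    return " ".join(f"{b:02X}" for b in text.encode(encoding))

for name, sep in [("LF", LF), ("CRLF", CRLF), ("LS", LS), ("PS", PS)]:
    print(f"{name:5} UTF-16LE: {hex_bytes(sep, 'utf-16-le'):12} "
          f"UTF-8: {hex_bytes(sep, 'utf-8')}")

# A byte-oriented tool that splits on 0x0A (a stand-in for legacy code):
utf8_lf = ("first" + LF + "second").encode("utf-8")
utf8_ps = ("first" + PS + "second").encode("utf-8")
print(len(utf8_lf.split(b"\n")), len(utf8_ps.split(b"\n")))  # 2 1
```

LF survives UTF-8 conversion as the single byte `0A`, whereas PS becomes `E2 80 A9`, which byte-oriented code does not recognise as a line end.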
Also, a lot of Unicode data is converted from non-Unicode sources, and
conversion will almost always leave C0 and C1 characters untouched. Changing
to PS and LS would need knowledge of the source data's line-ending
conventions, which is hard to determine automatically. If you also need
round-trip conversion (e.g. Shift-JIS data in an HTML form -> Unicode browser
internals -> Shift-JIS submission to server), messing with line endings is
almost out of the question.
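The round-trip problem can be illustrated with a sketch. Python's built-in `shift_jis` codec is used, but the two converter functions and the sample text are hypothetical, invented only for this illustration. A plain decode leaves CR and LF untouched and re-encodes to the identical bytes; a converter that normalises every line ending to PS maps CRLF input and LF input to the same Unicode text, so the exporter can no longer tell which bytes to emit.

```python
import re

PS = "\u2029"  # PARAGRAPH SEPARATOR

def import_plain(data: bytes, encoding: str) -> str:
    """Hypothetical importer: decode only; C0 controls pass through untouched."""
    return data.decode(encoding)

def import_normalizing(data: bytes, encoding: str) -> str:
    """Hypothetical importer: decode, then turn CRLF, CR or LF into PS."""
    # CRLF is listed first so it is replaced as one unit, not as two breaks.
    return re.sub(r"\r\n|\r|\n", PS, data.decode(encoding))

crlf_form = "名前\r\n住所".encode("shift_jis")  # sample form field, CRLF
lf_form = "名前\n住所".encode("shift_jis")      # same text, LF only

# Plain conversion round-trips byte-for-byte.
assert import_plain(crlf_form, "shift_jis").encode("shift_jis") == crlf_form

# Normalising conversion loses the distinction between the two inputs.
assert import_normalizing(crlf_form, "shift_jis") == import_normalizing(lf_form, "shift_jis")
```

Recovering the original bytes would require the converter to record each source's line-ending convention separately, which is the knowledge the paragraph above says is hard to obtain automatically.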
All other encodings use C0 controls for line endings - it's hard to
make a change for one particular encoding that does it differently.
--
Kevin Bracey, Senior Software Engineer
Pace Micro Technology plc            Tel: +44 (0) 1223 725228
645 Newmarket Road                   Fax: +44 (0) 1223 725328
Cambridge, CB5 8PB, United Kingdom   WWW: http://www.acorn.co.uk/
This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:48 EDT