Re: Subject: Re: 32'nd bit & UTF-8

From: Hans Aberg (haberg@math.su.se)
Date: Wed Jan 19 2005 - 17:51:30 CST

Next message: Eric Muller: "Re: Forms for invisible ZWJ (and ZWNJ)"

Previous message: Hans Aberg: "Re: 32'nd bit & UTF-8"
Maybe in reply to: Arcane Jill: "Subject: Re: 32'nd bit & UTF-8"
Next in thread: Hans Aberg: "Re: Subject: Re: 32'nd bit & UTF-8"

At 21:09 +0100 2005/01/19, Marcin 'Qrczak' Kowalczyk wrote:
>> Quite the contrary. It's most helpful for determining a text file's
>> encoding. Without the UTF-8 BOM it's hard to tell whether a file is
>> encoded in some ISO encoding or code page, or is already UTF-8.
>
>The problem with a BOM in UTF-8 is that it must be handled specially by
>all applications. It effectively turns UTF-8 into a stateful encoding
>where the beginning of a "text stream" must be treated specially.
>The world would be simpler if the UTF-8 BOM were banned.
>
>Fortunately, I have never encountered a Unix program that uses a UTF-8
>BOM, so I can mostly ignore the issue, except that text files coming
>from Windows may have that annoying thing at the beginning, which must
>be stripped.
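As an editorial illustration of the stripping described above (not code from either poster), here is a minimal Python sketch. It assumes the input bytes are already known or suspected to be UTF-8; the helper names `strip_utf8_bom` and `read_text` are hypothetical. The UTF-8 BOM is the byte sequence EF BB BF (U+FEFF encoded in UTF-8). A leading BOM is a strong hint that a file is UTF-8, but not proof.

```python
import codecs
from typing import Tuple

# codecs.BOM_UTF8 is the three-byte sequence b'\xef\xbb\xbf'.
UTF8_BOM = codecs.BOM_UTF8


def strip_utf8_bom(data: bytes) -> Tuple[bytes, bool]:
    """Return (data without one leading UTF-8 BOM, whether a BOM was present).

    Hypothetical helper: only a BOM at the very start is removed; a U+FEFF
    elsewhere in the data is left alone.
    """
    if data.startswith(UTF8_BOM):
        return data[len(UTF8_BOM):], True
    return data, False


def read_text(data: bytes) -> str:
    """Decode UTF-8 bytes, first dropping a leading BOM if there is one."""
    body, _had_bom = strip_utf8_bom(data)
    return body.decode("utf-8")
```

Python's standard `"utf-8-sig"` codec performs the same single leading-BOM removal on decode, so `data.decode("utf-8-sig")` is the shorter choice when the BOM flag itself is not needed.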

The main point is that the BOM will not be treated specially in the UNIX
world, regardless of what Unicode says. So I guess MS does not want its
text files to be read in the UNIX world. Unicode has made the mistake of
favoring one particular platform over all the others.

Hans Aberg

This archive was generated by hypermail 2.1.5 : Wed Jan 19 2005 - 17:53:25 CST