Re: UTF-8 code in HTML

From: Rick McGowan (rmcgowan@apple.com)
Date: Tue Apr 11 2000 - 21:10:17 EDT

Next message: G. Adam Stanislav: "Re: UTF-8 code in HTML"
Previous message: Jonathan Coxhead: "Re: UTF-8 code in HTML"
Maybe in reply to: Fady Elias: "UTF-8 code in HTML"
Next in thread: G. Adam Stanislav: "Re: UTF-8 code in HTML"

> If I have 3 HTML files side-by-side in a directory, one in UTF-8,
> another in, say, big-endian Unicode, and a third in shift-JIS,
> there is no way they can be self describing, because in order to
> parse the HTML, you have to understand the encoding already.

HTML files are not just filled with completely unstructured data -- there is
a header, and it is supposed to be in some well-known format. Otherwise,
the situation devolves to precisely what we have today -- unstructured text
or data files filled with unmarked data in a variety of encodings.

If a format is to be "self describing", the header should be something like
ASCII, up to the point where one has enough information to know the encoding
of the rest of the document... I believe that's the case with HTML. Once
you have looked at the first part of the header, you should have discovered
the encoding and be able to parse the rest of the file (or know that it's
unparsable) without having to guess.
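
To make that concrete, here is a rough sketch in Python of what "looking at the first part of the header" might amount to. Everything in it is an assumption for illustration: the function name sniff_html_encoding, the 1024-byte limit, and the choice to check for a byte-order mark first (needed for something like big-endian Unicode, which is not ASCII-compatible) are mine, not taken from any HTML specification.

```python
import re
from typing import Optional

# Byte-order marks are checked before any ASCII scan, because
# UTF-16 text cannot be searched as if it were ASCII.
_BOMS = [
    (b"\xef\xbb\xbf", "utf-8"),
    (b"\xfe\xff", "utf-16-be"),
    (b"\xff\xfe", "utf-16-le"),
]

# Looks for charset=... inside a <meta> tag, for example:
# <meta http-equiv="Content-Type" content="text/html; charset=Shift_JIS">
_META_CHARSET = re.compile(
    rb'<meta[^>]*charset\s*=\s*["\']?([A-Za-z0-9_.:-]+)',
    re.IGNORECASE,
)


def sniff_html_encoding(head: bytes, limit: int = 1024) -> Optional[str]:
    """Guess the declared encoding from the first bytes of an HTML file.

    Hypothetical helper: returns a lowercased encoding label,
    or None when nothing in the header declares one.
    """
    for bom, name in _BOMS:
        if head.startswith(bom):
            return name
    # The header is assumed to be ASCII-compatible up to the declaration.
    match = _META_CHARSET.search(head[:limit])
    if match:
        return match.group(1).decode("ascii").lower()
    return None
```

A caller would read the first kilobyte or so of the file as raw bytes, pass it to this function, and only then decode the rest; a result of None means the file does not describe itself and one is back to guessing.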

I really hope we don't start seeing lots of new file extensions... especially
if they're going to be limited to 3 or 4 letters and collide with everything
else that's 3 or 4 letters...

Rick

This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:01 EDT