"Martin J. Dürst", Tue, 17 Jul 2012 18:49:47 +0900:
> On 2012/07/17 17:22, Leif Halvard Silli wrote:
> 
>> And an argument was put forward on the WHATWG mailing list
>> earlier this year/end of the previous year that a page with strictly
>> ASCII characters inside could still contain character
>> entities/references for characters outside ASCII.
> 
> Of course they can. That's the whole point of using numeric character 
> references. I'm rather surprised that this was even discussed in the 
> context of HTML5.
And the question was whether such a page should, by default, be 
treated as UTF-8 encoded.
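To illustrate the ambiguity (a small sketch of my own, not something 
from the thread; the markup and names are invented): such a page is 
pure ASCII at the byte level, so any ASCII-compatible default decodes 
it to the same text, and the numeric references denote Unicode code 
points regardless of which default is picked.

    # Hypothetical example: an unlabelled page whose bytes are pure ASCII,
    # yet which refers to non-ASCII characters via numeric references.
    page = b'<!DOCTYPE html><title>t</title><p>Sm&#248;rrebr&#248;d &amp; t&#233;</p>'

    # Any ASCII-compatible default (UTF-8, windows-1252, ...) decodes these
    # bytes to the same string; the references map to Unicode code points
    # independently of that choice.
    decoded = {enc: page.decode(enc) for enc in ('ascii', 'utf-8', 'windows-1252')}
    assert len(set(decoded.values())) == 1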
>> For instance, early on in 'the Web', some
>> appeared to think that all non-ASCII had to be represented as entities.
> 
> Yes indeed. There's still some such stuff around. It's mostly 
> unnecessary, but it doesn't hurt.
Actually, I described an example above where it did hurt ... at least, 
if the goal is that pages are interpreted as UTF-8.
I have discovered one browser where it does hurt more directly: W3M, 
the text browser that is also included in Emacs. W3M doesn't handle 
(all) entities: it renders, for instance, both the named entity and 
the numeric character reference for 'å' as 'aa' instead of as 'å'.
So it seems to me that it is always advantageous to type characters 
directly: doing so allows for better character encoding detection in 
case the encoding labels disappear (read: it is easier to pick up that 
the page is UTF-8 encoded), and it also works better in at least one 
browser. It also makes authors more aware of the entire encoding 
issue, since it means that the page has to be properly labeled in 
order to work across parsers.
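A rough sketch of the detection point (my own illustration in Python; 
the function and sample strings are invented for the purpose): 
directly typed non-ASCII characters yield multi-byte sequences that a 
strict UTF-8 decode can confirm or reject, whereas an entities-only 
page leaves a detector with nothing to go on.

    def sniff(raw: bytes) -> str:
        # Crude sniffing in the spirit of the argument above: a strict UTF-8
        # decode succeeds for genuine UTF-8 and fails for e.g. windows-1252
        # bytes, but only if the page actually contains non-ASCII bytes.
        if raw.isascii():
            return 'no signal (pure ASCII / entities only)'
        try:
            raw.decode('utf-8')
            return 'very likely UTF-8'
        except UnicodeDecodeError:
            return 'not UTF-8 (some legacy encoding)'

    print(sniff('<p>bl\u00e5</p>'.encode('utf-8')))         # detectable as UTF-8
    print(sniff(b'<p>bl&aring;</p>'))                       # entities only: no signal
    print(sniff('<p>bl\u00e5</p>'.encode('windows-1252')))  # rejected by strict decode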
-- Leif Halvard Silli