2012-07-17 17:11, Leif Halvard Silli wrote:
>>> For instance, early on in 'the Web', some
>>> appeared to think that all non-ASCII had to be represented as entities.
>>
>> Yes indeed. There's still some such stuff around. It's mostly
>> unnecessary, but it doesn't hurt.
>
> Actually, above I described an example where it did hurt ...
The situation is comparable to the BOM issue. In the old days, it was
considered (presumably with good reason) safer to omit the BOM in UTF-8
than to use it, and it was considered safer to use entity references
than direct non-ASCII data. That has changed, but people are
conservative, and people read old warnings.

We should now say that a BOM is not required in UTF-8, but that it is
safer to use it unless you have good reasons not to (e.g., an authoring
environment that dislikes it). Similarly, character data should
preferably be written directly in UTF-8, unless you have good reasons
(mostly on the authoring side, not the client side) to avoid that and
use entity and character references instead.
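
As a rough illustration of that recommendation, here is a minimal Python
sketch. It is not from the original discussion, and the function names
direct_chars and encode_with_bom are made up for this example. It
replaces references to non-ASCII characters (such as &aring; or &#229;)
with the characters themselves, leaves markup-significant references
like &lt; and &amp; alone, and writes the result as UTF-8 with a BOM
using the standard "utf-8-sig" codec.

```python
import html
import re

# Matches named (&aring;), decimal (&#229;) and hex (&#xE5;) references.
_REF = re.compile(r"&(?:#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);")


def direct_chars(text: str) -> str:
    """Replace references to non-ASCII characters with the characters.

    References that resolve to ASCII (e.g. &lt;, &amp;) or that cannot be
    resolved at all are kept as they are, so markup is not broken.
    """
    def _sub(match: re.Match) -> str:
        resolved = html.unescape(match.group(0))
        if all(ord(ch) > 127 for ch in resolved):
            return resolved
        return match.group(0)

    return _REF.sub(_sub, text)


def encode_with_bom(text: str) -> bytes:
    """Encode as UTF-8, prefixed with the BOM (bytes EF BB BF)."""
    return text.encode("utf-8-sig")


if __name__ == "__main__":
    source = "bl&aring;b&#230;r &lt;b&gt; &#xE5;"
    converted = direct_chars(source)
    print(converted)
    print(encode_with_bom(converted)[:3])  # the three BOM bytes
```

Decoding with the same "utf-8-sig" codec strips the BOM again if one is
present, so data written this way can be read back with or without it.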
> I have discovered one browser where it does hurt more directly: In W3M,
> the text browser, which is also included in Emacs. W3M doesn't handle
> (all) entities. E.g. it renders &aring; and &#229; as an 'aa' instead
> of as an 'å', for instance.
To take a more modern example, the native e-mail client on my Android 
seems to systematically display character and entity references 
literally when displaying message headers with small excerpts of 
content, even though it correctly interprets them when displaying the 
message itself.
Yucca
Received on Tue Jul 17 2012 - 09:34:15 CDT