Asmus Freytag wrote:
> A validator *should* look between the > and < in order to
> catch invalid entity references, esp. invalid NCRs.
>
> For UTF-8, it would ideally also check that no ill-formed,
> and therefore illegal, sequences are part of the UTF-8.

You've made a good point about invalid NCRs or named
entities. But I think it's up to the author to proofread the
actual text in an appropriate application. Is the HTML
validator also going to be expected to check grammar,
spelling, and punctuation?
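
To be concrete about what catching an invalid NCR involves: a
reference can be perfectly well-formed as mark-up yet still
point at a code point no character can occupy, such as a
surrogate or a value beyond U+10FFFF. The sketch below, in
Python, is only my illustration of the idea, not how any
actual validator is implemented:

    import re

    NCR = re.compile(r'&#([xX][0-9A-Fa-f]+|[0-9]+);')

    def invalid_ncrs(text):
        # Yield (reference, reason) for each NCR that no conforming
        # document may use, however well-formed it looks.
        for m in NCR.finditer(text):
            body = m.group(1)
            cp = int(body[1:], 16) if body[0] in 'xX' else int(body)
            if cp > 0x10FFFF:
                yield m.group(0), 'beyond U+10FFFF'
            elif 0xD800 <= cp <= 0xDFFF:
                yield m.group(0), 'surrogate code point'

    for ref, why in invalid_ncrs('fine: &#x41; bad: &#xD800; &#1114112;'):
        print(ref, '->', why)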

There is so much text on the web in many different encodings.
Big-5, Shift-JIS, and similar encodings are fairly well
standardised and supported. Now, in addition to UTF-8, a web
page might be in UTF-16 or, eventually, perhaps even UTF-32.
Plus, there's a plethora of non-standard encodings in common
use today. An HTML validator should validate the mark-up,
assuring an author that (s)he hasn't done anything incredibly
dumb like having two </title> tags appear consecutively.
Really, this is all that we should expect from an HTML
validator. Extra features such as checking for ill-formed
UTF-8 sequences would probably be most welcome, but there are
other tools for doing this which an author should already be
using.
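
For what it's worth, here is a rough sketch in Python of what
one of those other tools does: report the first ill-formed
UTF-8 sequence in a file. The single-error report and the
command-line handling are simplifications for illustration
only:

    import sys

    def first_bad_utf8(path):
        # Return (byte offset, offending bytes) for the first
        # ill-formed sequence, or None if the file decodes cleanly.
        data = open(path, 'rb').read()
        try:
            # Strict decoding rejects overlongs, surrogates, and
            # truncated multi-byte sequences alike.
            data.decode('utf-8')
        except UnicodeDecodeError as e:
            return e.start, data[e.start:e.end]
        return None

    bad = first_bad_utf8(sys.argv[1])
    if bad is None:
        print('well-formed UTF-8')
    else:
        print('ill-formed sequence %r at byte %d' % (bad[1], bad[0]))

A real tool would resynchronise after an error and report
every bad sequence, of course, but the principle is the same.
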
Best regards,
James Kass.