Re: Clean and Unicode compliance

From: James Kass (jameskass@worldnet.att.net)
Date: Sun Dec 16 2001 - 23:50:24 EST

Previous message: James Kass: "Re: Plane One use, was Re: HTML Validation"
In reply to: Martin Duerst: "Re: Clean and Unicode compliance"
Next in thread: Martin Duerst: "Re: Clean and Unicode compliance"

Martin Duerst wrote,

> As the person who implemented UTF-8 checking for http://validator.w3.org,
> I beg to disagree. In order to validate correctly, the validator has
> to make sure it correctly interprets the incoming byte sequence as
> a sequence of characters. For this, it has to know the character
> encoding. As an example, there are many files in iso-2022-jp or
> shift_jis that are perfectly valid as such, but will get rejected
> by some tools because they contain bytes that correspond to '<' in
> ASCII as part of a doublebyte character.
>

Excellent example. It hadn't occurred to me that the byte for the
less-than bracket can turn up inside a multibyte character in
certain encodings.
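
To make the quoted example concrete, here is a minimal sketch in Python.
It assumes ISO-2022-JP and picks the hiragana character U+305C, whose
JIS X 0208 code is 0x24 0x3C, so its second byte is the same as ASCII
'<'. The helper names naive_has_tag_open and decoded_has_tag_open are
made up for illustration and do not come from any validator.

```python
# Sketch only: shows why a byte-level scan for '<' can misfire on
# ISO-2022-JP text. Helper names are hypothetical.

# U+305C (hiragana ZE) is 0x24 0x3C in JIS X 0208, so the encoded form
# contains the byte 0x3C, which is '<' in ASCII.
text = "\u305c"
encoded = text.encode("iso2022_jp")


def naive_has_tag_open(data: bytes) -> bool:
    """Scan raw bytes for '<' without knowing the encoding."""
    return b"<" in data


def decoded_has_tag_open(data: bytes, encoding: str) -> bool:
    """Decode with the declared encoding first, then look for '<'."""
    return "<" in data.decode(encoding)


if __name__ == "__main__":
    print(encoded)                                     # raw bytes, including 0x3C
    print(naive_has_tag_open(encoded))                 # byte scan sees a '<'
    print(decoded_has_tag_open(encoded, "iso2022_jp")) # decoded text has none
```

The byte-level scan reports a '<' that is not there once the bytes are
read as characters, which is the kind of false rejection described
above.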

HTML validators need to be aware of the encoding used in the
file. Based on your comments and other comments in this thread,
I concede the point. A validator should check that the plain
text portion of an HTML file is properly encoded and well-formed.
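
As a rough illustration of that check, the sketch below does a strict
decode of the incoming bytes under the declared encoding. It is not how
validator.w3.org does it; the function name is_well_formed is an
assumption for this example, and it relies only on Python's built-in
codecs.

```python
# Sketch only: a strict decode as a well-formedness check.
# is_well_formed is a hypothetical name, not part of any validator.


def is_well_formed(data: bytes, encoding: str) -> bool:
    """Return True if data decodes cleanly under the declared encoding."""
    try:
        data.decode(encoding, errors="strict")
    except UnicodeDecodeError:
        # The byte sequence is not valid in this encoding.
        return False
    except LookupError:
        # The encoding label is unknown to Python's codec registry.
        return False
    return True


if __name__ == "__main__":
    print(is_well_formed("caf\u00e9".encode("utf-8"), "utf-8"))  # valid UTF-8
    print(is_well_formed(b"\xc3\x28", "utf-8"))  # lead byte without continuation
```

A real validator would also have to work out which encoding to trust
(HTTP header, meta element, and so on) before it gets this far.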

Best regards,

James Kass.


This archive was generated by hypermail 2.1.2 : Sun Dec 16 2001 - 22:36:32 EST