Re: UTF-8 stress test file?

From: Theodore H. Smith (delete@elfdata.com)
Date: Sun Oct 10 2004 - 15:59:25 CST

Next message: Theodore H. Smith: "Re: UTF-8 stress test"

Previous message: Simon Montagu: "Re: UTF-8 stress test file?"
In reply to: Simon Montagu: "Re: UTF-8 stress test file?"
Next in thread: Philippe Verdy: "Re: UTF-8 stress test file?"

>> I'd like to see a UTF-8 stress test file.
>> It should consist of lines of UTF-8, separated each by a newline.
>> Each line should be malformed. Also, some idea of how to deal with
>> the malformed UTF-8 should be noted in a separate file.
>> Really, I just want some way to verify that I can detect every kind
>> of UTF-8 wrongness. I have some code I adapted from Unicode.org, but
>> I want to make sure my adaptions haven't broken the code.
>
> http://www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-test.txt

"This file is not meant to be a conformance test. It does
not prescribes any particular outcome and therefore there is no way to
"pass" or "fail" this test file, even though the texts suggests a
preferable decoder behaviour at some places."

I'm wondering if Unicode.org has a proper conformance test? If not, I
suggest they make one: one with each test separated by a single
newline and no non-test lines, unless they wanted to allow some kind
of "comment line" that is easy to parse (let's say, a line starting
with "#").
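
To make the proposed layout concrete, here is a minimal Python sketch of
a reader for it. It assumes one test per newline-separated line and "#"
at the start of a line for comments, as suggested above. Skipping empty
lines is an extra assumption of mine, and the name parse_test_file is
hypothetical.

```python
from typing import List, Tuple


def parse_test_file(data: bytes) -> List[Tuple[int, bytes]]:
    """Return (line_number, raw_bytes) for every test line.

    Works on raw bytes, never on decoded text, because the test lines
    are expected to contain malformed UTF-8.
    """
    tests = []
    for number, line in enumerate(data.split(b"\n"), start=1):
        if not line:
            continue  # assumption: blank lines carry no test
        if line.startswith(b"#"):
            continue  # comment line, per the proposed convention
        tests.append((number, line))
    return tests
```

Splitting on the newline byte means a test case cannot itself contain
0x0A, which seems an acceptable limit for a line-oriented file.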

For me to use that test programmatically, I'll need to break out my
non-UTF-8-aware text editor, delete all the non-test lines, and then
separate the good and the bad UTF-8 into different files! That way I
can use readline-type code to do my UTF-8 verification.
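
As a rough sketch of that readline-style check, the Python below walks
a sequence of lines and reports every line whose verdict differs from
the expected one. The function names are hypothetical, and Python's
built-in strict UTF-8 decoder is only a stand-in for whatever validator
is actually under test.

```python
from typing import Callable, Iterable, List, Tuple


def strict_decode_ok(raw: bytes) -> bool:
    """Stand-in validator: True if Python's strict decoder accepts raw."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def check_lines(
    lines: Iterable[bytes],
    expect_valid: bool,
    validator: Callable[[bytes], bool] = strict_decode_ok,
) -> List[Tuple[int, bytes]]:
    """Return (line_number, raw_line) for each line the validator
    judges differently from expect_valid."""
    mismatches = []
    for number, line in enumerate(lines, start=1):
        raw = line.rstrip(b"\n")  # drop only the newline separator
        if validator(raw) != expect_valid:
            mismatches.append((number, raw))
    return mismatches


# Usage, with hypothetical file names; a file opened in binary mode
# yields one bytes object per line:
#   with open("good-utf8.txt", "rb") as f:
#       print(check_lines(f, expect_valid=True))
#   with open("bad-utf8.txt", "rb") as f:
#       print(check_lines(f, expect_valid=False))
```

An empty result from both files would mean the validator agreed with
the hand-sorted good and bad lines.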

It would be nice if someone had an "automated-test-ready" UTF-8 file.

If not, I'll modify this one and then put the results up on my website
sometime (in a week or so).

--
     Theodore H. Smith - Software Developer.
     http://www.elfdata.com

This archive was generated by hypermail 2.1.5 : Sun Oct 10 2004 - 16:01:28 CST