Re: UTF-8 stress test file?

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Tue Oct 12 2004 - 12:25:16 CST


    From: "Doug Ewell" <dewell@adelphia.net>
    > Theodore H. Smith <delete at elfdata dot com> wrote:
    >
    >>> - the file mixes UTF-8 and UTF-16
    >>
    >> Does this file mix UTF-8 and UTF-16? I thought it just had surrogates
    >> encoded into UTF-8? Of course a surrogate should never exist in UTF-8.
    >
    > You are right. Philippe's statement was incorrect, and also puzzling.

    Have you read the file content? It clearly and explicitly speaks about
    UTF-16, which has no place in a text file for UTF-8, unless the file
    was meant as a test for CESU-8 (which is neither UTF-16 nor UTF-8).
    My statement was correct: it is based on the fact that the test file
    was created for the older (RFC) version of UTF-8 used in old versions
    of ISO 10646, which Unicode never referenced in any version (at least
    not explicitly, until the v4.0.1 clarification).
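
To make the distinction between UTF-8 and CESU-8 concrete, here is a minimal Python sketch. It is not taken from the test file under discussion; the helper names `cesu8_encode` and `is_strict_utf8` are hypothetical, and the sketch assumes a Python 3 interpreter, whose built-in strict "utf-8" codec rejects encoded surrogates.

```python
def cesu8_encode(text: str) -> bytes:
    """Hypothetical helper: encode text the CESU-8 way.

    Code points above U+FFFF are split into a UTF-16 surrogate pair,
    and each surrogate is then written as its own 3-byte sequence.
    """
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp >= 0x10000:
            v = cp - 0x10000
            high = 0xD800 + (v >> 10)    # leading (high) surrogate
            low = 0xDC00 + (v & 0x3FF)   # trailing (low) surrogate
            for s in (high, low):
                # "surrogatepass" lets the codec emit the 3-byte form
                out += chr(s).encode("utf-8", "surrogatepass")
        else:
            out += ch.encode("utf-8", "surrogatepass")
    return bytes(out)


def is_strict_utf8(data: bytes) -> bool:
    """Hypothetical helper: True if a strict UTF-8 decoder accepts data."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    ch = "\U00010000"  # first supplementary code point
    print(ch.encode("utf-8").hex())              # 4-byte UTF-8 form
    print(cesu8_encode(ch).hex())                # two 3-byte sequences
    print(is_strict_utf8(cesu8_encode(ch)))      # rejected by strict UTF-8
```

For text limited to the Basic Multilingual Plane the two encodings produce identical bytes; they diverge only for supplementary characters, which is where a strict UTF-8 decoder reports an error.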


