Re: Filtering and displaying untrusted UTF-8

From: Kent Karlsson
Date: Fri Jan 01 2010 - 08:19:23 CST

    Maybe someone has mentioned this already; but in case it hasn't been...

    Filtering out (= removing) anything poses a security risk in itself,
    if the "filtering" is done after the security check and before
    possibly sensitive script code execution ("script" here used in
    the sense of "executable program code"). E.g. "sensBLUBBERitive":
    since it does not match "sensitive", it is allowed through; filter
    away the "BLUBBER", and you get "sensitive"... Replacing (including
    the popular mapping to "?") can be dangerous too.
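
A minimal sketch of that ordering problem, in Python. The function
names, the forbidden token and the "BLUBBER" filter are hypothetical
stand-ins chosen for illustration; they are not taken from any real
filter or from the message itself.

```python
# Hypothetical stand-ins: none of these names come from the original message.
FORBIDDEN = "sensitive"


def passes_check(text: str) -> bool:
    """Security check: reject any text containing the forbidden token."""
    return FORBIDDEN not in text


def strip_filter(text: str) -> str:
    """Filter that removes an unwanted substring (here "BLUBBER")."""
    return text.replace("BLUBBER", "")


def check_then_filter(text: str) -> str:
    """Risky order: the check sees text that is modified afterwards."""
    if not passes_check(text):
        raise ValueError("rejected")
    return strip_filter(text)


def filter_then_check(text: str) -> str:
    """Safer order: the check sees exactly the text that will be used."""
    cleaned = strip_filter(text)
    if not passes_check(cleaned):
        raise ValueError("rejected")
    return cleaned


# The forbidden token is reassembled after the check has already passed.
print(check_then_filter("sensBLUBBERitive"))

try:
    filter_then_check("sensBLUBBERitive")
except ValueError as exc:
    print("filter_then_check:", exc)
```

With the check placed last, whatever the filter produces is what gets
inspected, so removing characters can no longer assemble a token the
check never saw.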

    It appears to me that many security issues related to character
    encoding are based on the premise that the security check is done
    *before* encoding conversion (presumably to Unicode) or other
    modification (normalisation, filtering, ...), and (sensitive) code
    execution is done after the conversion/modification. That seems
    a bit strange to me (the security check should be done after the
    conversion/modifications), but I guess there are coding expedience
    reasons, at least sometimes, for not having the security check
    after the conversion/modification(s).
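
The same point can be illustrated with normalisation instead of
filtering. This is only a sketch: it assumes NFKC normalisation (via
Python's standard unicodedata module) as the "modification" step, and
the check and function names are hypothetical.

```python
import unicodedata

FORBIDDEN = "sensitive"  # hypothetical forbidden token


def passes_check(text: str) -> bool:
    """Security check: reject any text containing the forbidden token."""
    return FORBIDDEN not in text


def normalise(text: str) -> str:
    """Modification step, assumed here to be NFKC normalisation."""
    return unicodedata.normalize("NFKC", text)


def check_then_normalise(text: str) -> str:
    """Risky order: the check runs on the un-normalised input."""
    if not passes_check(text):
        raise ValueError("rejected")
    return normalise(text)


def normalise_then_check(text: str) -> str:
    """Order argued for above: check after the conversion/modification."""
    converted = normalise(text)
    if not passes_check(converted):
        raise ValueError("rejected")
    return converted


# "sensitive" spelled with fullwidth Latin letters (U+FF41..U+FF5A),
# which NFKC maps back to plain ASCII letters.
fullwidth = "".join(chr(ord(c) + 0xFEE0) for c in "sensitive")

print(passes_check(fullwidth))                      # raw form passes the check
print(check_then_normalise(fullwidth) == FORBIDDEN)  # yet becomes the token

try:
    normalise_then_check(fullwidth)
except ValueError as exc:
    print("normalise_then_check:", exc)
```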

        /kent k

    On 2009-12-31 04.08, "Petr Tomasek" wrote:

    >> * 0xE000 - 0xF900 (private use; since everyone can make up a
    >> different character for a code point in private use, filter them all)
    > This is a very bad idea since it effectively blocks people using other
    > chars than those defined in the Unicode standard. (BTW, Microsoft
    > and others have their own PUA assignments...)
    > P.T.
