Unicode and Security

From: Gaspar Sinai (gsinai@yudit.org)
Date: Sat Feb 02 2002 - 21:41:11 EST


I would like to start a series of discussions about
the security aspects of Unicode.

I would also like to know your opinion about the
need to create another or an 'intermediate' standard.

I have a lot of issues on my mind; security is
the top one.

With the introduction of digital signatures, security
will become a very important part of character
encoding.

Is Unicode secure? What character standards can be
considered secure?

I have run into cases where Unicode could not
be used because of security issues. In all cases
the signer of a document can be misled into
believing that the wording of the document he/she
is about to sign is different from what it actually is.

How can this be? I had the following problems:

1. Character Order Problem

   The BIDI algorithm is too complex and not reversible.
   I could create a BIDI document where only RLO, LRO and
   PDF characters were used, and Word, Java and KDE
   produced different word orderings. I don't have access
   to an MS platform now to reproduce this, but as far as
   I can tell it was like:

    <RLO>text1<PDF>U+0020<RLO>text2<PDF>

   Because the BIDI algorithm is too complex and vague,
   it can be argued that these programs all displayed
   the text correctly, yet differently:

      text1 text2
      text2 text1
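
A minimal sketch of one way a signing tool could surface this problem before the signer sees the rendered text: list every explicit directional formatting character in the string. The function name find_bidi_controls and the choice of Python are assumptions for illustration; the sample string follows the sequence shown above. The sketch does not predict how any particular program will order the words, it only shows that the controls are present.

```python
import unicodedata

# Explicit directional formatting characters: embeddings, overrides,
# and the PDF character that terminates them.
BIDI_CONTROLS = {
    "\u202A",  # LRE
    "\u202B",  # RLE
    "\u202C",  # PDF
    "\u202D",  # LRO
    "\u202E",  # RLO
}


def find_bidi_controls(text):
    """Return (index, character name) for each bidi control found in text."""
    return [
        (i, unicodedata.name(ch))
        for i, ch in enumerate(text)
        if ch in BIDI_CONTROLS
    ]


# The sequence from the example: <RLO>text1<PDF>U+0020<RLO>text2<PDF>
sample = "\u202Etext1\u202C\u0020\u202Etext2\u202C"

for index, name in find_bidi_controls(sample):
    print(index, name)
```

A non-empty result could be treated as a reason to show the signer the logical character sequence in addition to the rendered one.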

2. Character Shape Problem

   I got different character shapes, because of:
   a) Ligatures
      In complex scripts, in Devanagari for instance, the
      ZERO WIDTH JOINER should be used to prevent ligature
      forming and join the characters normally.

      Whether ligature forming will actually happen or not
      is completely up to the font. If the font does have
      the ligature, it will be formed. The standard does
      not define all the compulsory ligatures.

      I was even thinking about putting ZERO WIDTH JOINER
      after each character. But why do we have ZERO WIDTH JOINER
      at all? I think a ZERO WIDTH LIGATURE FORMER would
      be better. In this case at least I would know that
      a ligature may appear at that point.

   b) Hidden Marks
      It is possible to make a combining mark, such as a
      negation mark, appear inside the base character's body,
      making it invisible. It is nearly impossible to
      test the rendering engine for all possible
      combinations.
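
Both shape problems come down to the rendered glyphs not revealing which characters are actually stored. As a rough sketch, the stored characters can be listed independently of any font. The helper names describe and hidden_candidates are hypothetical, and the sample strings (KA, VIRAMA, SSA with and without a joiner, and a Latin letter with an overlay mark) are chosen only for illustration.

```python
import unicodedata


def describe(text):
    """Return one (code point, general category, name) tuple per character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.category(ch),
         unicodedata.name(ch, "<unnamed>"))
        for ch in text
    ]


def hidden_candidates(text):
    """Return characters a renderer may draw invisibly or merge away:
    format characters (Cf) and nonspacing marks (Mn)."""
    return [ch for ch in text if unicodedata.category(ch) in ("Cf", "Mn")]


# Devanagari KA + VIRAMA + ZERO WIDTH JOINER + SSA
with_joiner = "\u0915\u094D\u200D\u0937"
# The same letters without the joiner
without_joiner = "\u0915\u094D\u0937"
# Latin A followed by an overlay mark, standing in for a "negation" mark
struck = "A\u0338"

for sample in (with_joiner, without_joiner, struck):
    for entry in describe(sample):
        print(*entry)
    print()
```

The listing shows that the two Devanagari strings differ by one invisible character, whatever the font decides to draw, and that the overlay mark is present even if a renderer hides it.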

3. Text Search Problem

    It is possible to create texts that look the same
    but cannot be matched by a search, because even when fully
    decomposed and ordered they are still different.
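
One possible instance of this, sketched below, is look-alike letters from different scripts. The helper same_after_nfd is a hypothetical name; it compares two strings after canonical decomposition (NFD), which also applies canonical ordering. A precomposed letter and its decomposed spelling compare equal, while Latin, Greek and Cyrillic capital A remain three different characters.

```python
import unicodedata


def same_after_nfd(a, b):
    """Compare two strings after canonical decomposition (NFD)."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


# Precomposed e-acute versus e followed by COMBINING ACUTE ACCENT
composed = "\u00E9"
decomposed = "e\u0301"

# Capital A in three scripts; the glyphs are typically indistinguishable
latin_a = "A"
greek_alpha = "\u0391"
cyrillic_a = "\u0410"

print(same_after_nfd(composed, decomposed))
print(same_after_nfd(latin_a, greek_alpha))
print(same_after_nfd(latin_a, cyrillic_a))
```

Normalization only unifies canonically equivalent spellings, so a search for one of these letters will not find the others.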

I am sure this is not a full list, but these are the things
that concern me most at the moment.

Thank you for your attention
Gaspar



This archive was generated by hypermail 2.1.2 : Sat Feb 02 2002 - 21:25:50 EST