Unicode and Security
I would like to start a series of discussions about
the security aspects of Unicode.
I would also like to know your opinion about the
need to create another or an 'intermediate' standard.
I have a lot of issues on my mind; security is
the top one.
With the introduction of digital signatures, security
will become a very important part of character
encoding.
Is Unicode secure? What character standards can be
considered secure?
I ran into the following problems, where Unicode could
not be used because of security issues. In all cases
the signer of a document can be lured into believing
that the wording of the document he/she is about to
sign is different from what it actually is.
How can this be? The problems were:
1. Character Order Problem
The BIDI algorithm is too complex and not reversible.
I could create a BIDI document where only the RLO, LRO
and PDF characters were used, and Word, Java and KDE
produced different word orderings. I don't have access
to an MS platform now to reproduce this, but as far as
I can tell it was like:
<RLO>text1<PDF>U+0020<RLO>text2<PDF>
Because the BIDI algorithm is so complex and vague,
it can be argued that each of these programs displayed
the text correctly, yet they displayed it differently:
text1 text2
text2 text1
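
One way a signing tool might guard against this is to refuse,
or at least flag, text that contains explicit directional
controls before it is shown to the signer. The following is
only a minimal sketch in Python, not a fix for the BIDI
algorithm itself; the function name find_bidi_controls is
hypothetical, and the sample string is the sequence above.

```python
import unicodedata

# Bidi classes of the explicit embedding/override controls and their terminator.
EXPLICIT_BIDI_CLASSES = {"LRE", "RLE", "LRO", "RLO", "PDF"}


def find_bidi_controls(text):
    """Return (index, code point, bidi class) for each explicit bidi control.

    Hypothetical helper: it only reports the controls, it does not
    try to predict how any particular renderer will order the text.
    """
    hits = []
    for i, ch in enumerate(text):
        cls = unicodedata.bidirectional(ch)
        if cls in EXPLICIT_BIDI_CLASSES:
            hits.append((i, f"U+{ord(ch):04X}", cls))
    return hits


# <RLO>text1<PDF>U+0020<RLO>text2<PDF>
sample = "\u202Etext1\u202C \u202Etext2\u202C"
bidi_hits = find_bidi_controls(sample)
for hit in bidi_hits:
    print(hit)
```

A document for which this list is non-empty could then be
rejected or shown in logical order with the controls spelled
out, so the signer sees what is actually encoded.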
2. Character Shape Problem
I had different character shapes, because:
a) Ligatures
In complex scripts, in Devanagari for instance, the
ZERO WIDTH JOINER should be used to prevent ligature
forming while still joining the characters normally.
Whether ligature forming will actually happen or not
is completely up to the font. If the font does have
the ligature, it will be formed. The standard does
not define all the compulsory ligatures.
I was even thinking about putting ZERO WIDTH JOINER
after each character. But why do we have ZERO WIDTH
JOINER at all? I think a ZERO WIDTH LIGATURE FORMER
would be better. In this case at least I would know that
a ligature may appear at that point.
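
What can be checked from the encoded text alone is where the
joiner controls sit; what the font does with them cannot. As a
rough sketch (Python; the function name reveal_joiners and the
<ZWJ>/<ZWNJ> notation are my own assumptions), the invisible
joiners can be spelled out before the text is shown:

```python
# Visible stand-ins for the two invisible joiner controls.
JOINER_TAGS = {
    "\u200C": "<ZWNJ>",  # ZERO WIDTH NON-JOINER
    "\u200D": "<ZWJ>",   # ZERO WIDTH JOINER
}


def reveal_joiners(text):
    """Replace each joiner control with a visible tag (hypothetical helper)."""
    return "".join(JOINER_TAGS.get(ch, ch) for ch in text)


# DEVANAGARI KA + VIRAMA + ZWJ + SSA
sample_conjunct = "\u0915\u094D\u200D\u0937"
revealed = reveal_joiners(sample_conjunct)
print(revealed)
```

This only exposes the controls that are present. It says
nothing about ligatures a font forms without any control
character, which is the harder part of the problem.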
b) Hidden Marks
It is possible to make a combining mark, such as a
negation mark, appear inside the base character's body,
making it invisible. It is nearly impossible to
test the rendering engine for all possible
combinations.
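
Since the rendering cannot be trusted here, one option is to
show the signer the code points themselves. The sketch below
(Python; describe_sequence is a hypothetical name, and the
sample with COMBINING LONG SOLIDUS OVERLAY is just one
assumed example of a negation mark) lists every character
with its name and marks the combining ones:

```python
import unicodedata


def describe_sequence(text):
    """List (code point, name, is_combining_mark) for each character.

    Hypothetical helper: general categories Mn, Mc and Me all
    start with "M", so that prefix is used as the mark test.
    """
    rows = []
    for ch in text:
        is_mark = unicodedata.category(ch).startswith("M")
        name = unicodedata.name(ch, "<unnamed>")
        rows.append((f"U+{ord(ch):04X}", name, is_mark))
    return rows


# EQUALS SIGN followed by COMBINING LONG SOLIDUS OVERLAY
sample_marked = "=\u0338"
mark_rows = describe_sequence(sample_marked)
for row in mark_rows:
    print(row)
```

A mark that a font hides inside the base glyph still shows up
in this listing, because the listing never goes through the
rendering engine.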
3. Text Search Problem
It is possible to create texts that look the same,
but they cannot be searched for, because even when fully
decomposed and ordered they will be different.
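
A small illustration of the difference, as a sketch only
(Python; same_after_nfd is a hypothetical name, and the
Latin/Cyrillic pair is my own assumed example of look-alike
text): full canonical decomposition makes the two spellings
of an accented word equal, but it does not help when a
look-alike letter from another script is used.

```python
import unicodedata


def same_after_nfd(a, b):
    """Compare two strings after canonical decomposition and reordering."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


# Precomposed e-acute versus "e" plus COMBINING ACUTE ACCENT.
composed = "caf\u00E9"
decomposed = "cafe\u0301"

# Latin "PAY" versus the same word with CYRILLIC CAPITAL LETTER A.
latin = "PAY"
mixed = "P\u0410Y"

accent_match = same_after_nfd(composed, decomposed)
lookalike_match = same_after_nfd(latin, mixed)
print(accent_match, lookalike_match)
```

The first pair matches after decomposition and the second
does not, although both pairs may look identical on screen.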
I am sure this is not a full list, but these are the things
that concern me most at the moment.
Thank you for your attention
Gaspar