Security Issues
Q: I've heard claims that Unicode poses
security issues. Is that right?
A: A common security issue is 'spoofing', the deliberate
misspelling of a domain or user name to trick unaware users into
interacting with a hostile site as if it were a trusted site.
Spoofing need not be exact to be effective; substituting the digit
'1' for the letter 'l' can be enough. The Unicode Standard contains many
"confusables," that is, characters whose glyphs, due to historical
derivation or sheer coincidence, resemble each other more or less
closely. Certain security-sensitive applications or systems may be
vulnerable due to possible misinterpretation of these confusables by
their users. [AF] and
[DE]
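To make the idea of confusables concrete, here is a minimal Python sketch. It uses only the standard `unicodedata` module; the helper names `describe` and `name_prefixes`, and the sample strings, are hypothetical and chosen for illustration. Taking the first word of each character name as a script hint is a rough heuristic assumed here, not a mechanism defined by the Unicode Standard.

```python
import unicodedata

def describe(text):
    """Return (code point, character name) pairs for each character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN")) for ch in text]

def name_prefixes(text):
    """Collect the first word of each letter's character name.

    This is only a rough stand-in for a script check (an assumption of
    this sketch); real script data lives in the Unicode Character Database.
    """
    return {unicodedata.name(ch, "UNKNOWN").split()[0] for ch in text if ch.isalpha()}

trusted = "example"
spoofed = "ex\u0430mple"  # third letter is CYRILLIC SMALL LETTER A (U+0430)

# The two strings may render identically, yet they are different strings.
print(trusted == spoofed)
print(describe("1l"))          # digit one versus Latin small letter l
print(name_prefixes(trusted))  # letters from one script
print(name_prefixes(spoofed))  # letters from two scripts: a possible warning sign
```

A mixed set of prefixes does not prove spoofing, and a single-script string can still contain confusables (as '1' and 'l' show), so a check like this is at best a first filter.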
Q: Is this a problem that is unique to
Unicode?
A: No, many legacy character sets, including ISO/IEC
8859-1, also contain confusables (albeit usually fewer of them) and
carry the same risks when it comes to spoofing.
[AF] and
[DE]
Q: Why is it not simply possible to give
all characters that use the same glyph a single code?
A: Unicode encodes characters, not glyphs. If characters were
unified strictly on the basis of appearance, many common text processing
tasks would become convoluted or impossible. For example, Latin B and
Greek Beta (Β) look the same in most fonts, but lowercase to two
different letters, Latin b and Greek beta (β), which have very
distinct appearances. [AF] and
[DE]
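The case-mapping example can be checked with a short Python sketch, using only the built-in `str.lower` and the standard `unicodedata` module. The variable names are illustrative.

```python
import unicodedata

latin_b = "B"           # U+0042 LATIN CAPITAL LETTER B
greek_beta = "\u0392"   # U+0392 GREEK CAPITAL LETTER BETA

for ch in (latin_b, greek_beta):
    lower = ch.lower()
    # Show each capital letter and the distinct lowercase letter it maps to.
    print(
        f"U+{ord(ch):04X} {unicodedata.name(ch)} -> "
        f"U+{ord(lower):04X} {unicodedata.name(lower)}"
    )
```

If the two capitals shared a single code, the lowercase mapping could not tell whether b or β was intended without extra information about the surrounding text.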
Q: Where can I find out more about security
issues with Unicode?
A: See UTR #36: Unicode Security Considerations.