Re: Security Issues

From: Peter Kirk (peterkirk@qaya.org)
Date: Fri Mar 25 2005 - 17:55:30 CST


    On 25/03/2005 22:12, Erik van der Poel wrote:

    > Peter Kirk wrote:
    >
    >> I also note a wide range of which Unicode characters are used for the
    >> apostrophe in the various languages, but that is an issue for those
    >> who coded the texts (for some of them, it is me).
    >
    >
    > It may also become an issue for those writing the specs for IDNs and
    > maybe even the base spec, Stringprep. Would you please list the
    > codepoints of the apostrophes that you are aware of?
    >
    Several of the texts, mostly the English-language versions, use U+0027.
    Others use U+2019, although some use this also as a quotation mark
    paired with U+2018. The Greek text uses U+1FBD, although this is a
    coding error: a Greek apostrophe is not a KORONIS. None of the texts I
    looked at uses U+02BC, although this is the character which is supposed
    to be used at least for the Azerbaijani apostrophe.

    It may help you to look at the lists of near equivalents given in the
    Unicode code charts. These are mostly, though not always, similar
    enough in shape to be confusable, and so should potentially be folded
    together for these purposes. I would think it would also be sensible to
    fold U+02B9, the range U+02BB through U+02BF, U+2018, U+2019 and U+201B
    all to U+0027, as all of these are easily confusable at small font
    sizes, and some software rather too freely converts between them as
    "smart quotes".
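
    As a rough illustration of the folding suggested above, here is a
    minimal sketch in Python. The code point list is the one given in this
    message; the names APOSTROPHE_LOOKALIKES, FOLD_TABLE and
    fold_apostrophes are hypothetical, and this is not taken from
    Stringprep or any IDN spec.

```python
# Sketch only: fold apostrophe lookalikes to U+0027 before comparing strings.
# The code points are those listed in the message; all names are hypothetical.
APOSTROPHE_LOOKALIKES = (
    [0x02B9]
    + list(range(0x02BB, 0x02C0))  # U+02BB through U+02BF inclusive
    + [0x2018, 0x2019, 0x201B]
)

# str.translate accepts a dict mapping code points to replacement code points.
FOLD_TABLE = {cp: 0x0027 for cp in APOSTROPHE_LOOKALIKES}


def fold_apostrophes(text: str) -> str:
    """Return text with every listed lookalike replaced by U+0027."""
    return text.translate(FOLD_TABLE)


if __name__ == "__main__":
    # U+2019 and U+02BC both fold to the plain ASCII apostrophe.
    print(fold_apostrophes("Fe\u2019fe\u02BC"))
```

    Note that U+1FBD is deliberately left alone here, since it is not in
    the list above; whether it should be folded as well is a separate
    question.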

    Meanwhile I have found a language which appears to use a double quote
    as well as a single quote, and also a word-initial hyphen, as part of
    its orthography. See http://www.worldscriptures.org/pages/attie.html.
    Double quotes need a similar set of lookalike equivalents to single
    quotes, and should also be treated as equivalent to two consecutive
    single quotes, to avoid spoofing.
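
    One way to picture the double quote case is the sketch below, again in
    Python. The set of double quote lookalikes (U+201C, U+201D, U+201F) is
    my assumption for illustration, not a vetted list, and the names
    DOUBLE_QUOTE_LOOKALIKES and fold_double_quotes are hypothetical. It
    assumes single quote lookalikes have already been folded to U+0027.

```python
# Sketch only: make a double quote compare equal to two single quotes.
# The lookalike set below is an assumption, not a vetted list.
DOUBLE_QUOTE_LOOKALIKES = [0x0022, 0x201C, 0x201D, 0x201F]

# str.translate also accepts string replacements, so each double quote
# (or lookalike) becomes two consecutive U+0027 characters.
DOUBLE_FOLD_TABLE = {cp: "''" for cp in DOUBLE_QUOTE_LOOKALIKES}


def fold_double_quotes(text: str) -> str:
    """Return text with double quote lookalikes replaced by two U+0027."""
    return text.translate(DOUBLE_FOLD_TABLE)


if __name__ == "__main__":
    # A label written with one double quote and one written with two
    # single quotes fold to the same string.
    print(fold_double_quotes('a"b') == fold_double_quotes("a''b"))
```

    Folding towards two single quotes, rather than the other way round,
    avoids having to decide which pairs of adjacent single quotes were
    meant as a double quote.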

    There may be other useful information on published orthographies at this
    site, http://www.worldscriptures.org/a-z-frameset.html, although there
    is sadly no sample of Fe'fe'.

    -- 
    Peter Kirk
    peter@qaya.org (personal)
    peterkirk@qaya.org (work)
    http://www.qaya.org/
    


    This archive was generated by hypermail 2.1.5 : Fri Mar 25 2005 - 17:56:16 CST