Processing of default ignorable code points

From: Peter Kirk (peterkirk@qaya.org)
Date: Thu Aug 05 2004 - 13:11:55 CDT

    In TUS 4.0 Section 5.3, p.111, the following is stated of default
    ignorable code points:

    > These characters are also ignored except with respect to specific,
    > defined processes; for example, ZERO WIDTH NON-JOINER is ignored in
    > collation. ... For more information, see Section 5.20, Default
    > Ignorable Code Points.

    But in Section 5.20, although there is a lot about rendering default
    ignorable code points, there is no further information about any other
    processing of them. The implication of that section seems to be that
    these characters are intended to be ignored in rendering but not in
    other processes such as collation. Which of the two is to be taken as
    the intention of the standard: this implication, or the summary in
    Section 5.3? Has the summary simply not been updated for consistency
    with the fuller details? Or has the fuller description been
    unintentionally restricted to rendering?

    Is it in fact the intention that all default ignorable characters must
    always be ignored in collation? Or is it possible to tailor collation
    not to ignore them? The collation algorithm seems to suggest the latter,
    in that there seems to be no mention of these characters being
    obligatorily ignored - although I presume they have zero weight by
    default (in DUCET).
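
    To make the two readings concrete, here is a minimal sketch in Python.
    It is not the collation algorithm: plain string equality stands in for
    comparison of sort keys, and ASSUMED_IGNORABLES is a hand-picked subset
    (ZWNJ, ZWJ, and the variation selectors U+FE00..U+FE0F) rather than the
    full set of default ignorable code points. The function names are
    hypothetical and exist only for illustration.

```python
# Illustrative subset only (an assumption of this sketch). The real set of
# default ignorable code points is defined by the Unicode Character Database.
ASSUMED_IGNORABLES = {0x200C, 0x200D} | set(range(0xFE00, 0xFE10))


def key_ignoring(text: str) -> str:
    """Comparison key that drops the assumed ignorables (zero weight)."""
    return "".join(ch for ch in text if ord(ch) not in ASSUMED_IGNORABLES)


def key_tailored(text: str) -> str:
    """Comparison key that keeps every code point (a hypothetical tailoring)."""
    return text


plain = "ab"
with_zwnj = "a\u200cb"  # same letters with ZERO WIDTH NON-JOINER between them

# Reading 1: ignorables are always ignored, so the strings compare equal.
same_when_ignored = key_ignoring(plain) == key_ignoring(with_zwnj)

# Reading 2: a tailoring may keep them, so the strings compare unequal.
same_when_tailored = key_tailored(plain) == key_tailored(with_zwnj)
```

    Under the first reading the two strings are indistinguishable to the
    comparison; under the second they are distinct. That difference is
    what the question above turns on.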

    This has some quite serious implications for the processing of texts
    that include ZW(N)J, variation selectors, etc.

    -- 
    Peter Kirk
    peter@qaya.org (personal)
    peterkirk@qaya.org (work)
    http://www.qaya.org/
    

