Re: Nicest UTF

From: Marcin 'Qrczak' Kowalczyk (qrczak@knm.org.pl)
Date: Sun Dec 12 2004 - 06:09:11 CST

    "Philippe Verdy" <verdy_p@wanadoo.fr> writes:

    > It's hard to create a general model that will work for all scripts
    > encoded in Unicode. There are too many differences. So Unicode just
    > appears to standardize a higher level of processing, with combining
    > sequences and normalization forms that better approximate the
    > linguistics and semantics of the scripts. Consider this level as an
    > intermediate tool that will help simplify the identification of
    > processing units.

    While rendering and user input may use evolving rules with complex
    specifications and implementations which depend on the environment
    and the user's configuration (there is actually no other choice: this
    is inherently complicated for some scripts), string processing in
    a programming language should have a stable base with well-defined,
    easy-to-remember semantics which do not depend on too many
    settable preferences and version variations.
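
    As a rough illustration only: the sketch below assumes that the
    "stable base" is the code point level, which is my assumption and
    not something stated above. Length and equality of code point
    sequences involve no locale or preferences, while normalization
    consults the Unicode data tables shipped with the runtime, whose
    version Python exposes as unicodedata.unidata_version.

```python
import unicodedata

s = "e\u0301"   # 'e' followed by U+0301 COMBINING ACUTE ACCENT
t = "\u00e9"    # precomposed U+00E9 LATIN SMALL LETTER E WITH ACUTE

# Code point level: fixed semantics, independent of locale or settings.
code_point_view = (len(s), len(t), s == t)

# Higher level: the answer comes from the Unicode character database
# bundled with this particular Python build.
normalized_equal = (
    unicodedata.normalize("NFC", s) == unicodedata.normalize("NFC", t)
)
tables_version = unicodedata.unidata_version

print(code_point_view, normalized_equal, tables_version)
```

    The two strings differ as code point sequences but compare equal
    after NFC normalization; only the second comparison depends on
    which version of the Unicode tables is installed.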

    The more complex the rules a protocol demands (case-insensitive
    programming language identifiers, compared after normalization,
    after bidi processing, with soft hyphens removed, etc.), the more
    tools will implement them incorrectly, usually with subtle errors
    which don't manifest until someone tries to process an unusual name
    (e.g. a documentation generation tool will produce dangling
    hyperlinks because a WWW server does not perform sufficient
    transformations of addresses).
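
    A minimal sketch of that failure mode, with made-up names:
    doc_tool_anchor and naive_server_lookup are hypothetical stand-ins
    for a documentation generator that normalizes and case-folds
    identifiers and for a server that matches names code point by code
    point. Neither models any real tool.

```python
import unicodedata

def doc_tool_anchor(identifier: str) -> str:
    """Hypothetical doc generator: NFC-normalize, then case-fold."""
    return unicodedata.normalize("NFC", identifier).casefold()

def naive_server_lookup(pages: dict, requested: str):
    """Hypothetical server: exact code point match, no transformations."""
    return pages.get(requested)

# Pages are stored under the names exactly as written in the source;
# the second name uses a decomposed accent (e + U+0301).
pages = {
    "parse": "docs for parse",
    "Cafe\u0301": "docs for the accented identifier",
}

ordinary = naive_server_lookup(pages, doc_tool_anchor("parse"))
unusual = naive_server_lookup(pages, doc_tool_anchor("Cafe\u0301"))

print(ordinary)  # the page is found
print(unusual)   # None: the generated link dangles
```

    The ordinary lowercase ASCII name round-trips, so the mismatch
    stays hidden until a name with a combining character or an
    uppercase letter shows up.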

    -- 
       __("<         Marcin Kowalczyk
       \__/       qrczak@knm.org.pl
        ^^     http://qrnik.knm.org.pl/~qrczak/
    

