Re: Proposed Draft UTR #31 - Syntax Characters

From: Jim Allan (jallan@smrtytrek.com)
Date: Thu Aug 21 2003 - 16:26:34 EDT


    Ben Dougall posted:

    > i'd say wide. narrow means not incorporating some characters that would
    > naturally fit into 'white space'. if i was parsing some text i'd
    > consider a non-breaking space white space and i'd expect my code to
    > reflect that. why would you not want your code to treat a non-breaking
    > space or mathematical space not as white space?

    Traditionally in C, NBSP was not counted as white space. See
    http://msdn.microsoft.com/library/default.asp?url=/library/en-us/vccelng/htm/eleme_2.asp
    for one reference.

    This may have been accidental, as C white space properties were defined
    with only the 7-bit ASCII character set in mind.

    But it would break current C programs if NBSP were defined as white
    space. Logically then, if we exclude NBSP, other "hard" spaces should
    also not be defined as white space.
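
    A rough sketch of the kind of breakage I mean. It assumes NBSP is the
    single byte 0xA0 (as in ISO 8859-1), and the function names are
    made up for illustration. is_c_whitespace accepts the six characters
    that isspace() accepts in the default "C" locale; is_wide_whitespace is
    a hypothetical "wide" definition that adds NBSP.

```c
#include <stdbool.h>

/* The six characters isspace() accepts in the default "C" locale. */
static bool is_c_whitespace(unsigned char ch)
{
    switch (ch) {
    case ' ': case '\t': case '\n': case '\v': case '\f': case '\r':
        return true;
    default:
        return false;
    }
}

/* Hypothetical "wide" definition: also accepts NBSP.
   Assumption: NBSP is encoded as the single byte 0xA0 (ISO 8859-1). */
static bool is_wide_whitespace(unsigned char ch)
{
    return is_c_whitespace(ch) || ch == 0xA0;
}

/* Count whitespace-delimited tokens using the supplied predicate. */
static int count_tokens(const char *s, bool (*is_space)(unsigned char))
{
    int count = 0;
    bool in_token = false;

    for (; *s != '\0'; ++s) {
        if (is_space((unsigned char)*s)) {
            in_token = false;      /* a separator ends the current token */
        } else if (!in_token) {
            in_token = true;       /* first character of a new token */
            ++count;
        }
    }
    return count;
}
```

    With the string "10\xA0km away", the traditional predicate keeps
    "10<NBSP>km" together as one token, while the wide predicate splits
    it in two. A program that relied on the first behaviour would change
    its output if the definition were widened.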

    Essentially NBSP was treated by many word processors and text editors as
    simply a printing character, like any other printing character, with no
    special "spacing" properties. It was only an imitation of a space in
    appearance. Undefined characters in fonts might also appear as
    imitations of space in many printing systems. That did not make them
    white space.

    Of course under Unicode specifications NBSP is expected to expand like
    SPACE for justification and so assumes some of the attributes of SPACE.

    For compatibility I think it best to not include any of the non-breaking
    spaces as white space.

    Jim Allan


