Re: Display of Isolated Nonspacing Marks (was Re: Questions on ZWNBS...)

From: Kenneth Whistler (kenw@sybase.com)
Date: Tue Aug 05 2003 - 18:42:52 EDT


    Peter responded to Mark:

    > On 05/08/2003 14:40, Mark Davis wrote:
    >
    > >Where did you get the notion that space is not a base character? And
    > >base characters include those that are not control or format
    > >characters. Space is neither one.
    > >
    > >The standard specifically states in a number of places that to exhibit
    > >a combining mark in isolation you use a space (or NBSP).
    > >
    > >Mark
    > >__________________________________
    > >http://www.macchiato.com
    > >► “Eppur si muove” ◄
    > >
    > >
    > >
    > I got this from the Unicode Standard 4.0, as quoted by Jim Allan:

    *Mis*quoted by Jim Allan.

    >
    > > In http://www.unicode.org/book/preview/ch03.pdf the space characters
    > > in general are given class Zs:
    > >
    > > << Zs, Zl, and Zp are considered format characters, but their
    > > membership in the Z (separator) class takes precedence over their
    > > membership in the Cf class, because the General Category assigns only
    > > a single value to each character. >>

    That piece of text is *NOT* a quotation from Chapter 3 of Unicode
    4.0. Go to that URL and search for it yourself.

    It is quoted from Chapter 4 of Unicode *3.0*, p. 88, in the
    discussion of General Category in Section 4.5, "General Category --
    Normative in Part". The corresponding paragraph was deleted
    from the relevant section in Unicode 4.0, specifically because the
    standard now precisely defines format control characters as
    {Cf, Zl, Zp}, *ex*cluding Zs. See p. 25 in:

    http://www.unicode.org/book/preview/ch02.pdf
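
    To make the distinction concrete, here is a minimal Python sketch
    (an illustration, not text from the standard) that applies the
    Unicode 4.0 format control definition {Cf, Zl, Zp} to a few
    characters. The helper name is_format_control is hypothetical, and
    the General Category values come from whatever Unicode Character
    Database version ships with the Python interpreter, which is newer
    than 4.0.

```python
import unicodedata

# Format control characters as defined in Unicode 4.0:
# General Category Cf, Zl, or Zp. Zs (space separators) is deliberately excluded.
FORMAT_CONTROL_CATEGORIES = {"Cf", "Zl", "Zp"}

def is_format_control(ch: str) -> bool:
    """Return True if ch has General Category Cf, Zl, or Zp."""
    return unicodedata.category(ch) in FORMAT_CONTROL_CATEGORIES

samples = {
    "SPACE": "\u0020",
    "NO-BREAK SPACE": "\u00A0",
    "LINE SEPARATOR": "\u2028",
    "PARAGRAPH SEPARATOR": "\u2029",
    "ZERO WIDTH NO-BREAK SPACE": "\uFEFF",
}

for name, ch in samples.items():
    # unicodedata.category returns exactly one General Category value per character.
    print(f"U+{ord(ch):04X} {name}: {unicodedata.category(ch)}, "
          f"format control = {is_format_control(ch)}")
```

    SPACE and NO-BREAK SPACE come out as Zs, and therefore outside the
    format control set, while LINE SEPARATOR, PARAGRAPH SEPARATOR, and
    ZERO WIDTH NO-BREAK SPACE fall inside it.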

    > >
    > > So the various space characters (class Zs) are also classified as
    > > format characters.
    > >
    > > From http://www.unicode.org/book/ch04.pdf:
    > >
    > > << _D13 Base character:_ a character that does not graphically
    > > combine with preceding character, and that is neither control nor a
    > > format character. >>
    > >
    > > Accordingly, by definition, spaces are not base characters.

    This conclusion is false. As Mark indicated, SPACE (and NBSP) are
    base characters, and have been treated as such in terms of
    diacritic application since Unicode 1.0 was published:

    "By convention, diacritical marks used by the Unicode encoding
    scheme may be exhibited in (apparent) isolation by applying
    them to U+0020 SPACE or to U+00A0 NON-BREAKING SPACE. This
    might be done, for example, when talking about the diacritical
    mark itself as a mark, rather than using it in its normal way
    in text."
                     -- Unicode 1.0, p. 19 [1991]
                     
    And that *is* an accurate quote from the standard. In Unicode 4.0
    that text survives as:

    "By convention, diacritical marks used by the Unicode Standard
    may be exhibited in (apparent) isolation by applying
    them to U+0020 SPACE or to U+00A0 NON-BREAKING SPACE. This tactic
    might be employed, for example, when talking about the diacritical
    mark itself as a mark, rather than using it in its normal way
    in text."
                     -- Unicode 4.0, p. 46 [2003]
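
    As an illustration of that convention, the following Python sketch
    applies a combining mark to SPACE or NO-BREAK SPACE. The function
    exhibit_in_isolation is a hypothetical helper, not an API defined by
    the standard. Using only the standard library's unicodedata module,
    it also checks that the compatibility decomposition of the spacing
    U+00B4 ACUTE ACCENT is exactly SPACE followed by U+0301 COMBINING
    ACUTE ACCENT, which is consistent with the same convention.

```python
import unicodedata

COMBINING_ACUTE = "\u0301"  # COMBINING ACUTE ACCENT, General Category Mn

def exhibit_in_isolation(mark: str, no_break: bool = False) -> str:
    """Apply a combining mark to SPACE (or NO-BREAK SPACE) so it displays in apparent isolation."""
    if not unicodedata.category(mark).startswith("M"):
        raise ValueError(f"U+{ord(mark):04X} is not a combining mark")
    base = "\u00A0" if no_break else "\u0020"
    return base + mark

shown = exhibit_in_isolation(COMBINING_ACUTE)
print([f"U+{ord(c):04X}" for c in shown])  # ['U+0020', 'U+0301']

# The compatibility decomposition of the spacing U+00B4 ACUTE ACCENT
# is the same SPACE + combining mark sequence.
print(unicodedata.normalize("NFKD", "\u00B4") == shown)  # True

# Canonical composition (NFC) does not fold SPACE + U+0301 back into U+00B4,
# because that decomposition is a compatibility one, not a canonical one.
print(unicodedata.normalize("NFC", shown) == shown)  # True
```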

    I'd say the intent of the UTC and the Unicode Standard in this
    regard has always been rather clear, and it has stayed unchanged
    from Unicode 1.0 (1991) through Unicode 4.0 (2003).
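
    For readers who want to check this mechanically, here is a rough
    Python sketch of D13 under the Unicode 4.0 definitions. It is an
    approximation, not a normative implementation: "does not
    graphically combine with a preceding character" is modeled as
    "General Category is not M*", controls as Cc, and format controls
    as {Cf, Zl, Zp}. The helper is_base_character is hypothetical.

```python
import unicodedata

# Rough approximation of D13 (Unicode 4.0), not a normative implementation:
# - "does not graphically combine with a preceding character" is modeled as
#   "General Category does not start with M" (Mn, Mc, Me are combining marks);
# - controls are Cc; format controls are Cf, Zl, Zp (Zs is NOT excluded).
EXCLUDED_CATEGORIES = {"Cc", "Cf", "Zl", "Zp"}

def is_base_character(ch: str) -> bool:
    """Return True if ch counts as a base character under this approximation."""
    category = unicodedata.category(ch)
    return not category.startswith("M") and category not in EXCLUDED_CATEGORIES

for ch in ["\u0020", "\u00A0", "a", "\u0301", "\u2028", "\uFEFF", "\u0009"]:
    # unicodedata.name has no name for controls such as TAB, so pass a default.
    name = unicodedata.name(ch, "<control>")
    print(f"U+{ord(ch):04X} {name}: {is_base_character(ch)}")
```

    Under this approximation SPACE and NO-BREAK SPACE qualify as base
    characters. Note that the sketch also returns True for unassigned,
    surrogate, and private-use code points, which a real implementation
    would likely want to handle separately.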

    --Ken


