From: Kenneth Whistler (kenw@sybase.com)
Date: Tue Aug 05 2003 - 18:42:52 EDT
Peter responded to Mark:
> On 05/08/2003 14:40, Mark Davis wrote:
>
> >Where did you get the notion that space is not a base character? And
> >base characters include those that are not control or format
> >characters. Space is neither one.
> >
> >The standard specifically states in a number of places that to exhibit
> >a combining mark in isolation you use a space (or NBSP).
> >
> >Mark
> >__________________________________
> >http://www.macchiato.com
> >► “Eppur si muove” ◄
> >
> >
> >
> I got this from the Unicode Standard 4.0, as quoted by Jim Allan:
*Mis*quoted by Jim Allan.
>
> > In http://www.unicode.org/book/preview/ch03.pdf the space characters
> > in general are given class Zs:
> >
> > << Zs, Zl, and Zp are considered format characters, but their
> > membership in the Z (separator) class takes precedence over their
> > membership in the Cf class, because the General Category assigns only
> > a single value to each character. >>
That piece of text is *NOT* a quotation from Chapter 3 of Unicode
4.0. Go to that URL and search for it yourself.
It is quoted from Chapter 4 of Unicode *3.0*, p. 88, in the
discussion of General Category in Section 4.5, "General Category --
Normative in Part". The corresponding paragraph has been deleted
from the relevant section in Unicode 4.0, precisely because the
standard now explicitly defines format control characters as
{Cf, Zl, Zp} but *ex*cluding Zs. See p. 25 in:
http://www.unicode.org/book/preview/ch02.pdf
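
For anyone who wants to check the category assignments directly, here is
a minimal sketch using Python's standard unicodedata module. The helper
name is_format_control and the constant FORMAT_CONTROL_CATEGORIES are my
own illustrations, not anything defined by the standard, and the module
reflects whatever Unicode version ships with the interpreter rather than
4.0 specifically. The categories of the characters shown should be the
same either way.

```python
import unicodedata

# Illustrative constant (not from the standard's text): the General
# Category values that the definition cited above treats as format
# control characters. Zs is deliberately absent.
FORMAT_CONTROL_CATEGORIES = {"Cf", "Zl", "Zp"}


def is_format_control(ch: str) -> bool:
    """Return True if ch has General Category Cf, Zl, or Zp."""
    return unicodedata.category(ch) in FORMAT_CONTROL_CATEGORIES


samples = {
    "SPACE": "\u0020",
    "NO-BREAK SPACE": "\u00A0",
    "LINE SEPARATOR": "\u2028",
    "PARAGRAPH SEPARATOR": "\u2029",
    "ZERO WIDTH JOINER": "\u200D",
}

for name, ch in samples.items():
    # Print the General Category and whether it falls in {Cf, Zl, Zp}.
    print(f"U+{ord(ch):04X} {name}: {unicodedata.category(ch)} "
          f"format control: {is_format_control(ch)}")
```

The point of keeping the set as an explicit constant is that the
exclusion of Zs is visible in one place: SPACE and NBSP report Zs and so
fall outside it.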
> >
> > So the various space characters (class Zs) are also classified as
> > format characters.
> >
> > From http://www.unicode.org/book/ch04.pdf:
> >
> > << _D13 Base character:_ a character that does not graphically
> > combine with preceding character, and that is neither control nor a
> > format character. >>
> >
> > Accordingly, by definition, spaces are not base characters.
This conclusion is false. As Mark indicated, SPACE (and NBSP) are
base characters, and have been treated as such in terms of
diacritic application since Unicode 1.0 was published:
"By convention, diacritical marks used by the Unicode encoding
scheme may be exhibited in (apparent) isolation by applying
them to U+0020 SPACE or to U+00A0 NON-BREAKING SPACE. This
might be done, for example, when talking about the diacritical
mark itself as a mark, rather than using it in its normal way
in text."
-- Unicode 1.0, p. 19 [1991]
And that *is* an accurate quote from the standard. In Unicode 4.0
that text survives as:
"By convention, diacritical marks used by the Unicode Standard
may be exhibited in (apparent) isolation by applying
them to U+0020 SPACE or to U+00A0 NON-BREAKING SPACE. This tactic
might be employed, for example, when talking about the diacritical
mark itself as a mark, rather than using it in its normal way
in text."
-- Unicode 4.0, p. 46 [2003]
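
As a rough illustration of that convention, the sketch below builds the
two-character sequence (SPACE or NBSP followed by the combining mark) in
Python. The function exhibit_in_isolation is a hypothetical helper of my
own, and using a General Category starting with "M" as the test for a
combining mark is a simplifying assumption, not the standard's formal
definition.

```python
import unicodedata

SPACE = "\u0020"
NBSP = "\u00A0"


def exhibit_in_isolation(mark: str, no_break: bool = False) -> str:
    """Apply a single combining mark to SPACE (or NBSP if no_break)."""
    # Assumption: treat any character whose General Category starts
    # with "M" (Mn, Mc, Me) as a combining mark.
    if len(mark) != 1 or not unicodedata.category(mark).startswith("M"):
        raise ValueError("expected a single combining mark")
    base = NBSP if no_break else SPACE
    return base + mark


# U+0301 COMBINING ACUTE ACCENT shown on a plain SPACE.
acute_alone = exhibit_in_isolation("\u0301")
# U+0308 COMBINING DIAERESIS shown on a NO-BREAK SPACE.
diaeresis_alone = exhibit_in_isolation("\u0308", no_break=True)

print([f"U+{ord(c):04X}" for c in acute_alone])
print([f"U+{ord(c):04X}" for c in diaeresis_alone])
```

The result is an ordinary base-plus-mark sequence, with the space
serving as the base character.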
I'd say the intent of the UTC and the Unicode Standard in this
regard has always been rather clear and has stayed
unchanged for quite some time.
--Ken