Re: NNBSP

From: Marcel Schneider via Unicode <unicode_at_unicode.org>
Date: Thu, 17 Jan 2019 12:31:29 +0100

On 17/01/2019 09:58, Richard Wordingham wrote:
>
> On Thu, 17 Jan 2019 04:51:57 +0100
> Marcel Schneider via Unicode <unicode_at_unicode.org> wrote:
>
>> Also, at least one French typographer was extremely upset
>> about Unicode not gathering feedback from typographers.
>> That blame is partly wrong since at least one typographer
>> was and still is present in WG2, and even if not being a
>> Frenchman (but knowing French), as an Anglophone he might
>> have been aware of the most outstanding use case of NNBSP
>> with English (both British and American) quotation marks
>> when a nested quotation starts or ends a quotation, where
>> _‘ ”_ or _“ ’_ and _’ ”_ or _” ’_ are preferred over the
>> unspaced compounds (_‘”_ or _“’_ and _’”_ or _”’_), at
>> least with proportional fonts.
>
> There's an alternative view that these rules should be captured by the
> font and avoid the need for a spacing character. There is an example
> in the OpenType documentation of the GPOS table where punctuation
> characters are moved rightwards for French.

Thanks, I didn’t know that this is already implemented. In discussions one
sometimes reads that the issue is delegated to the font level. That always
looked utopian to me, all the more so as people are trained to type spaces,
carrying over their typewriting habits, and I always believed that it was
a way for helpless keyboard layout designers to hand the job over.

Turns out there is more to it. But the high-end solution notwithstanding,
the use of an extra space character is recommended practice:

https://www.businesswritingblog.com/business_writing/2014/02/rules-for-single-quotation-marks.html

The source sums up in an overview: “_The Associated Press Stylebook_
recommends a thin space, whereas _The Gregg Reference Manual_ promotes a
full space between the quotation marks. _The Chicago Manual of Style_ says
no space is necessary but adds that a space or a thin space can be inserted
as ‘a typographical nicety.’ ” The author also lists three other manuals
in which she found nothing on the topic.

We note that all three style guides seem completely unconcerned with
non-breakability. Not so the author of the blog post: “[…] If your software
moves the double quotation mark to the next line of type, use a nonbreaking
space between the two marks to keep them together.” Certainly she would
recommend using a NARROW NO-BREAK SPACE if only we had it on the keyboard
or if the software provided a handy shortcut by default.
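
To make the idea concrete, here is a minimal Python sketch of what such a
shortcut could do. It is only an illustration under my own assumptions, not
anything taken from the style guides or the blog post: the function name
space_nested_quotes is hypothetical, and it only handles curly quotation
marks that are directly adjacent, leaving any existing space untouched.

```python
import re

NNBSP = "\u202F"  # NARROW NO-BREAK SPACE

# Zero-width match between a single curly quote and a double curly quote
# (in either order) that directly touch each other.
# \u2018 \u2019 are the single marks, \u201C \u201D the double marks.
_ADJACENT_QUOTES = re.compile(
    "(?<=[\u2018\u2019])(?=[\u201C\u201D])"
    "|(?<=[\u201C\u201D])(?=[\u2018\u2019])"
)

def space_nested_quotes(text: str) -> str:
    """Insert NNBSP between directly adjacent single and double quotes."""
    return _ADJACENT_QUOTES.sub(NNBSP, text)

if __name__ == "__main__":
    sample = "\u201CHe said, \u2018Go.\u2019\u201D"
    # Show code points so that the inserted U+202F is visible.
    print([hex(ord(c)) for c in space_nested_quotes(sample)[-3:]])
```

Because the inserted character is no-break, the two marks stay together at
a line end, which is the concern raised in the blog post.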

>
> This alternative conception hits the problem that mass market Microsoft
> products don't select font behaviour by language, unlike LibreOffice
> and Firefox. (The downside is that automatic font selection may then
> favour a font that declares support for the language, which gets silly
> when most fonts only support that language and don't declare support.)

Another drawback is that most environments don’t provide OpenType support,
and that the whole scheme depends on language tags that could easily get
lost. Moreover, since the issue is seen as particular to French, support
would quickly be dismissed as not cost-effective, on the argument that
*if* some individual locale has special requirements for punctuation
layout, its writers are welcome to pick an appropriate space from the UCS
and key it in as desired.

The same is also observed about Mongolian. Today, the preferred approach
for appending suffixes is to encode a Mongolian Suffix Connector to make
sure the renderer will use correct shaping, and to leave the space to the
writer’s discretion. That indeed looks much better than imposing a hard
space that has proved cumbersome in practice, and that is reported to
often get in the way of a usable text layout.

The problems related to NNBSP as encountered in Mongolian are completely
absent when NNBSP is used with French punctuation or as the regular
group separator in numbers. Hence I’m sure that everybody on this List
agrees that changes to the character properties of NNBSP should be
discouraged, such as switching the line breaking class (as "GL" is
non-tailorable) or changing the general category to Cf, which could be
detrimental to French.
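
For readers who want to check the current properties, the following Python
sketch queries the Unicode data bundled with the interpreter and formats a
number with NNBSP as the group separator. The helper names describe and
group_digits are hypothetical. Note that the standard unicodedata module
does not expose the Line_Break property, so the "GL" class mentioned above
cannot be verified this way; only the name, the general category and the
decomposition are shown.

```python
import unicodedata

NBSP = "\u00A0"   # NO-BREAK SPACE
NNBSP = "\u202F"  # NARROW NO-BREAK SPACE

def describe(ch: str) -> tuple[str, str, str]:
    """Return (name, general category, decomposition) of one character."""
    return (
        unicodedata.name(ch),
        unicodedata.category(ch),
        unicodedata.decomposition(ch),
    )

def group_digits(n: int, sep: str = NNBSP) -> str:
    """Format an integer with `sep` between groups of three digits."""
    # Python's "," format option groups by thousands; swap in the separator.
    return f"{n:,}".replace(",", sep)

if __name__ == "__main__":
    for ch in (NBSP, NNBSP):
        print(f"U+{ord(ch):04X}", describe(ch))
    print(group_digits(1234567).encode("unicode_escape").decode("ascii"))
```

Both spaces currently report the general category Zs, so a change to Cf
would alter what code like this returns for existing French text.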

However, we need to admit that NNBSP is basically not a Latin but a
Mongolian space, despite being readily attracted into Western typography.
A similar disturbance takes place in word processors, where, except in
Microsoft Word 2013, NBSP does not justify as intended and as it does on
the web. Despite being a bad compromise, it is being hacked and hijacked
for the purpose of French punctuation spacing. That tailoring is in turn
very detrimental to Polish users, among others, who need a justifying
no-break space to attach one-letter prepositions to the following word.

Fortunately a Polish user found and shared a workaround using the string
<space><ZWNBSP>, the latter still being used in lieu of WORD JOINER as
long as Word does not support the latest TUS (an issue that raised concern
at Microsoft when it was reported, and that will probably be fixed or has
already been fixed in the meantime).
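
As an illustration only, the workaround could be applied to plain text
along the following lines. This is a sketch under my own assumptions: the
function name glue_one_letter_words is hypothetical, the list of one-letter
Polish words (a, i, o, u, w, z) is my choice, and how a given version of
Word actually lays out the resulting sequence is not something this code
can show.

```python
import re

ZWNBSP = "\uFEFF"  # ZERO WIDTH NO-BREAK SPACE, used here in lieu of U+2060

# A one-letter word (not preceded by a word character), followed by one
# ordinary space and then the start of the next word.
_ONE_LETTER_WORD = re.compile(r"(?<!\w)([aiouwzAIOUWZ]) (?=\w)")

def glue_one_letter_words(text: str) -> str:
    """Replace the space after a one-letter word with <space><ZWNBSP>."""
    # The ordinary space is kept, so it can still stretch in justification;
    # the ZWNBSP after it is meant to suppress the line break.
    return _ONE_LETTER_WORD.sub(lambda m: m.group(1) + " " + ZWNBSP, text)

if __name__ == "__main__":
    result = glue_one_letter_words("Dom stoi w lesie i z boku.")
    print(result.encode("unicode_escape").decode("ascii"))
```

Where WORD JOINER is supported, swapping the constant for "\u2060" would be
the obvious variation.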

>
> Another spacing mess occurs with the Thai repetition mark U+0E46 THAI
> CHARACTER MAIYAMOK, which is supposed to be separated from the
> duplicated word by a space. I'm not sure whether this space should
> expand for justification any more often than inter-letter spacing. Some
> fonts have taken to including the preceding space in the character's
> glyph, which messes up interoperability. An explicit space looks ugly
> when the font includes the space in the repetition mark, and the lack of
> an explicit space looks illiterate when the font excludes the leading
> space.

It seems to me that these disturbances are a case of underspecification.
TUS treats U+0E46 thai character maiyamok [1] on a single line in the
Thai section, while other marks are given more detailed descriptions.
That wouldn’t be problematic per se as long as things are obvious.
Obviously here they are not, but no attempt is made at the Unicode level
to fix them, all the less so as the encoding proposal, if it could be
retrieved, would probably show that it didn’t provide any more details
(otherwise Unicode would have implemented them, I suppose).

I suspect that the same holds true for French: nobody among the relevant
people at the forefront cared about making demands and specifying, so
TUS authors (who anyway were “falling like flies”) couldn’t help leaving
French alone, possibly giving way to a trend to lock up this key
behavior inside proprietary text rendering systems (including proprietary
OTF typefaces). That isn’t really what Unicode is about, all the less so
as OpenType support for Latin script is typically scarce. It’s just
understandable in the face of disinterest and unconcern. At the other
end, Vietnamese typographers didn’t wait for an invitation but started
“intense lobbying” on their own behalf to get precomposed letters
into the Unicode standard a long while before v1.0.

Marcel

[1] That’s what a copy-pasted snippet from TUS ends up as, despite my
     kind request to set the character names in all caps in the plain
     text backbone and to apply a resizing style instead.
Received on Thu Jan 17 2019 - 05:31:52 CST
