[quoted mail]
But the French "espace fine insécable" was requested long long before Mongolian was discussed for encodinc in the UCS. The problem is that the initial rush for French was made in a period where Unicode and ISO were competing and not in sync, so no agreement could be found, until there was a decision to merge the efforts. Tge early rush was in ISO still not using any character model but a glyph model, with little desire to support multiple whitespaces; on the Unicode side, there was initially no desire to encode all the languages and scripts, focusing initially only on trying to unify the existing vendor character sets which were already implemented by a limited set of proprietary vendor implementations (notably IBM, Microsoft, HP, Digital) plus a few of the registered chrsets in IANA including the existing ISO 8859-*, GBK, and some national standard or de facto standards (Russia, Thailand, Japan, Korea).This early rush did not involve typographers (well there was Adobe at this time but still using another unrelated technology). Font standards were still not existing and were competing in incompatible ways, all was a mess at that time, so publishers were still required to use proprietary software solutions, with very low interoperability (at that time the only "standard" was PostScript, not needing any character encoding at all, but only encoding glyphs!)
Thank you for this insight. It is a still-untold part of the history of Unicode.
This historical summary does not square in key points with my own recollection (I was there). I would therefore not rely on it as gospel truth.
In particular, one of the key technologies that brought industry partners to cooperate around Unicode was font technology, especially the development of the TrueType standard. I do not find it credible that no typographers were part of that project :).
Covering existing character sets (National, International and Industry) was an (not "the") important goal at the time: such coverage was understood as a necessary (although not sufficient) condition that would enable data migration to Unicode, as well as enable Unicode-based systems to process and display non-Unicode data (by conversion).
The statement: "there was initially no desire to encode all the languages and scripts" is categorically false.
(Incidentally, Unicode does not "encode languages" - no character encoding does).
What has some semblance of truth is that the understanding of how best to encode whitespace evolved over time. For a long time, there was confusion over whether spaces of different widths were simply digital representations of the various metal blanks used in hot-metal typography to lay out text. As the placement of these was largely handled by the typesetter, not the author, it was felt that they would be better modeled by variable spacing applied mechanically during layout, such as when applying indents or justification.
Gradually it became better understood that there was a second use for these: there are situations where some elements of running text have a gap of a specific width between them, such as a figure space, which is better treated as a character under author or numeric-formatting control than as something that gets automatically inserted during layout and rendering.

Other spaces were found to be best modeled with a minimal width, subject to expansion during layout if needed.
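As an aside, here is a minimal sketch (in Python, purely illustrative) of what "author control" means in practice: U+2007 FIGURE SPACE is a fixed-width character, as wide as a digit in most fonts, that an author can place in the data itself to align numbers, rather than relying on spacing inserted by the layout engine.

    FIGURE_SPACE = "\u2007"  # fixed-width space, as wide as a digit in most fonts

    amounts = [5, 42, 1234]
    width = len(str(max(amounts)))
    for n in amounts:
        # Pad with figure spaces so the digits line up even in a proportional font.
        print(str(n).rjust(width, FIGURE_SPACE))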
There is a wide range of typographical quality in printed publications. The late '70s and '80s saw many books published by direct photomechanical reproduction of typescripts. These represent perhaps the bottom end of the quality scale: they did not implement many fine typographical details, and their prevalence in the technical literature may have impeded the understanding of what character encoding support would be needed for true fine typography. At the same time, Donald Knuth was refining TeX to restore high-quality digital typography, initially for mathematics.
However, TeX did not have an underlying character encoding; it used a completely different model mediating between source data and final output. (And it did not know anything about typography for other writing systems.)
Therefore, it is not surprising that it took a while and a few false starts to get the encoding model correct for space characters.
Hopefully, we'll complete our understanding and resolve the remaining issues.
A./