L2/02-357
From: Eric Muller
Date: 2002-10-25 18:10:13 -0700
Subject: Re: Dashes

I made the original proposal for new dashes under the assumption that the various dashes are defined in Unicode by their width; that's what the first part of the proposal explains. Under that assumption, I was really after the 3/4 em dash, and included the other widths for completeness.

The discussion during the UTC meeting revealed that my assumption was wrong. I was given action item 92-A11: "Get contrasting examples to show em dashes, 3/4 em dashes and 1/3 em dashes." As some of you predicted, I did not find any convincing example (other than the obvious "here are the various widths of dashes that are in use:..."). Lisa and Cathy, you can take this as a resolution of the AI: "done; there aren't any".

Given that, my current take on 2em and 3em dashes, and on the public issue in particular is:

In the end, I don't feel a strong need to express 2em and 3em dashes as characters, either on their own or as compositions of existing dashes; and I don't think we can achieve a reliable rendering by composition.

This discussion also touched on the use of dashes for quotations, e.g. in French. Two comments:

Now, I am still left with my original problem. My new assumption is the following: the dashes are defined by their use. This actually solves rather nicely the problem of the various styles in use, which the "defined by width" approach does not handle well. Let me give a little bit of context: the kind of workflow I am interested in is one where you have documents with characters and markup but no style on the one hand, and style sheets on the other; to be concrete, DocBook with XSLT stylesheets is a good archetype. What's interesting about this model is that it accounts well for "network publishing", i.e. the same material (a DocBook document) presented in multiple ways (change the stylesheets and their target), and it can also explain the more traditional WYSIWYG approach, by saying that the user manipulates the document and the stylesheet simultaneously.

In that world, defining the dash characters by their width is problematic. The decision to set off a phrase using non-spaced em dashes or using spaced en-dashes really belongs to the stylesheet, and it is desirable to have the same content in the document, regardless of the style(s) by which this content is going to be rendered. This is much easier to achieve if we declare that U+2014 EM DASH is the character used to set off a phrase, and that it can be rendered by a spaced en-dash. The only alternative I see is to carry the "set off a phrase" bit by markup instead, but that seems a bit heavy-handed.
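To make the idea concrete, here is a minimal sketch in Python of a rendering step that keeps U+2014 in the content and lets a style choose the glyphs. The style names "unspaced-em" and "spaced-en" and the function render_set_off_dashes are hypothetical, invented for this illustration; a real stylesheet or font mechanism would do this differently.

```python
import re

EM_DASH = "\u2014"  # U+2014 EM DASH, read here as "sets off a phrase"
EN_DASH = "\u2013"  # U+2013 EN DASH

# Hypothetical style names; each maps the "set off a phrase" character
# to the string that a given house style would actually display.
STYLES = {
    "unspaced-em": EM_DASH,
    "spaced-en": f" {EN_DASH} ",
}


def render_set_off_dashes(text: str, style: str) -> str:
    """Render every U+2014 in text according to the chosen style.

    Whitespace already adjacent to the dash is absorbed, so the same
    source content gives a consistent result under either style.
    """
    replacement = STYLES[style]
    return re.sub(rf"\s*{EM_DASH}\s*", replacement, text)


source = "The decision\u2014as argued above\u2014belongs to the stylesheet."
print(render_set_off_dashes(source, "unspaced-em"))
print(render_set_off_dashes(source, "spaced-en"))
```

The document content is identical in both calls; only the style argument changes, which is the separation argued for above.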

In the end, my new quest is to get U+2014 EM DASH and U+2013 EN DASH understood literally as they are described in section 6.1, by their use, and to essentially ignore the EM and EN in their names (much like we all know to replace LEFT by OPENING in U+0028 LEFT PARENTHESIS). Together with U+2015 HORIZONTAL BAR (understood as a quotation dash, as used in French) and U+2012 FIGURE DASH, I believe we have covered all the important functions. It may be worth crafting additional words for 6.1, to say that those characters can be rendered by glyphs that are not 1em or 1en wide, with more or less space. I'll be happy to propose some words to that effect if we like this approach.
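As a rough illustration of the "defined by use" reading, the four characters can be listed in Python with the function each would carry. The character names come from the standard unicodedata module; the descriptions are paraphrases, and the entries for EN DASH and FIGURE DASH are assumptions made for this sketch, not wording taken from section 6.1.

```python
import unicodedata

# One function per dash character. The EN DASH and FIGURE DASH entries
# are assumptions for illustration; the other two follow the message.
DASH_FUNCTIONS = {
    "\u2012": "dash set among digits",
    "\u2013": "range of values",
    "\u2014": "sets off a phrase",
    "\u2015": "quotation dash, as used in French",
}


def describe(ch: str) -> str:
    """Return the code point, the formal name, and the assumed function."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}: {DASH_FUNCTIONS[ch]}"


for ch in DASH_FUNCTIONS:
    print(describe(ch))
```

Nothing in the table mentions a width; the EM and EN survive only in the formal names.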

Thanks to Ken for opposing my new dash proposal (at least for 3/4em and 1/3em); first because he is right, second because this is not what I need.

Eric.