1. The problem
2. Proposed characters
From a typographic point of view, hyphens, minuses and dashes are distinct classes of glyphs, with fairly specific uses. Hyphens are used for compound words and to mark hyphenation. Minuses are used for mathematical notation. Dashes are horizontal rules of varying widths used to separate words, sentences and so on. In this paper, we are interested in dashes only.
The standard describes the use of various dashes in section 6.1, page 150:
U+2012 FIGURE DASH is present for compatibility with existing standards; it has the same (ambiguous) semantic as the U+002D HYPHEN-MINUS, but has the same width as digits (if they are monospaced). U+2013 EN DASH is used to indicate a range of values, such as 1973–1984. It should be distinguished from the U+2212 MINUS, which is an arithmetic operator; however, typographers have typically used U+2013 EN DASH in typesetting to represent the minus sign. [...]
U+2014 EM DASH is used to make a break—like this—in the flow of a sentence. It is commonly represented with a typewriter as a double-hyphen. In older mathematical typography, U+2014 EM DASH is also used to indicate a binary minus sign. U+2015 HORIZONTAL BAR is used to introduce quoted text in some typographic styles.
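The characters named in the quoted passage can be inspected with Python's standard `unicodedata` module. The sketch below is only an illustration: the list `DASH_LIKE` and the helper `describe` are names introduced here, and the output reflects whatever version of the Unicode database ships with the interpreter, not necessarily the version the quotation comes from.

```python
import unicodedata

# Code points mentioned in the quoted passage (plus U+002D and U+2212,
# which the passage compares the dashes against).
DASH_LIKE = [0x002D, 0x2012, 0x2013, 0x2014, 0x2015, 0x2212]


def describe(cp):
    """Return (code point label, character name, general category)."""
    ch = chr(cp)
    return ("U+%04X" % cp, unicodedata.name(ch), unicodedata.category(ch))


if __name__ == "__main__":
    for label, name, category in map(describe, DASH_LIKE):
        print(label, name, category)
```

The general category separates the dashes (dash punctuation) from the minus sign (math symbol), which matches the distinction the standard draws between the two.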
The most likely interpretation of the standard is that the various dashes are defined by their width rather than by their use. Furthermore, defining dashes by their use would be problematic, since the choice of width is an aesthetic one. For example, Bringhurst prefers spaced en dashes to set off phrases: “The em dash is the nineteenth-century standard, still prescribed in many editorial style books, but the em dash is too long for use with the best text faces. Like the oversized space between sentences, it belongs to the padded and corseted aesthetic of Victorian typography.”
The problem is that other widths are commonly used, and we therefore propose to extend the repertoire of dashes in Unicode.
This list is based on two sources: The Elements of Typographic Style by Robert Bringhurst, 2nd edition, section 5.2; The Chicago Manual of Style, 13th edition, sections 5.95, 5.96 and 15.94.
The following widths are commonly encountered:
We propose the names THREE-QUARTER EM DASH, THREE-TO-EM DASH, TWO EM DASH and THREE EM DASH for those characters, with the same properties as the EN DASH and EM DASH:
For East Asian width, we propose to treat all these characters like the EN DASH and EM DASH, i.e., ambiguous (A).
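The East Asian width of the two existing dashes can be checked with Python's standard `unicodedata` module. This is a minimal sketch; the helper name `width_of` is introduced here for illustration, and the result depends on the Unicode database bundled with the interpreter.

```python
import unicodedata


def width_of(cp):
    """Return the East Asian width property value of a code point."""
    return unicodedata.east_asian_width(chr(cp))


if __name__ == "__main__":
    # U+2013 EN DASH and U+2014 EM DASH, the two reference characters.
    for cp in (0x2013, 0x2014):
        print("U+%04X" % cp, unicodedata.name(chr(cp)), width_of(cp))
```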
For line breaking, we propose to treat the THREE-QUARTER EM DASH like the EM DASH (B2), and the THREE-TO-EM DASH like the EN DASH (BA). The use of the TWO EM DASH and THREE EM DASH as replacements for missing letters or words suggests that they be treated like Latin letters (AL).
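The proposed property assignments can be summarized as a small lookup table. The sketch below only restates the values given in the text; the class `ProposedDash` and the names `PROPOSED` and `BY_NAME` are hypothetical, and no code points are shown because none are assigned here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProposedDash:
    name: str
    line_break: str        # line breaking class abbreviation (B2, BA, AL)
    east_asian_width: str  # East Asian width value (A = ambiguous)


# Values restated from the proposal text; nothing here is normative.
PROPOSED = [
    ProposedDash("THREE-QUARTER EM DASH", "B2", "A"),  # like EM DASH
    ProposedDash("THREE-TO-EM DASH", "BA", "A"),       # like EN DASH
    ProposedDash("TWO EM DASH", "AL", "A"),            # like Latin letters
    ProposedDash("THREE EM DASH", "AL", "A"),          # like Latin letters
]

BY_NAME = {dash.name: dash for dash in PROPOSED}

if __name__ == "__main__":
    for dash in PROPOSED:
        print(dash.name, dash.line_break, dash.east_asian_width)
```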
Author: Eric Muller
Revision | Date | Comments |
July 30, 2002 | First version |