From: Jukka K. Korpela (jkorpela@cs.tut.fi)
Date: Tue Jan 23 2007 - 03:05:31 CST
On Mon, 22 Jan 2007, Doug Ewell wrote:
> I always thought the convention of using a double hyphen to indicate
> line-splitting hyphenation at a point where lexeme-joining hyphenation would
> have occurred anyway was a simply brilliant idea, one I wish were in more
> widespread use.
It would indeed be useful to make such a distinction, at the character
level, at the glyph level, or both. In text processing, it would be
relevant to know whether a word (or other expression) actually contains a
hyphen or there's just a hyphen at the end of a line to indicate
continuation of the word on the next line.
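To illustrate why this matters in text processing, here is a minimal Python sketch of the usual naive approach, which treats every hyphen at the end of a line as a continuation mark. The function name rejoin_lines and the sample strings are made up for illustration only.

```python
def rejoin_lines(lines):
    """Naively rejoin wrapped lines, dropping any hyphen at a line end."""
    text = ""
    for line in lines:
        if text.endswith("-"):
            # Assume the hyphen only marks a line split and discard it.
            text = text[:-1] + line.lstrip()
        elif text:
            text = text + " " + line.lstrip()
        else:
            text = line
    return text


# Correct when the hyphen was only a line-splitting hyphen:
print(rejoin_lines(["a hyphen-", "ation rule"]))
# Wrong when the word really contains a hyphen ("well-known"):
print(rejoin_lines(["a well-", "known rule"]))
```

With only HYPHEN-MINUS in the data, the second case cannot be repaired without a dictionary or a guess, since both line ends look the same.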
I'm biting my tongue to avoid saying that the soft hyphen character was,
at least in some people's interpretations, meant to act as a line-splitting
hyphenation character but then turned into a discretionary hyphen.
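For reference, a small Python sketch of how SOFT HYPHEN (U+00AD) differs from HYPHEN-MINUS in the Unicode character data, using the standard unicodedata module. The helper names describe and strip_soft_hyphens are mine, and treating the soft hyphen as something to strip is just the discretionary-hyphen interpretation, not the only possible one.

```python
import unicodedata

SOFT_HYPHEN = "\u00ad"


def describe(ch):
    """Return the Unicode name and general category of a character."""
    return unicodedata.name(ch), unicodedata.category(ch)


def strip_soft_hyphens(text):
    """Remove soft hyphens, treating them as invisible break hints."""
    return text.replace(SOFT_HYPHEN, "")


print(describe(SOFT_HYPHEN))   # a format character (Cf)
print(describe("-"))           # dash punctuation (Pd)
print(strip_soft_hyphens("hy\u00adphen\u00adation"))
```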
Anyway, Unicode is about characters that are used, rather than characters
that should be used. On the other hand, this is a chicken-and-egg problem
these days. When most texts are written using computers and appear in
digital form, thereby inevitably using encoded characters, there is little
room for introducing new characters.
If some community wants to use some new character, it has to encode it
somehow or to present it as an image. The image approach is awkward, so
they would in practice need to use Private Use codepoints. Well, in
practice a character might be introduced as a glyph variant of an existing
character even though you would _mean_ it to be a separate character, or it
could even be placed in a codepoint reserved for some other character, one
that you won't need - after all, this is the bad old approach of extending
character repertoire by creating fonts like "Symbol" or "Wingdings".
So maybe it should be possible to add characters to Unicode just because
some people think they are needed. Of course, the criteria for this should
be rather demanding, and of course the introduction of new characters that
way could easily be construed as opening the way for Joe Q. Public to
request his personal inventions to be encoded just in case someone else
might want to use them.
If we wanted to make the distinction, the natural approach would be to add
a character, say CONTINUATION HYPHEN, for optional use as a hyphen to
indicate that a word has been split across two lines. Perhaps the meaning of
HYPHEN (which isn't much used yet) should then be restricted to a word
hyphen. Naturally, HYPHEN-MINUS would keep its ambiguous (actually,
multiple) semantics and usage.
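As a rough sketch of what the distinction would buy in practice, the Python below uses U+2010 HYPHEN as the word hyphen and a Private Use codepoint as a stand-in for the proposed character. CONTINUATION HYPHEN does not exist in Unicode; the choice of U+E000 and the function name rejoin are assumptions made purely for illustration.

```python
HYPHEN = "\u2010"               # existing Unicode HYPHEN, used as word hyphen
CONTINUATION_HYPHEN = "\ue000"  # hypothetical: Private Use stand-in, not a real assignment


def rejoin(lines):
    """Rejoin wrapped lines using the two distinct hyphen characters."""
    text = ""
    for line in lines:
        if text.endswith(CONTINUATION_HYPHEN):
            # Line-splitting hyphen: drop it and join the word halves.
            text = text[:-1] + line.lstrip()
        elif text.endswith(HYPHEN):
            # Word hyphen at a line end: keep it and join without a space.
            text = text + line.lstrip()
        elif text:
            text = text + " " + line.lstrip()
        else:
            text = line
    return text


print(rejoin(["a hyphen" + CONTINUATION_HYPHEN, "ation rule"]))
print(rejoin(["a well" + HYPHEN, "known rule"]))
```

Here no guessing is needed: the character at the end of the line says whether the hyphen belongs to the word.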
Whether CONTINUATION HYPHEN looks any different from HYPHEN (or from
HYPHEN-MINUS) would best be left to font designers and users. What we can
say for sure at the general level is that it should either look identical
to HYPHEN or clearly different from it (e.g., a slanted Fraktur-type
hyphen, possibly a slanted double hyphen).
-- Jukka "Yucca" Korpela, http://www.cs.tut.fi/~jkorpela/