From: Ernest Cline (ernestcline@mindspring.com)
Date: Fri Mar 05 2004 - 21:37:39 EST
> [Original Message]
> From: Kenneth Whistler <kenw@sybase.com>
>
> > but that does not affect the point that I made that there are at least
> > three different variants that have been used in American English
> > dictionaries of the same basic concept, representing a voiced
> > th in a manner that is readable as a plain text th if one ignores
> > the typographic cruft.
> >
> > TH WITH STRIKETHROUGH
> > ITALIC TH LIGATED BY HOOK
> > PLAIN TH LIGATED BY CROSSBAR
>
> Yes, we know that.
>
> > three separate glyphic representations of the same character
> >
> > LEXICOGRAPHIC VOICED TH
>
> Here is the problem. This is not a *character*. It is a phoneme
> of English, relevant to the lexicography of English.
But the representation of lexicography has no standard outside
of the standard that a particular publisher chooses to use for its
own dictionaries. Why then should TH WITH STRIKETHROUGH
be anything more than a Private Use character?
> > found in printed American English dictionaries that are designed
> > to be readable as th, but typographically set apart to indicate
> > how the word is to be pronounced
>
> > Now if one were to take a dictionary that used any one of those
> > three forms mentioned above and consistently replaced one
> > glyph with another throughout the dictionary, would there truly
> > be any difference in the text?
>
> Yes, there would be. Because the text would not be presented
> correctly for that dictionary's conventions. It is a presentational
> distinction, admittedly, rather than a "semantic" distinction
> in the intended pronunciation represented, but it would,
> nonetheless, be considered an error in textual representation.
I repeat: if this distinction, which is based on the conventions of a
particular publisher and has no particular currency outside of that
publisher, is important at all, what justifies treating this as a character
for general text and not for private use?
> The only real question here is whether:
>
> TH WITH STRIKETHROUGH
> PLAIN TH LIGATED BY CROSSBAR
>
> as you designated them, are two different glyphs of one
> encoded character:
>
> 1D7A LATIN SMALL LETTER TH WITH STRIKETHROUGH
>
> (in which case you'd need style and/or font distinctions to
> get the correct glyph used for particular dictionaries)
>
> or whether they are two different glyphs, one each for two
> encoded characters:
>
> 1D7A LATIN SMALL LETTER TH WITH STRIKETHROUGH
> 1DXX LATIN SMALL LETTER TH WITH STROKE
>
> (in which case you don't need font distinctions to get the
> correct glyph used for particular dictionaries).
The ligation in the dictionary I referred to has the extended
crossbar of the t only reaching but not continuing past the
vertical stroke of the h. In all other respects such as kerning
and shape, it is the same as a normal t followed by a normal h.
It clearly is not a glyph that could in any way be considered
the same as LATIN SMALL LETTER TH WITH STRIKETHROUGH.
> The italic form can be dealt with by ligation control on
> italic styled text using simply a <t, h> sequence.
That assumes that the manner of the ligation is unimportant
to the user. Of course, if it is important, then the user would have to
specify a particular font or use a private use character.
However, if this is sufficient for that use, why is it insufficient
for TH WITH STRIKETHROUGH to instead use markup
control providing the strikethrough for a ligated <t,h> sequence?
> The UTC chose the first solution, at least for the moment.
>
> > If you can point out how that
> > would be the case, I'll gladly concede the point.
> >
> > As for the name used for this character, I don't really have
> > a preference as long as it does not lock in one specific
> > glyph.
>
> With very few exceptions (the Zapf dingbats), *no* Unicode
> character ever "lock[s] in one specific glyph". Unicode is
> a character encoding standard -- the particular presentation
> forms that the glyphs take is a matter of font design and
> other factors.
Oh? What about U+0643 Kaf and U+06A9 Keheh? That case
has an interesting parallel to this one. Keheh is a specific form
of Kaf, much as TH WITH STRIKETHROUGH can be considered
a specific form of LEXICOGRAPHIC VOICED TH. The difference
is that the distinction embodied by Keheh is generally recognized
by a group of common users, while the distinction embodied
by TH WITH STRIKETHROUGH is significant only at the level
of the choice that an individual publisher makes.
As described, TH WITH STRIKETHROUGH would make more
sense as a Private Use character than as a BMP character.
Possibly as an SMP character if one were to start a
Lexicographic block for symbols used only in dictionaries.