Standardized variation sequences for the Deseret alphabet?

Michael Everson everson at
Mon Mar 27 07:59:40 CDT 2017

On 27 Mar 2017, at 08:05, Martin J. Dürst <duerst at> wrote:

>> Consider 2EBC ⺼ CJK RADICAL MEAT and 2E9D ⺝ CJK RADICAL MOON which are apparently really supposed to have identical glyphs, though we use an old-fashioned style in the charts for the former. (Yes, I am of course aware that there are other reasons for distinguishing these, but as far as glyphs go, even our standard distinguishes them artificially.)
> "apparently", maybe. Let's for a moment leave aside the radicals themselves, which are to a large extent artificial constructs.

I do stipulate not being a CJK expert. But those are indeed different due to their origins, however similar their shapes are. 

> Let's look at the actual characters with these radicals (e.g. U+6709,... for MOON and U+808A,... for MEAT), in the multi-column code charts of ISO 10646. There are some exceptions, but in most cases, the G/J/K columns show no difference (i.e. always the ⺝ shape, with two horizontal bars), whereas the H/T/V columns show the ⺼ shape (two downwards slanted bars) for the "MEAT" radical and the ⺝ shape for the moon radical. So whether these radicals have identical glyphs depends on typographic tradition/font/…

They are still always very similar, right?

> In Japan, many people may be rather unaware of the difference, whereas in Taiwan, it may be that school children get drilled on the difference.

That’s interesting. 

>> One practical consequence of changing the chart glyphs now, for instance, would be that it would invalidate every existing Deseret font. Adding new characters would not.
> Independent of whether the chart glyphs get changed, couldn't we just add a note "also # in some fonts" (where # is the other variant). 

Well, no. First, ALL fonts currently use the 1855 letterforms based on ligatures ���� and ����, so a decree that those code positions would 

Second, the letterforms resulting from the ligations are just nothing alike 

> That would make sure that nobody could claim "this font is wrong" based on the charts. (Even if a general claim that the chart glyphs aren't normative applies to all charts anyway.)

As James Kass said: "If spelling a word with an x+y string versus a z+y string represents two different spellings of the same word, then hand printing the same word with either an x/y ligature versus a z/y ligature also represents two different spellings of the same word."

>> Changing to a different font in order to change one or two glyphs is a mechanism that we have actually rejected many times in the past. We have encoded variant and alternate characters for many scripts.
> Well, yes, rejected many times in cases where that was appropriate. But also accepted many times, in cases that we may not even remember, because they may not even have been made explicitly.

Do come up with examples if you have any. 

> Because in such cases, the focus may not be on a change to one or a few letter shapes, but the focus may be on a change of the overall style, which induces a change of letter shape in some letters.

To be honest I really don’t follow this reasoning. This isn’t just some “glyph variation”. They are entirely different glyphs with entirely different origins. I can think of no instance where we have “unified” such wildly different glyphs. 

> The roman/italic a/ɑ and g/ɡ distinctions (the latter code points only used to show the distinction in plain text, which could as well be done descriptively),

Aa and Ɑɑ are used contrastively for different sounds in some languages and in the IPA. Ɡɡ is not, to my knowledge, used contrastively with Gg (except that ɡ can only mean /ɡ/, while orthographic g can mean /ɡ/, /dʒ/, /x/ etc.). But g vs ɡ is reasonably analogous to �� and <lig>����</lig> being used for /juː/.

> as well as a large number of distinctions in Han fonts, come to my mind. I'm quite sure other scripts have similar phenomena.

Again, spelling of all kinds varies greatly in Deseret texts. I’ll try with another example using some Latin glyphs. “Poison” can be written ������������ POIZƐN in Deseret, or it can be written ���������� PƟZƐN or it can be written ��<����>������ PɄZƐN. That’s three different spellings, not two. (I used O with a bar to mimic the bar of Deseret SHORT I ��). 
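
To make the spelling point concrete, here is a minimal sketch in Python. It assumes the capital-letter code points U+10409 (SHORT AH), U+10406 (SHORT I) and U+10426 (OI) from the Unicode Deseret block; the variable and function names are only illustrative. It checks whether any Unicode normalization form would treat the two-letter sequence and the single OI letter as the same string.

```python
import unicodedata

# Assumed code points from the Unicode Deseret block (capital letters).
SHORT_AH = "\U00010409"
SHORT_I = "\U00010406"
OI = "\U00010426"

digraph = SHORT_AH + SHORT_I   # diphthong spelled with two letters
single = OI                    # diphthong spelled with the one encoded letter

def same_after_normalization(a, b):
    """Return True if any standard normalization form makes a and b equal."""
    forms = ("NFC", "NFD", "NFKC", "NFKD")
    return any(
        unicodedata.normalize(f, a) == unicodedata.normalize(f, b)
        for f in forms
    )

def describe(s):
    """List each character as 'U+XXXXX NAME' (name falls back to '?')."""
    return [f"U+{ord(c):05X} {unicodedata.name(c, '?')}" for c in s]

print(describe(digraph))
print(describe(single))
print(same_after_normalization(digraph, single))
```

If the two strings never compare equal under normalization, then at the plain-text level they behave as different spellings rather than as glyph variants of one string, which is the distinction being argued here.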

>> Character identity is not defined by any single criterion. Moreover, in Deseret, it is not the case that all texts which contain the diphthong /juː/ or /ɔɪ/ write it using EW �� or OI ��. Many write them as Y + U ���� and O + I ����. So the choice is one of *spelling*, and spelling has always been a primary criterion for such decisions.
> This is interesting information. You are saying that in actual practice, there is a choice between writing ���� (two letters for a diphthong) and writing ��. In the same location, is ���� (the base for the historically later shape variant of ��; please note that this may actually be written ����;

No, that’s not correct. Poison can be written with ���� or it can be written with �� (in origin a ligature of ����) or it can be written with <lig>����</lig>. Unligated, the three spellings would be different: ������������ /poɪzǝn/ and ������������ /pɒɪzǝn/ and ������������ /pɔːɪzǝn/. Despite this, with the ligatures, the pronunciation would be /poɪzǝn/ whether ������������ or ���������� or ��<����>������. 

> there's some inconsistency in order between the above cited sentence and the text below copied from an earlier mail) also used as a spelling variant?

I don’t think so.

> Overall, we may have up to four variants,

No, we don’t. See above. And the same goes for the /juː/ ligatures. The word tube /tjuːb/ can be written TYŪB �������� or ������ or ��<����>��. But unligated, the sequences would be pronounced differently: �������� /tjuːb/ and �������� /tɪuːb/ and �������� /tɪʊb/. 

> of which three are currently explicitly supported in Unicode.

The characters <����> and <����> are not encoded. 

> Are all of these used as spelling variants?

In principle, what I have shown above is accurate. I can’t do a corpus search for actual examples. 

> Is the choice of variant up to the author (for which variants), or is it the editor or printer who makes the choice (for which variants)?

In a handwritten manuscript obviously the choice is the author’s. As to historical printing, printers may have 

> And what informs this choice? If we have any historic metal types, are there examples where a font contains both ligature variants?

Ken Beesley has samples of a metal font (the 1857 St Louis punches) which had both �� and ����; I don’t know what other sorts were in that font. 

> (Please note that because ��, ��, and �� are available as individual letters, it's very difficult to think about the two-letter sequences as anything else than spellings, but that doesn't necessarily carry over to the ligatures.)

See above. 

> And then the same questions, with parallel (or not parallel) answers, for ɒɪ/ɔɪ/��.

See above.

Michael Everson

> Regards,    Martin.
> Text copied from earlier mail by Michael:
> >>>>
> 1. The 1855 glyph for �� EW is evidently a ligature of the glyph for the diagonal stroke of the glyph for �� SHORT I [ɪ] and �� LONG OO [uː], that is, [ɪ] + [uː] = [ɪuː], that is, [ju].
> 2. The 1855 glyph for �� OI is evidently a ligature of the glyph for �� SHORT AH [ɒ] and the diagonal stroke of the glyph for �� SHORT I [ɪ], that is, [ɒ] + [ɪ] = [ɒɪ], that is, [ɔɪ].
> That’s encoded. Now evidently, the glyphs for the 1859 substitutions are as follows:
> 1. The 1859 glyph for EW is evidently a ligature of the glyph for the diagonal stroke of the glyph for �� SHORT I [ɪ] and �� SHORT OO [ʊ], that is, [ɪ] + [ʊ] = [ɪʊ], that is, [ju].
> 2. The 1859 glyph for OI is evidently a ligature of the glyph for �� LONG AH [ɔː] and the diagonal stroke of the glyph for SHORT I [ɪ], that is, [ɔː] + [ɪ] = [ɔːɪ], that is, [ɔɪ].
> >>>
