Re: Four characters from Greek Extended block missing?

From: Kenneth Whistler (kenw@sybase.com)
Date: Fri Feb 16 2001 - 14:00:11 EST


Otto Stolz asked:

> in the Greek Extended block, five of the lower-case characters
> do not have upper-case equivalents, viz.
> U+1FE4 GREEK SMALL LETTER RHO WITH PSILI
> U+1F50 GREEK SMALL LETTER UPSILON WITH PSILI
> U+1F52 GREEK SMALL LETTER UPSILON WITH PSILI AND VARIA
> U+1F54 GREEK SMALL LETTER UPSILON WITH PSILI AND OXIA
> U+1F56 GREEK SMALL LETTER UPSILON WITH PSILI AND PERISPOMENI
>

> However, the missing upsilon variants escape my understanding:
> - word-initial upsilon (both lower-case and upper-case) must take
> a breathing mark,
> - medial and final upsilons do not take breathing marks.
> So, you will either need both sorts of marks on both cases,
> or you will need only dasia on both cases (I do not remember any
> word starting with psili-upsilon, but my Greek is rather rusty).
>
> So the questions are:
> - are the above-mentioned lower-case upsilon composites useless,
> and entered Unicode only by an oversight, or

No. Initial upsilon with PSILI (smooth breathing) is exceedingly
rare in classical Greek, but it does occur. I find exactly two
instances in my copy of the intermediate Greek-English Lexicon
(Liddell and Scott):

One entry showing 1F54 ~ 1F56 meaning "sound to imitate a person
snuffing a feast" [sic].

And one head entry in caps showing <U+1FCE, U+03A5, U+03A1, U+03A7,
U+0391> ,'YRXA meaning "a jar, for pickles".

Clearly these are both "funny" words. The first is onomatopoetic,
and the second is probably a borrowing of some sort from a non-Greek
language. The vast preponderance of upsilon-initial words in classical
Greek have rough breathings.

No doubt someone with access to more extensive classical and
Byzantine Greek lexica might turn up a few other instances,
including, I am guessing, instances of 1F50 and 1F52.

> - are their upper-case equivalents missing by an oversight, or

I don't think so.

> - is there indeed a rationale for this anomaly?

The entire 1FXX set was provided by ELOT,
the Greek national body, and they had prescriptive, as well
as descriptive intent in choosing the set that they did. I suspect
that they thought that uppercase initial upsilon with a smooth
breathing would not fit their orthographic rules for polytonic
Greek (although there are instances of it in print, as in the
uppercase head entry in Liddell and Scott for "pickle jar").

And in any case, by use of the spacing breathing/accent
combinations U+1FCE, etc., plus regular uppercase upsilon,
you can represent any of the missing letters. (As I
have done above for the all-caps pickle jar entry.)
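
The spacing-accent approach can be checked with a short sketch. It uses Python's standard unicodedata module (an illustration added here, not something the message itself used), and `describe` is a hypothetical helper name. The code point sequence is the one quoted above for the pickle jar entry.

```python
import unicodedata

# Head entry as encoded in the message: the spacing character
# GREEK PSILI AND OXIA, followed by plain capital letters.
PICKLE_JAR = "\u1FCE\u03A5\u03A1\u03A7\u0391"

def describe(text):
    """Return (code point, character name) pairs for each character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]

for code_point, name in describe(PICKLE_JAR):
    print(code_point, name)
```

The first character is a spacing mark rather than a combining one, so it stands in front of the capital upsilon as its own character instead of attaching to it.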

> Note that the code-points where you would expect these upper-case
> upsilon compositions, viz. U+1F58 U+1F5A U+1F5C U+1F5E, are left
> unassigned (reserved).
>
> Can anybody shed some light on this anomaly: either explain the
> underlying rationale, or acknowledge the oversight?

The Unicode take on this is that the entire block U+1F00..U+1FFE
of precomposed polytonic Greek is unnecessary, since it is
all decomposable into the regular Greek alphabet and a small
number of accents.
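
As a rough illustration of that decomposability, the sketch below walks the block with Python's unicodedata module and applies canonical decomposition (NFD) to every assigned letter. The function name `decompose_greek_extended` is hypothetical, and the results depend on the Unicode data bundled with the Python version in use.

```python
import unicodedata

def decompose_greek_extended():
    """Map each assigned letter in U+1F00..U+1FFF to its NFD form."""
    result = {}
    for cp in range(0x1F00, 0x2000):
        ch = chr(cp)
        # Keep letters only; this skips reserved slots (category Cn)
        # and the spacing accent characters (category Sk).
        if unicodedata.category(ch).startswith("L"):
            result[ch] = unicodedata.normalize("NFD", ch)
    return result

decomposed = decompose_greek_extended()

# Collect every combining mark (nonzero combining class) used in the block.
marks = sorted({c for seq in decomposed.values()
                for c in seq if unicodedata.combining(c)})

print(len(decomposed), "letters decompose using these marks:")
print([f"U+{ord(m):04X}" for m in marks])
```

Each letter should come back as a base letter from the regular Greek block followed by one or more combining marks.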

There clearly would be no benefit at this point in adding in
the 4 (or 5) "missing" polytonic Greek characters, since in *all*
Unicode normalization forms they would end up being decomposed into
the already existing combining character sequences that can be
used to represent them now without any character additions.
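
A minimal sketch of how this looks in practice, again assuming Python's unicodedata module and whatever Unicode version it bundles. It checks that the four slots from the question have no character name, and compares how NFC treats the capital and small upsilon sequences.

```python
import unicodedata

CAPITAL_UPSILON = "\u03A5"
SMALL_UPSILON = "\u03C5"
PSILI = "\u0313"  # COMBINING COMMA ABOVE
OXIA = "\u0301"   # COMBINING ACUTE ACCENT

# The four reserved slots mentioned in the question.
RESERVED = [0x1F58, 0x1F5A, 0x1F5C, 0x1F5E]
unassigned = [unicodedata.name(chr(cp), None) is None for cp in RESERVED]

# Capital: no precomposed target exists, so NFC leaves the sequence alone.
capital_sequence = CAPITAL_UPSILON + PSILI + OXIA
capital_nfc = unicodedata.normalize("NFC", capital_sequence)

# Small: NFC composes to U+1F54 and NFD takes it apart again.
small_nfc = unicodedata.normalize("NFC", SMALL_UPSILON + PSILI + OXIA)
small_nfd = unicodedata.normalize("NFD", "\u1F54")

print("reserved slots unassigned:", unassigned)
print("capital NFC length:", len(capital_nfc))
print("small NFC:", f"U+{ord(small_nfc):04X}")
```

The capital sequence therefore stays as three code points under every normalization form, which is the representation the message recommends.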

--Ken


