Re: encoding polytonic Greek

From: Mark Davis (mark@macchiato.com)
Date: Sun Aug 29 1999 - 15:36:54 EDT


As for the first issue: if you look at the Unicode Character Database, 3.0 beta (ftp://ftp.unicode.org/Public/3.0-Update/), you will find that the Unicode Standard Version 3.0 now treats U+1F71 as canonically equivalent to U+03AC. Here are the two relevant lines from ftp://ftp.unicode.org/Public/3.0-Update/UnicodeData-3.0.0.beta.txt:

03AC;GREEK SMALL LETTER ALPHA WITH TONOS;Ll;0;L;03B1 0301;;;;N;GREEK SMALL LETTER ALPHA TONOS;;0386;;0386

1F71;GREEK SMALL LETTER ALPHA WITH OXIA;Ll;0;L;03AC;;;;N;;;1FBB;;1FBB

This means, for example, that if the text is normalized (forms C or KC) you will only see 03AC, not 1F71. (cf. http://www.unicode.org/unicode/reports/tr15/.)
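
As a quick check of the behavior described above, here is a minimal Python sketch using the standard `unicodedata` module. It assumes the Python build ships a Unicode database later than 3.0 in which this mapping is unchanged; the helper name `normalized_forms` is made up for illustration and is not part of any library.

```python
import unicodedata

OXIA_ALPHA = "\u1f71"   # GREEK SMALL LETTER ALPHA WITH OXIA (Greek Extended block)
TONOS_ALPHA = "\u03ac"  # GREEK SMALL LETTER ALPHA WITH TONOS (Greek block)


def normalized_forms(ch):
    """Return the code points of `ch` under each normalization form (hypothetical helper)."""
    return {
        form: [f"{ord(c):04X}" for c in unicodedata.normalize(form, ch)]
        for form in ("NFC", "NFKC", "NFD", "NFKD")
    }


if __name__ == "__main__":
    # Field 5 of the UnicodeData record: the canonical decomposition mapping.
    print("1F71 decomposes to:", unicodedata.decomposition(OXIA_ALPHA))
    print("03AC decomposes to:", unicodedata.decomposition(TONOS_ALPHA))
    for form, code_points in normalized_forms(OXIA_ALPHA).items():
        print(form, code_points)
```

Under forms C and KC the single code point 03AC should come back; under forms D and KD the text is fully decomposed to 03B1 0301, so 1F71 does not survive any of the four forms.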

As to the numeral sign, that was added to Unicode/10646 at the insistence of the Greek national body.

Mark

Constantine Stathopoulos wrote:

> On 29/8/1999, at 2:31 πμ, peter_constable@sil.org wrote:
>
> > OK, let me pose a slightly different question: Jeroen has
> > suggested that the accent/breathing characters in the extended
> > block such as U+1FBD *never* be used in forming words within
> > actual text (I've documented in an earlier message, though, that
> > this is being done.), but only for special purposes such as
> > meta-text description.
>
> Jeroen is correct that the spacing accents are there *primarily* for special purposes, but as I wrote earlier they may also be used (misused?) in normal text. This, however, should normally be a decision made by the user, not the software. Option A is the way to go with precomposed Greek characters.
>
> Anyway, a more serious issue is encoding of OXIA and combinations. I know that there are two series of precomposed characters with OXIA, one in the Greek Extended block and another in the Greek block (described as TONOS, since in the monotonic/simplified spelling system OXIA remains the only "tonos"(=accent) and the rest are abolished along with the breathings). This may be a crystal clear solution for non-Greek scholars who are interested in polytonic only, but it is not so for a native Greek user who may use both monotonic and polytonic. A simple input implementation would only require a slight modification (according to old polytonic typewriters) of the current Greek (monotonic) keyboard layout, say, of Windows NT in order to write texts in both spelling systems with the same keyboard layout. In this case, the sensible thing to do is map OXIA and all of its combinations in Greek Extended to TONOS and combinations in the Greek block; thus, texts written in monotonic will also be legible with fonts that only contain the Greek block and not the Greek Extended one.
>
> Another issue is that in 8859-7 implementations spacing TONOS (U+0384) is *heavily* used as the right "keraia" (U+0374 - Greek numeral sign), being a perfect homoglyph. It is not unreasonable to expect that this practice of Greek users will continue in Unicode implementations. A similar problem might exist with PSILI (U+1FBF) and KORONIS (U+1FBD).
>
> Constantine Stathopoulos
> Iris Media Internet Solutions
>
> ---
> P.S. To avoid an undeserved lesson in Unicode terminology: I am aware of the difference between *character* and *glyph*. It is the difference between *theory* and *practice* that concerns me... :-)
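
The mapping suggested in the quoted message (OXIA combinations in Greek Extended folded onto TONOS combinations in the Greek block) can be sketched from the character database itself. This is only an illustration under stated assumptions: it uses Python's `unicodedata` module, it selects characters whose name contains "OXIA" and whose canonical decomposition is a single code point in U+0370..U+03FF, and the function names `oxia_to_tonos_table` and `fold_oxia` are hypothetical.

```python
import unicodedata


def oxia_to_tonos_table():
    """Build {Greek Extended OXIA code point: Greek block TONOS code point} (hypothetical helper)."""
    table = {}
    for cp in range(0x1F00, 0x2000):  # Greek Extended block
        ch = chr(cp)
        if "OXIA" not in unicodedata.name(ch, ""):
            continue
        parts = unicodedata.decomposition(ch).split()
        # Keep only singleton canonical decompositions: one code point, no <tag>.
        if len(parts) != 1 or parts[0].startswith("<"):
            continue
        target = int(parts[0], 16)
        # Assumption: only targets inside the Greek block are of interest here.
        if 0x0370 <= target <= 0x03FF:
            table[cp] = target
    return table


def fold_oxia(text):
    """Replace each mapped Greek Extended OXIA character with its Greek block counterpart."""
    return text.translate(oxia_to_tonos_table())


if __name__ == "__main__":
    for src, dst in sorted(oxia_to_tonos_table().items()):
        print(f"{src:04X} -> {dst:04X}  {unicodedata.name(chr(src))}")
```

Characters that combine OXIA with a breathing (for example U+1F04) have two-element decompositions and are left untouched, since the Greek block has no counterpart for them.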


