From: Edward C. D. Hopkins (chopkins@ameritech.net)
Date: Thu Apr 10 2003 - 13:15:55 EDT
----- Original Message -----
From: "Kenneth Whistler" <kenw@sybase.com>
>
> The point Joop was making is that in *modern* Greek usage a
> distinction is made, where the "lightning koppa" appears as a numeral
> for numbering of legal clauses, or the like, but where the "lollipop
> koppa" is the modern rendition of the archaic koppa (which, of course,
> also had a numeric value). It is *this* distinction which led to
> the separate encoding, not a failure to recognize that ultimately
> all the koppas were, in some sense, the "same letter".
OK, I understand.
> See the table for the Unicode Collation Algorithm, UTS #10.
> It isn't an "established Unicode Greek sorting algorithm" per se,
> but it does provide a default ordering for all Greek characters,
> along with all other Unicode characters. Archaic koppa isn't
> dealt with yet in that table, since it is a recent addition
> to Unicode.
Thanks for the pointer.
> Note that the lunate sigma was encoded also for *modern* reasons.
> It is distinguished in modern typography, as Joop indicates.
> And while it can create problems for sorting and searching,
> there already is such a problem in modern Greek (or modern
> Greek representations of Classical or Ancient Greek) because
> of the sigma and final sigma -- both of which are also just
> the "same letter". This is *already* handled in the Unicode
> Collation Algorithm by giving all three flavors of sigma the
> same primary weights:
>
> 03C3 ; [.0CA6.0020.0002.03C3] # GREEK SMALL LETTER SIGMA
> 03F2 ; [.0CA6.0020.0004.03F2] # GREEK LUNATE SIGMA SYMBOL; QQK
> 03C2 ; [.0CA6.0020.0019.03C2] # GREEK SMALL LETTER FINAL SIGMA; QQK
>          ^^^^      ^^^^
>          primary   tertiary
>
> The differences in the weights are at the tertiary level, which
> results in like sorting together with like, and these letters
> only being distinguished the way that capital versus small
> letters are distinguished.
>
> Your problem, as a Classical numismatics specialist, (and the
> same applies in general to paleographers and papyrologists)
> is that you confront many *more* glyphs for the various
> characters than just the very few distinctions that got
> encoded as distinct forms in the Unicode Standard for one
> modern reason or another. The eventual result is not going
> to be the encoding of each distinct glyph of each distinct
> letter as another character in Unicode. Instead, as you
> are doing, you need to lay out a subsidiary variant space
> which you can map to specialized fonts, and associate all
> of those variants with the Greek alphabet (in your case) or
> whatever base set of characters might apply in other
> paleographic traditions.
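To see why giving the three sigmas the same primary weight makes them sort together, here is a minimal Python sketch that uses only the three collation elements quoted above. It is a toy, not an implementation of UTS #10: it assumes every character maps to exactly one collation element, ignores normalization and variable weighting, and the helper names `sort_key` and `equal_at_level` are made up for this example.

```python
# Collation elements copied from the UCA table excerpt quoted above:
# character -> (primary, secondary, tertiary)
SIGMA_WEIGHTS = {
    "\u03C3": (0x0CA6, 0x0020, 0x0002),  # GREEK SMALL LETTER SIGMA
    "\u03F2": (0x0CA6, 0x0020, 0x0004),  # GREEK LUNATE SIGMA SYMBOL
    "\u03C2": (0x0CA6, 0x0020, 0x0019),  # GREEK SMALL LETTER FINAL SIGMA
}

def sort_key(s):
    """Toy multi-level key: all primaries, then all secondaries, then all
    tertiaries. Comparing these tuples level by level stands in for the
    level separators of the real algorithm. Only handles characters that
    appear in SIGMA_WEIGHTS."""
    elements = [SIGMA_WEIGHTS[ch] for ch in s]
    return tuple(tuple(e[level] for e in elements) for level in range(3))

def equal_at_level(a, b, levels):
    """True if a and b compare equal using only the first `levels` levels."""
    return sort_key(a)[:levels] == sort_key(b)[:levels]

print(equal_at_level("\u03C3", "\u03C2", 1))  # primary only: equal
print(equal_at_level("\u03C3", "\u03C2", 3))  # tertiary breaks the tie
print(sorted(["\u03C2", "\u03F2", "\u03C3"], key=sort_key))
```

At the primary level all three sigmas are the same letter; only the tertiary weights (0002 < 0004 < 0019) order them, exactly as case differences would.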
Many thanks for the explanation. So, by mapping the archaic forms of a Greek
letter to the basic Unicode Greek code point, the sorting will be correct. I
think I see how this works for Variation Selectors (which must be officially
approved), but I don't see how it could work if using things I control, say,
OpenType unmapped glyphs with <aalt> or <salt> entries. Can these be sorted?
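One way to read the "subsidiary variant space" suggestion, sketched in Python under stated assumptions: OpenType alternate-glyph features such as 'salt' and 'aalt' act on glyphs at rendering time and do not change the underlying character string, so sorting has to operate on that string. Here each transcription is a list of plain characters and/or local variant IDs; the IDs (`alpha.brokenbar`, `sigma.fourbar`) and the helpers `base_text` and `sort_legends` are hypothetical, and plain code point order stands in for a real collation key.

```python
# Hypothetical variant inventory: local IDs for glyph forms that have no
# code point of their own, each tied to its base Unicode character.
VARIANT_BASE = {
    "alpha.brokenbar": "\u03B1",  # alpha with broken crossbar -> α
    "sigma.fourbar": "\u03C3",    # four-stroke sigma -> σ
}

def base_text(items):
    """Map a transcription (plain characters and/or variant IDs) to
    ordinary Unicode text; unknown items pass through unchanged."""
    return "".join(VARIANT_BASE.get(item, item) for item in items)

def sort_legends(legends):
    """Sort transcriptions by their base text, so the choice of glyph
    variant never affects the order. Code point order is only a stand-in
    for a proper collation key."""
    return sorted(legends, key=base_text)

a = ["sigma.fourbar", "\u03C5", "\u03C1"]  # displays a variant, sorts as "συρ"
b = ["\u03B1", "\u03B8"]                    # "αθ"
c = ["alpha.brokenbar", "\u03C1"]           # displays a variant, sorts as "αρ"
print([base_text(x) for x in sort_legends([a, b, c])])
```

The display layer can map each variant ID to a specific glyph in a specialized font, while anything that sorts or searches sees only the base letters.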
Chris Hopkins