(was: Re: Greek/Etruscan/Gothic Unification Proposal)
In message <9711181922.AA16328@unicode.org> John Cowan, via
unicode@unicode.org, writes:
> The following pre-proposal suggests a unification of the
> archaic Etruscan and Gothic scripts with Unicode Greek...
By pure coincidence, the following was also circulated to the
tc46sc2@elot.gr email discussion list on transliteration recently,
and also to the ISO/IEC JTC1/SC22/WG20 list. Like John Cowan's email,
this suggested similar links between different scripts related to
Greek, in particular between Greek, Cyrillic and Georgian scripts.
The point of the email via tc46sc2@elot.gr was NOT to suggest
unification in ISO/IEC 10646, but to aim at rationalising the
repertoire of ISO/IEC 10646 and ISO 9, and the character order in
ISO/IEC 14651 and ISO 9. This would mean rationalising standards in
ISO/IEC JTC1/SC2, ISO/IEC JTC1/SC22/WG20 and in ISO/TC46/SC2.
It is quite a complex document, and will require some study to see
the conventions used (a key is given near the end of the email) so
please ignore/delete this if you are not interested in Greek,
Cyrillic and Georgian repertoires and alphabetic orders.
* * * * * * * *
RATIONALE FOR MULTILINGUAL SORTING OF CYRILLIC CHARACTERS
John Clews
This updates my earlier suggestions to ISO/IEC JTC1/SC22/WG20 on
Cyrillic sorting, with more solid evidence. I would guess that a
consensus would soon emerge on this issue.
The basis for sorting conventions should be user expectations.
A well-documented alphabetic order exists for Church Slavonic, and it
is broadly the same as the Russian order. It is also used in national
transliteration standards: BS 2979: 1958 provides much fuller
information than does ISO 9 in this respect. BS 2979 also matches the
sorting order in various well-established reference sources (e.g.
DE BRAY, R.G.A. Guide to the Slavonic Languages. London: Dent, n.d.).
Conventions for using letters for numbering, e.g. in dates and in
item lists, also exist across several European languages, and are
almost identical across Greek, Cyrillic and Georgian. This also helps
to establish user expectations across a wide range of European
languages and scripts.
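As an illustration only, the shared numbering convention can be sketched
in Python. The letter values are copied from the chart in this message;
the function name and the purely additive reading (ignoring any numeral
marks or thousands signs) are assumptions made for the sketch.

```python
# Numeric values copied from the chart in this message (small subset).
LETTER_VALUES = {
    "\u03c1": 100, "\u03ba": 20, "\u03b3": 3,  # Greek rho, kappa, gamma
    "\u0440": 100, "\u043a": 20, "\u0433": 3,  # Cyrillic er, ka, ghe
    "\u10e0": 100, "\u10d9": 20, "\u10d2": 3,  # Georgian rae, kan, gan
}


def letter_numeral_value(text):
    """Add up the numeric values of the letters in text.

    Assumption: a plain additive reading; numeral marks and
    thousands signs are not handled.
    """
    return sum(LETTER_VALUES[ch] for ch in text)


greek = "\u03c1\u03ba\u03b3"      # rho kappa gamma
cyrillic = "\u0440\u043a\u0433"   # er ka ghe
georgian = "\u10e0\u10d9\u10d2"   # rae kan gan

for numeral in (greek, cyrillic, georgian):
    print(numeral, letter_numeral_value(numeral))
```

All three strings use cognate letters and add up to the same number, 123.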
Since a chart can be produced showing the Church Slavonic filing order
for all characters in terms of BS 2979, together with the numerical
conventions, that chart should be the basis for a pan-Cyrillic
ordering.
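A minimal sketch of how such a chart could drive sorting, in Python.
The code points are taken, in order, from the Slavonic column of the
chart given later in this message. The function name, the handling of
capitals and of uncharted characters, and the sample strings are
assumptions made for illustration.

```python
# Code points in the order of the Slavonic column of the chart
# (row 25, which has no UCS ID, is omitted).
CHART_ORDER_HEX = """
0430 0431 0432 0433 0434 0435 0436 0455 0437 0438 0456 043A 043B 043C
043D 043E 043F 0481 0440 0441 0442 0443 0479 0444 0445 0461 047D 047B
047F 0446 0447 0448 0449 044A 044B 044C 044D 0463 044E 044F 0465 0467
046B 0469 046D 046F 0471 0473 0475 0477
"""
RANK = {chr(int(h, 16)): i for i, h in enumerate(CHART_ORDER_HEX.split())}


def chart_sort_key(text):
    """Return a list of chart ranks, usable as a sort key.

    Assumptions: capitals sort with their small letters; characters
    not in the chart sort after all charted letters, by code point.
    """
    return [RANK.get(ch, len(RANK) + ord(ch)) for ch in text.lower()]


# Sample strings (not a claim about real vocabulary), starting with
# ze, dze, zhe, Byelorussian-Ukrainian i, and i respectively.
samples = [
    "\u0437\u0438\u043c\u0430",
    "\u0455\u0435\u043b\u043e",
    "\u0436\u0435\u043d\u0430",
    "\u0456\u0433\u043e",
    "\u0438\u0432\u0430",
]

by_chart = sorted(samples, key=chart_sort_key)
by_code_point = sorted(samples)
print(by_chart)
print(by_code_point)
```

With the chart order, the string starting with dze (0455) falls between
those starting with zhe and ze; in plain code point order it falls
after the string starting with i (0438).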
Within this chart, it should be possible to interpolate
additional (mainly non-Slavonic) letters later, usually following
their cognate characters (e.g. variants of KA after KA).
A further chart providing these interpolations will be provided in
due course.
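Until that chart exists, the interpolation idea can only be sketched.
The Python below is an assumption-laden illustration: the function
name is hypothetical, and the two KA variants (049B, ka with descender,
and 049D, ka with vertical stroke) and their relative order are picked
purely as examples, not taken from any agreed list.

```python
def interpolate(base_order, additions):
    """Insert each added letter directly after its cognate.

    base_order: list of letters in the established order.
    additions:  list of (new_letter, cognate) pairs; several variants
                of one cognate keep the order in which they are given.
    """
    result = []
    for letter in base_order:
        result.append(letter)
        # Variants follow their cognate letter.
        result.extend(new for new, cognate in additions if cognate == letter)
    return result


# A short stretch of the base order: i, Byelorussian-Ukrainian i, ka, el.
base = ["\u0438", "\u0456", "\u043a", "\u043b"]

# Hypothetical additions: two variants of KA, both placed after KA.
extra = [("\u049b", "\u043a"), ("\u049d", "\u043a")]

extended = interpolate(base, extra)
print(extended)
```

The extended order here is i, Byelorussian-Ukrainian i, ka, ka with
descender, ka with vertical stroke, el; the base letters keep their
relative positions.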
Here is the basic chart showing the Slavonic alphabet, as listed in
BS 2979: 1958 (still current), and the relationships between Cyrillic
and two other European scripts.
BS 2979 SLAVONIC             GREEK                  GEORGIAN
 BS   -Num ID   Name         -Num ID   Name         -Num ID   Name
  1.    -1 0430 Cy_a       :   -1 03B1 Gr_ALPHA   :   -1 10D0 Ge_AN
  2.       0431 Cy_be
  3.    -2 0432 Cy_ve      :   -2 03B2 Gr_BETA    :   -2 10D1 Ge_BAN
  4.    -3 0433 Cy_ghe     :   -3 03B3 Gr_GAMMA   :   -3 10D2 Ge_GAN
  5.    -4 0434 Cy_de      :   -4 03B4 Gr_DELTA   :   -4 10D3 Ge_DON
  6.    -5 0435 Cy_ie      :   -5 03B5 Gr_EPSILON :   -5 10D4 Ge_EN
  7.       0436 Cy_zhe     .
 *8.    -6 0455 Cy_dze     :   -6 03DA Gr_STIGMA  :   -6 10D5 Ge_VIN
  9.    -7 0437 Cy_ze      :   -7 03B6 Gr_ZETA    :   -7 10D6 Ge_ZEN
 10.    -8 0438 Cy_i       :   -8 03B7 Gr_ETA     :   -8 10F1 Ge_HE
     [For -9 see BS 42.]   :   -9 03B8 Gr_THETA   :   -9 10D7 Ge_TAN
 11.   -10 0456 Cy_be-uk_i :  -10 03B9 Gr_IOTA    :  -10 10D8 Ge_IN
 12.   -20 043A Cy_ka      :  -20 03BA Gr_KAPPA   :  -20 10D9 Ge_KAN
 13.   -30 043B Cy_el      :  -30 03BB Gr_LAMDA   :  -30 10DA Ge_LAS
 14.   -40 043C Cy_em      :  -40 03BC Gr_MU      :  -40 10DB Ge_MAN
 15.   -50 043D Cy_en      :  -50 03BD Gr_NU      :  -50 10DC Ge_NAR
     [For -60 see BS 40.]  :  -60 03BE Gr_XI      :  -60 10F2 Ge_HIE
 16.   -70 043E Cy_o       :  -70 03BF Gr_OMICRON :  -70 10DD Ge_ON
 17.   -80 043F Cy_pe      :  -80 03C0 Gr_PI      :  -80 10DE Ge_PAR
*17a.      0481 Cy_koppa   :  -90 03DE Gr_KOPPA   :  -90 10DF Ge_ZHAR
     [For -90 see BS 27.]
 18.  -100 0440 Cy_er      : -100 03C1 Gr_RHO     : -100 10E0 Ge_RAE
 19.  -200 0441 Cy_es      : -200 03C3 Gr_SIGMA   : -200 10E1 Ge_SAN
 20.  -300 0442 Cy_te      : -300 03C4 Gr_TAU     : -300 10E2 Ge_TAR
 21.  -400 0443 Cy_u       : -400 03C5 Gr_UPSILON : -400 10E3 Ge_UN
*21a.      0479 Cy_UK (oy) :                      :      10F3 Ge_WE
 22.  -500 0444 Cy_ef      : -500 03C6 Gr_PHI     : -500 10E4 Ge_PHAR
 23.  -600 0445 Cy_ha      : -600 03C7 Gr_CHI     : -600 10E5 Ge_KHAR
     [For -700 see BS 41.] : -700 03C8 Gr_PSI     : -700 10E6 Ge_GHAN **
 24.  -800 0461 Cy_omega   : -800 03C9 Gr_OMEGA   : -800 10E7 Ge_QAR **
 24a.      047D Cy_omega_titlo
 24b.      047B Cy_round_omega
 24c.      047F Cy_ot
 25.       xxxx [Cy_shte - variant of 28a. below, in different order]
 26.  -900 0446 Cy_tse     : -900 03E0 Gr_SAMPI   : -900 10E8 Ge_SHIN
 27.   -90 0447 Cy_che
 28.       0448 Cy_sha
 28a.      0449 Cy_shcha
 29.       044A Cy_hard_sign
 30.       044B Cy_yeru
 31.       044C Cy_soft_sign
 32.       044D Cy_e
 32a.      0463 Cy_yat
 33.       044E Cy_yu
 34.       044F Cy_ya
 35.       0465 Cy_iotified_e
 36.       0467 Cy_little_yus
*37.       046B Cy_big_yus
*38.       0469 Cy_iotified_little_yus
 39.       046D Cy_iotified_big_yus
 40.   -60 046F Cy_ksi
 41.  -700 0471 Cy_psi
 42.    -9 0473 Cy_fita
 43.       0475 Cy_izhitsa
 43a.      0477 Cy_izhitsa_double_grave_accent
Key:
BS: reference number in BS 2979, table D: Transliteration of
Church Slavonic Cyrillic (shown as nn.)
1a. 1c. Numbers like this indicate that the character is not in BS 2979,
but its position is likely because of neighbouring characters or
other evidence.
-Num: numeric value of letter in that particular script (shown as -nn)
ID: UCS ID from ISO/IEC 10646
Name: UCS character name (abbreviated systematically, using
Cy_ as a script code, and be and uk as language codes).
Only small letters are shown in this table.
* Changes required from list in CD 14651 (not complete)
** No close equivalent phonetically, despite common numeric value.
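To make the key concrete, here is a small Python sketch that reads one
"-Num ID Name" cell of the chart, without the BS number or any * marks.
The function name and the returned tuple layout are assumptions for
illustration; unicodedata is the standard Python module for looking up
character names.

```python
import unicodedata


def parse_cell(cell):
    """Split a chart cell such as '-20 043A Cy_ka'.

    Returns (numeric value or None, character, abbreviated name).
    """
    fields = cell.split()
    value = None
    if fields[0].startswith("-"):
        # '-20' marks the numeric value 20; it is not a negative number.
        value = int(fields[0][1:])
        fields = fields[1:]
    character = chr(int(fields[0], 16))  # ID is a hexadecimal UCS ID
    return value, character, " ".join(fields[1:])


value, character, short_name = parse_cell("-20 043A Cy_ka")
full_name = unicodedata.name(character)
print(value, short_name, full_name)
```

A cell with no numeric value, such as "0431 Cy_be", yields None in the
first position.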
* * * * * * * *
Annex:
Other Georgian letters, in standard Georgian sort order:
 1000 10E9 Ge_CHIN .
 2000 10EA Ge_CAN .
 3000 10EB Ge_JIL .
 4000 10EC Ge_CIL .
 5000 10ED Ge_CHAR .
 6000 10EE Ge_XAN .
 7000 10F4 Ge_HAR .
 8000 10EF Ge_JHAN .
 9000 10F0 Ge_HAE .
10000 10F5 Ge_HOE .
      10F6 Ge_FI .
Additional Greek and Coptic letters: sort order not known to me:
03DC Gr_DIGAMMA
03F3 Gr_YOT
03E3 Co_SHEI
03E5 Co_FEI
03E7 Co_KHEI
03E9 Co_HORI
03EB Co_GANGIA
03ED Co_SHIMA
03EF Co_DEI
Sources for common sorting values in standards documents:
[1] BS 2979: 1958 Transliteration of Cyrillic and Greek (still current)
[2] ISO/TC46/SC2 N 223 Rev: Annex: Regeln für die alphabetische
Katalogisierung [RAK] Anlage 5: Transliteration der armenischen und
georgischen Schrift. The RAK are widely used in at least Germany
and Austria.
Between them, these provide numerical values, and traditional sort
orders, for Cyrillic, Greek, Georgian and Armenian.
John Clews
15 November 1997
--
Chair of ISO/TC46/SC2: Conversion of Written Languages;
Member of CEN/TC304: Character Set Technology;
Member of ISO/IEC/JTC1/SC2: Character Sets.
SESAME Computer Projects, 8 Avenue Road, Harrogate, HG2 7PG, England
Email: Converse@sesame.demon.co.uk; tel: +44 (0) 1423 888 432