Research notes toward a Unicode encoding
by Richard Cook
This document examines questions relating to the encoding of elements of Fraser’s (unicameral) Lisu writing system, weighing the benefits of two possible encoding methods, one involving unification with (bicameral) Latin Script, and the other involving disunification. After the repertory is examined and compared with the encoded (Unicode 5.0) characters and case mappings, unification is proposed as the preferred encoding method.
Fraser letters derive from LATIN CAPITAL letters [A..P,R..Z], but Fraser Lisu writing is unicameral: there is no case folding in Fraser orthography, and lowercase Latin letters do not occur (except when Fraser is mixed with other orthographies).
In addition to the 25 LATIN CAPITAL letters which occur in their normal form, 15 LATIN CAPITAL letters have forms that are TURNED (i.e. ‘rotated 180 degrees’); 10 other Latin letters which do not contrast (with themselves or with another letter-form in the repertory) when TURNED are used in Fraser writing, but have no TURNED counterpart; only “Q” is completely unused in the writing system. The complete set of 40 Fraser letters is therefore as follows:
non-TURNED: A B C D E F G J K L P R T U V (15)
non-TURNED: H I M N O S W X Y Z (10)
TURNED:     A B C D E F G J K L P R T U V (15)
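The inventory above can be cross-checked mechanically. The following is a minimal Python sketch, not part of the original notes; the names TURNABLE, NON_TURNABLE and fraser_letters are illustrative only, and each Fraser letter is modelled as a (base letter, turned?) pair purely for counting purposes.

```python
import string

# Letters that occur both in normal form and TURNED (from the list above).
TURNABLE = set("ABCDEFGJKLPRTUV")
# Letters that occur only in normal form (no contrast when TURNED).
NON_TURNABLE = set("HIMNOSWXYZ")

# All Latin capitals used in their normal form.
normal_forms = TURNABLE | NON_TURNABLE
# Latin capitals that are not used at all.
unused = set(string.ascii_uppercase) - normal_forms

# Model each Fraser letter as (base letter, is_turned).
fraser_letters = [(c, False) for c in sorted(normal_forms)]
fraser_letters += [(c, True) for c in sorted(TURNABLE)]

print(len(normal_forms), len(TURNABLE), len(fraser_letters), sorted(unused))
```

Under these assumptions the counts agree with the text: 25 normal forms, 15 TURNED forms, 40 letters in all, with only Q unused.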
A table of the phonological values of these letters when used to write Southern Lisu is given in Bradley’s Southern Lisu Dictionary (2005:xxvii); this includes also a list of six tone marks:
TONE: 1 2 3  4  5 6
MARK: . , .. ., : ;
The following four signs, which also re-purpose ASCII characters, are reportedly used as well:
nasal:  '
glide:  _
comma:  -
period: =
Treating two of the tone marks as composite, there are a total of (40+4+4=) 48 signs in the Fraser Lisu repertory; if they are not composite, there are (40+6+4=) 50 signs total.
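The two totals can be reproduced with a short Python sketch. It assumes that the two composite tone marks are the two-character ones (tones 3 and 4), which is how they appear in the tone table; the variable names are illustrative.

```python
# Tone marks and other signs as listed above.
TONE_MARKS = {1: ".", 2: ",", 3: "..", 4: ".,", 5: ":", 6: ";"}
OTHER_SIGNS = {"nasal": "'", "glide": "_", "comma": "-", "period": "="}
LETTER_COUNT = 40  # 25 normal forms + 15 TURNED forms

# Assumption: a tone mark written with two characters is composite.
simple = [m for m in TONE_MARKS.values() if len(m) == 1]
composite = [m for m in TONE_MARKS.values() if len(m) > 1]

# Composite marks are not counted as separate signs in the first total.
total_if_composite = LETTER_COUNT + len(simple) + len(OTHER_SIGNS)
# Every tone mark is counted as its own sign in the second total.
total_if_atomic = LETTER_COUNT + len(TONE_MARKS) + len(OTHER_SIGNS)

print(composite, total_if_composite, total_if_atomic)
```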
The 25 “non-TURNED” letter forms used in Fraser are derivative of and clearly identical in appearance to the 25 LATIN CAPITAL letters encoded in the Basic Latin block (U+0041..U+0050,U+0052..U+005A).
Of the 15 TURNED letter forms used in Fraser, the following 4 may be seen as having encoded equivalents:
0186 LATIN CAPITAL LETTER OPEN O (“LATIN CAPITAL LETTER TURNED C”)
018E LATIN CAPITAL LETTER REVERSED E (“LATIN CAPITAL LETTER TURNED E”)
0245 LATIN CAPITAL LETTER TURNED V (“LATIN CAPITAL LETTER TURNED V”)
2132 TURNED CAPITAL F (“LATIN CAPITAL LETTER TURNED F”)
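The four character names can be confirmed against a Unicode character database. The sketch below uses Python's standard unicodedata module, which tracks a later Unicode version than 5.0; since character names are never changed once assigned, the lookups should agree with the list above. The dictionary name ENCODED_TURNED is illustrative.

```python
import unicodedata

# Fraser TURNED letters with a candidate encoded equivalent (from the list above).
ENCODED_TURNED = {"C": 0x0186, "E": 0x018E, "V": 0x0245, "F": 0x2132}

# Look up the formal character name for each code point.
names = {letter: unicodedata.name(chr(cp)) for letter, cp in ENCODED_TURNED.items()}

for letter, cp in ENCODED_TURNED.items():
    print(f"TURNED {letter}: U+{cp:04X} {names[letter]}")
```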
The 10 Fraser tone and other marks (simple and composite) are also clearly derivative of and identical in appearance to encoded characters:
0027 APOSTROPHE
002C COMMA
002D HYPHEN-MINUS
002E FULL STOP
003A COLON
003B SEMICOLON
003D EQUALS SIGN
005F LOW LINE
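As an illustration of how the ten marks reduce to a smaller set of encoded characters, the following Python sketch spells each mark out as a code point sequence. The labels in MARKS and the helper describe() are hypothetical names introduced here; the composite marks are assumed to be plain two-character sequences.

```python
import unicodedata

# The ten Fraser marks, keyed by function (labels are illustrative).
MARKS = {
    "tone 1": ".", "tone 2": ",", "tone 3": "..", "tone 4": ".,",
    "tone 5": ":", "tone 6": ";",
    "nasal": "'", "glide": "_", "comma": "-", "period": "=",
}

def describe(mark):
    """Return the code point and character name of each character in a mark."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in mark]

# Distinct encoded characters needed to write all ten marks.
distinct = sorted({ch for mark in MARKS.values() for ch in mark})

for label, mark in MARKS.items():
    print(label, describe(mark))
print(len(MARKS), "marks,", len(distinct), "distinct characters")
```

On this reading, ten marks require only eight distinct ASCII characters, because tones 3 and 4 reuse FULL STOP and COMMA.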
The following table summarizes the encoding status of the 48 Fraser Lisu orthographic elements:
Encoded
  TURNED: C E F V (04)
  LATIN:  A B C D E F G J K L P R T U V (15)
  LATIN:  H I M N O S W X Y Z (10)
  OTHER:  ; : , . ' _ - = (08)
Not Encoded
  TURNED: A B D G J K L P R T U (11)
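The "Not Encoded" row follows from the other rows by set difference, and the row totals sum to 48. A minimal Python sketch of that derivation, with illustrative variable names:

```python
# Rows of the summary table, as sets of letters or signs.
TURNED = set("ABCDEFGJKLPRTUV")            # all 15 TURNED Fraser letters
ENCODED_TURNED = set("CEFV")               # TURNED letters with encoded equivalents
LATIN_WITH_TURNED = set("ABCDEFGJKLPRTUV") # normal forms that also occur TURNED
LATIN_ONLY = set("HIMNOSWXYZ")             # normal forms with no TURNED counterpart
OTHER = set(";:,.'_-=")                    # tone and other marks (simple signs)

# TURNED letters without an encoded equivalent.
not_encoded = sorted(TURNED - ENCODED_TURNED)

encoded_total = (len(ENCODED_TURNED) + len(LATIN_WITH_TURNED)
                 + len(LATIN_ONLY) + len(OTHER))
grand_total = encoded_total + len(not_encoded)

print("".join(not_encoded), encoded_total, grand_total)
```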
Since Latin Script is bicameral by definition, unifying (nominally or derivationally upper-case) Fraser letters with encoded elements of Latin Script requires that we consider casing relations, even though Fraser itself is unicameral.
Not Encoded TURNED: A B D G J K L P R T U (11)
CASING:             + - - + ? + - - + + -
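The casing row can be read as a small lookup table. The Python sketch below pairs each unencoded TURNED letter with its casing symbol, taking the row exactly as given, and then groups the letters by symbol; the group names are illustrative.

```python
# The two rows of the table above, position by position.
LETTERS = "ABDGJKLPRTU"
CASING = "+--+?+--++-"

status = dict(zip(LETTERS, CASING))

# Group the letters by their casing symbol.
plus = [c for c in LETTERS if status[c] == "+"]
question = [c for c in LETTERS if status[c] == "?"]
minus = [c for c in LETTERS if status[c] == "-"]

print("+:", plus)
print("?:", question)
print("-:", minus)
```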
The above 11 unencoded TURNED Fraser letters are divided into two groups, according to whether the corresponding lowercase form is distinctive or non-distinctive in appearance when TURNED. The bottom row in the table above shows that if the corresponding lowercase form is distinctive (marked “+” above) it is already encoded, in every case but one (marked “?” above); in all other cases it is non-distinctive and unencoded (“-”). The lowercase counterpart of a TURNED J could be represented as the sequence <U+027E LATIN SMALL LETTER R WITH FISHHOOK, U+0323 COMBINING DOT BELOW>, and if case mappings may involve sequences, encoding of a “LATIN SMALL LETTER TURNED J” is not required. The case mappings are:
0250 LATIN SMALL LETTER TURNED A ↔ “XXXX LATIN CAPITAL LETTER TURNED A”
0279 LATIN SMALL LETTER TURNED R ↔ “XXXX LATIN CAPITAL LETTER TURNED R”
0287 LATIN SMALL LETTER TURNED T ↔ “XXXX LATIN CAPITAL LETTER TURNED T”
029E LATIN SMALL LETTER TURNED K ↔ “XXXX LATIN CAPITAL LETTER TURNED K”
1D77 LATIN SMALL LETTER TURNED G ↔ “XXXX LATIN CAPITAL LETTER TURNED G”
???? “LATIN SMALL LETTER TURNED J” ↔ “XXXX LATIN CAPITAL LETTER TURNED J”

0250;LATIN SMALL LETTER TURNED A;Ll;0;L;;;;;N;;;XXXX;;XXXX
0279;LATIN SMALL LETTER TURNED R;Ll;0;L;;;;;N;;;XXXX;;XXXX
0287;LATIN SMALL LETTER TURNED T;Ll;0;L;;;;;N;;;XXXX;;XXXX
029E;LATIN SMALL LETTER TURNED K;Ll;0;L;;;;;N;;;XXXX;;XXXX
1D77;LATIN SMALL LETTER TURNED G;Ll;0;L;;;;;N;;;XXXX;;XXXX
????;LATIN SMALL LETTER TURNED J;Ll;0;L;;;;;N;;;XXXX;;XXXX
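To show where a case mapping sits in a UnicodeData.txt record, the following Python sketch splits a record into its fields. It assumes the standard 15-field, semicolon-separated layout, in which fields 12, 13 and 14 (counting from 0) hold the simple uppercase, lowercase and titlecase mappings. The helper simple_case_mappings() is a hypothetical name, and XXXX is a placeholder for a code point not yet assigned.

```python
# Zero-based positions of the simple case mapping fields (assumed standard layout).
UPPER, LOWER, TITLE = 12, 13, 14
FIELD_COUNT = 15

def simple_case_mappings(record):
    """Split one UnicodeData.txt record and return its simple case mappings."""
    fields = record.strip().split(";")
    if len(fields) != FIELD_COUNT:
        raise ValueError(f"expected {FIELD_COUNT} fields, got {len(fields)}")
    return {
        "code": fields[0],
        "name": fields[1],
        "upper": fields[UPPER] or None,
        "lower": fields[LOWER] or None,
        "title": fields[TITLE] or None,
    }

# An existing record, for comparison: a maps to A for upper and title case.
latin_a = "0061;LATIN SMALL LETTER A;Ll;0;L;;;;;N;;;0041;;0041"
# A record with a placeholder uppercase mapping, as proposed for TURNED A.
turned_a = "0250;LATIN SMALL LETTER TURNED A;Ll;0;L;;;;;N;;;XXXX;;XXXX"

print(simple_case_mappings(latin_a))
print(simple_case_mappings(turned_a))
```

For a lowercase letter, the uppercase and titlecase fields carry the mapping and the lowercase field stays empty, as in the record for 0061.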
It is therefore proposed that only the following 11 Fraser letters be put forward for future encoding, named as follows:
LATIN CAPITAL LETTER TURNED A
LATIN CAPITAL LETTER TURNED B
LATIN CAPITAL LETTER TURNED D
LATIN CAPITAL LETTER TURNED G
LATIN CAPITAL LETTER TURNED J
LATIN CAPITAL LETTER TURNED K
LATIN CAPITAL LETTER TURNED L
LATIN CAPITAL LETTER TURNED P
LATIN CAPITAL LETTER TURNED R
LATIN CAPITAL LETTER TURNED T
LATIN CAPITAL LETTER TURNED U
Last updated: 20070621:10:18:18
Prior Notes: 0.