From: Benjamin M Scarborough (benjamin.scarborough@student.utdallas.edu)
Date: Mon Dec 03 2007 - 13:35:48 CST
First of all, I just realized that there is a second volume of the 
dictionary, available at 
http://jdlib.ntl.gov.tw/cgi-bin/browse.cgi?bookid=bjn00172v02
Andrew West wrote:
>[...]
>It may be that some of the tone marks can use existing combining
>characters, but it looks like at least some of them need encoding.
I have tried to find suitable combining marks already in Unicode, but 
nothing seems to represent these tone marks appropriately.
>[...]
>> Lastly, there are two combining marks visible on individual characters:
>> a combining line above and a combining dot below. These could be
>> unified with U+0305 COMBINING OVERLINE and U+0323 COMBINING DOT BELOW
>> respectively.
>
>We need to be sure of exactly which letters these marks modify. If
>they only modify certain letters it may be simpler to encode the
>modified letter as a single non-decomposable character (letters with
>diacritic marks do not always need to be decomposable -- cf. Yi and
>Manchu).
The overline mark appears only on U, SMALL U, O, SMALL O, SA, SE, SO, 
TI, TU. Instances of all of these can be found in the indices. In many 
cases the bar connects with parts of the base character. They are 
collated separately from their unmarked counterparts; however, this 
dictionary also separates characters with (han)dakuten from their 
unmarked counterparts, thus SE < ZE < SE WITH TOPBAR < SO < ZO < SO 
WITH TOPBAR < TA. It is unknown, then, whether this bar is intended to 
be an integral part of the character or a combining mark similar to the 
dakuten and handakuten. It would be useful to know the meaning of the 
mark. It is worth noting, however, that O < O WITH TOPBAR < U < U WITH 
TOPBAR < WO < KA at the beginning of a syllable, but N < WO < O WITH 
TOPBAR < U WITH TOPBAR at the middle/end.
The dot below mark appears on KA, KI, KU, KE, KO, SA WITH TOPBAR, SE 
WITH TOPBAR, SO WITH TOPBAR, TA, TI, TI WITH TOPBAR, TU, TU WITH 
TOPBAR, TE, TO, PA, PI, PU, PE, PO. Unlike the dakuten, handakuten, and 
topbar, the dot below is ignored for collation; characters with dot 
below are freely intermixed with unmarked characters and the 
dictionary's headers show both varieties. Furthermore, the dot below 
appears to be used with entire columns of consonants (Kx, xx WITH 
TOPBAR, Tx, Px). However, because it is used with PA, PI, PU, PE, PO 
and -not- HA, HI, HU, HE, HO, it would seem inappropriate to encode the 
combinations with dot below as nondecomposable characters.
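To make the collation behaviour described above concrete, here is a rough Python sketch. It assumes the topbar and the dot are represented as U+0305 and U+0323 following a base kana, which is only the unification suggested earlier in this thread, not an established encoding. The mark weights and the name sort_key are my own illustration. Base letters are compared by code point, so this reproduces only the SE..TA example; it does not model the O/U/WO ordering noted above.

```python
import unicodedata

OVERLINE = "\u0305"    # COMBINING OVERLINE (assumed encoding of the topbar)
DOT_BELOW = "\u0323"   # COMBINING DOT BELOW (assumed encoding of the dot)
DAKUTEN = "\u3099"     # COMBINING KATAKANA-HIRAGANA VOICED SOUND MARK
HANDAKUTEN = "\u309A"  # COMBINING KATAKANA-HIRAGANA SEMI-VOICED SOUND MARK

# Hypothetical weights: unmarked < dakuten < handakuten < topbar.
MARK_WEIGHT = {DAKUTEN: 1, HANDAKUTEN: 2, OVERLINE: 3}

def sort_key(word):
    """Return a list of (base, mark weight) pairs, one per syllable.

    The dot below is dropped, so it never affects the order.
    """
    key = []
    for ch in unicodedata.normalize("NFD", word):
        if ch == DOT_BELOW:
            continue  # ignored for collation
        if ch in MARK_WEIGHT and key:
            base, _ = key[-1]
            key[-1] = (base, MARK_WEIGHT[ch])  # mark modifies previous base
        else:
            key.append((ch, 0))
    return key

words = ["タ", "ソ" + OVERLINE, "ゾ", "ソ", "セ" + OVERLINE, "ゼ", "セ"]
ordered = sorted(words, key=sort_key)
# ordered is SE, ZE, SE WITH TOPBAR, SO, ZO, SO WITH TOPBAR, TA
```

With this key, a letter with dot below compares equal to the same letter without it, so sorted() leaves such pairs in their input order, which is at least consistent with the free intermixing seen in the dictionary.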
>> http://www.geocities.jp/itikun01/hibi/zat2.html
>>
>> At the above site is evidence of KATAKANA LETTER YI, KATAKANA LETTER
>> YE, KATAKANA LETTER WU, HIRAGANA LETTER YI, and HIRAGANA LETTER YE.
>> They apparently were introduced in the Meiji era but never entered
>> common usage. However, I have not been able to find instances of any of
>> these five characters in use.
>
>The table from "中學教程/日本文典" is good, but a proposal would need a little
>more evidence of their use.
I'm well aware of this and am still trying to find proper evidence for 
encoding.
>> If any of these characters are indeed potential additions to Unicode, I
>> propose making a new Katakana Extended-A block at U+AAE0..U+AAFF.
>
>Best to keep them in the same region as the other kana and bopomofo
>blocks, etc. 2FE0..2FEF is free if sixteen characters are sufficient. 
I considered this at first, but then chose AAE0..AAFF for two reasons:
1. there do appear to be more than sixteen characters to encode, and
2. http://www.unicode.org/alloc/CurrentAllocation.html describes 
2FE0..2FEF as being in the "Symbols Area" rather than the "General 
Scripts Area."
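For reference, the capacity of the two candidate ranges is simple arithmetic. The helper name block_size below is just for illustration.

```python
def block_size(first, last):
    """Number of code points in the inclusive range first..last."""
    return last - first + 1

size_2fe0 = block_size(0x2FE0, 0x2FEF)  # the range suggested by Andrew West
size_aae0 = block_size(0xAAE0, 0xAAFF)  # the range proposed here
print(size_2fe0, size_aae0)
```

This prints 16 and 32, so the second range has room for twice as many characters.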