From: Benjamin M Scarborough (benjamin.scarborough@student.utdallas.edu)
Date: Mon Dec 03 2007 - 13:35:48 CST
First of all, I just realized that there is a second volume of the
dictionary, available at
http://jdlib.ntl.gov.tw/cgi-bin/browse.cgi?bookid=bjn00172v02
Andrew West wrote:
>[...]
>It may be that some of the tone marks can use existing combining
>characters, but it looks like at least some of them need encoding.
I have tried to find suitable combining marks already in Unicode, but
nothing seems to represent these tone marks appropriately.
>[...]
>> Lastly, there are two combining marks visible on individual characters:
>> a combining line above and a combining dot below. These could be
>> unified with U+0305 COMBINING OVERLINE and U+0323 COMBINING DOT BELOW
>> respectively.
>
>We need to be sure of exactly which letters these marks modify. If
>they only modify certain letters it may be simpler to encode the
>modified letter as a single non-decomposable character (letters with
>diacritic marks do not always need to be decomposable -- cf. Yi and
>Manchu).
The overline mark appears only on U, SMALL U, O, SMALL O, SA, SE, SO,
TI, TU. Instances of all of these can be found in the indices. In many
cases the bar connects with parts of the base character. Letters with the
bar are collated separately from their unmarked counterparts; however, this
dictionary also separates characters with (han)dakuten from their
unmarked counterparts, thus SE < ZE < ZE WITH TOPBAR < SO < ZO < SO
WITH TOPBAR < TA. It is unknown, then, whether this bar is intended to
be an integral part of the character or a combining mark similar to the
dakuten and handakuten. It would be useful to know the meaning of the
mark. It is worth noting, however, that O < O WITH TOPBAR < U < U WITH
TOPBAR < WO < KA at the beginning of a syllable, but N < WO < O WITH
TOPBAR < U WITH TOPBAR at the middle/end.
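To make the ordering concrete, here is a rough sketch of a sort key that reproduces the SE < ZE < SO < ZO < SO WITH TOPBAR < TA portion of the sequence. It is only an illustration under assumptions: the topbar is represented with U+0305 COMBINING OVERLINE (which is just one of the options under discussion), the rank table is hypothetical, and the position-dependent ordering of O, U, and WO is not modelled.

```python
import unicodedata

OVERLINE = "\u0305"  # assumption: stand-in for the topbar
DAKUTEN = "\u3099"   # COMBINING KATAKANA-HIRAGANA VOICED SOUND MARK

# Base letters in dictionary order: SE, SO, TA (a small excerpt only).
BASE_ORDER = "\u30BB\u30BD\u30BF"

# Hypothetical rank table: unmarked < dakuten < topbar, as described above.
MARK_RANK = {"": 0, DAKUTEN: 1, OVERLINE: 2}


def sort_key(syllable):
    """Return (base position, mark rank) for a single kana syllable."""
    # NFD splits ZE/ZO into SE/SO plus the combining dakuten.
    decomposed = unicodedata.normalize("NFD", syllable)
    base, marks = decomposed[0], decomposed[1:]
    return (BASE_ORDER.index(base), MARK_RANK[marks])


# TA, SO WITH TOPBAR, ZO, SE, SO, ZE in scrambled order.
entries = ["\u30BF", "\u30BD\u0305", "\u30BE", "\u30BB", "\u30BD", "\u30BC"]
ordered = sorted(entries, key=sort_key)
```

The point of the sketch is that the topbar behaves like a third rank alongside "unmarked" and "dakuten" under each base letter, which is why collation alone cannot tell us whether it is an integral part of the letter or a combining mark.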
The dot below mark appears on KA, KI, KU, KE, KO, SA WITH TOPBAR, SE
WITH TOPBAR, SO WITH TOPBAR, TA, TI, TI WITH TOPBAR, TU, TU WITH
TOPBAR, TE, TO, PA, PI, PU, PE, PO. Unlike the dakuten, handakuten, and
topbar, the dot below is ignored for collation; characters with dot
below are freely intermixed with unmarked characters and the
dictionary's headers show both varieties. Furthermore, the dot below
appears to be used with entire columns of consonants (Kx, xx WITH
TOPBAR, Tx, Px). However, because it is used with PA, PI, PU, PE, PO
and -not- HA, HI, HU, HE, HO, it would seem inappropriate to encode the
combinations with dot below as nondecomposable characters.
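By contrast, the dot below can be modelled as a mark that is simply dropped before comparison. The following is a minimal sketch, assuming the dot is represented as U+0323 COMBINING DOT BELOW after the base letter; the function name and the sample entries are made up for illustration and are not taken from the dictionary.

```python
import unicodedata

DOT_BELOW = "\u0323"  # COMBINING DOT BELOW, the proposed unification


def collation_key(word):
    """Ignore the dot below so dotted and undotted forms sort together."""
    return word.replace(DOT_BELOW, "")


# Hypothetical entries: KI, KA with dot below, KA.
entries = ["\u30AD", "\u30AB\u0323", "\u30AB"]

# Python's sort is stable, so entries with equal keys keep their input order,
# which mirrors the "freely intermixed" behaviour described above.
ordered = sorted(entries, key=collation_key)

mark_name = unicodedata.name(DOT_BELOW)
```

Because the key is the same with or without the dot, this treatment only works if the dot is a separate, removable mark, which fits the argument against nondecomposable characters.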
>> http://www.geocities.jp/itikun01/hibi/zat2.html
>>
>> At the above site is evidence of KATAKANA LETTER YI, KATAKANA LETTER
>> YE, KATAKANA LETTER WU, HIRAGANA LETTER YI, and HIRAGANA LETTER YE.
>> They apparently were introduced in the Meiji era but never entered
>> common usage. However, I have not been able to find instances of any of
>> these five characters in use.
>
>The table from "中學教程/日本文典" is good, but a proposal would need a little
>more evidence of their use.
I'm well aware of this and am still trying to find proper evidence for
encoding.
>> If any of these characters are indeed potential additions to Unicode, I
>> propose making a new Katakana Extended-A block at U+AAE0..U+AAFF.
>
>Best to keep them in the same region as the other kana and bopomofo
>blocks, etc. 2FE0..2FEF is free if sixteen characters are sufficient.
I considered this at first, but then chose AAE0..AAFF for two reasons:
1. there do appear to be more than sixteen characters to encode, and
2. http://www.unicode.org/alloc/CurrentAllocation.html describes
2FE0..2FEF as being in the "Symbols Area" rather than the "General
Scripts Area."
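For reference, the arithmetic behind the first point can be checked in a few lines. This is only a sketch: the helper names are hypothetical, and the count of 17 is simply the smallest number that is "more than sixteen", not an actual tally of the characters to be proposed.

```python
# Candidate ranges mentioned in this thread, as inclusive code point bounds.
CANDIDATES = {
    "2FE0..2FEF": (0x2FE0, 0x2FEF),
    "AAE0..AAFF": (0xAAE0, 0xAAFF),
}


def block_size(first, last):
    """Number of code points in an inclusive range."""
    return last - first + 1


def fits(range_name, needed):
    """True if `needed` characters fit in the named candidate range."""
    first, last = CANDIDATES[range_name]
    return needed <= block_size(first, last)


for name, (first, last) in CANDIDATES.items():
    print(name, block_size(first, last))
```

Any repertoire larger than sixteen characters rules out 2FE0..2FEF on size alone, independent of the question of which area the range belongs to.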