From: Kenneth Whistler (kenw@sybase.com)
Date: Mon Feb 09 2004 - 20:16:57 EST
Chris Harvey wrote:
> I think I posted this to the list last week, but I haven't seen it come up.
You may have run up against the message-size constraints
currently imposed on the unicode@unicode.org email list
because of the MyDoom virus.
Some comments on particular issues:
> Misnamed Characters
> The asterisk ᕯ character (U+156F) appears on the code-page chart as
> **, and is named CS TTH. This is a misreading of the syllabarium chart used
> by the French Missionaries for Chipewyan—probably from the 1904 publication
> Prières Catéchisme et Cantiques en langue Montagnaise ou Chipeweyan. The
> chart in this book has been reprinted in most if not all “scripts of the
> world” type books. Unlike most other syllabics charts, this one does not
> have a column of finals to the right of the consonant-vowel syllabics.
> Instead, it simply has a list of all the finals, which do not correspond
> with the syllabics series on the same row. Thus, the CS WEST-CREE P
> (U+144A) (looks like a prime ') final which appears to the right of the
> “tta” row is not the sound “tt”, but is instead “h”. The blue circled
> asterisk is not “tth”, but is in fact a symbol which indicates a proper
> name, in this case /*adą/ (Adam). A second glitch on the Unicode
> code-page chart is that this character is written with two asterisks “**”,
> when in fact on the chart above, the first asterisk is the character
> itself, and the second is part of the example. I believe this should
> definitely be fixed.
I concur with your analysis, but not with the fix.
The right answer here is to simply use the existing asterisk,
U+002A ASTERISK for this proper name marking symbol. See, for
example, the summary in Morice 1890:
"* is prefixed to proper names, ... The rest as in Engl."
In other words, the punctuation usage for this script was basically
derived from English typography, using existing symbols. And this
asterisk is more of the same, I believe. It need not be a "letter"
of the syllabary, unlike the forms which actually are representing
final consonants.
However, if such a usage is to be recommended, the right thing to
do is to document the error in the standard and to deprecate
U+156F, since its name, inferred value, and glyph all resulted
from misinterpretation of the primary source.
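To make concrete what that recommendation would mean for existing
data, here is a minimal Python sketch. It only illustrates the
suggestion above; nothing in the standard prescribes it, and the
function name normalize_name_marker is hypothetical.

```python
# Hypothetical helper: rewrite the proper-name marker when it has been
# encoded as U+156F (CANADIAN SYLLABICS TTH) to the plain U+002A ASTERISK.
CS_TTH = "\u156F"
ASTERISK = "\u002A"


def normalize_name_marker(text: str) -> str:
    """Return text with every U+156F replaced by U+002A."""
    return text.replace(CS_TTH, ASTERISK)


# The /*adą/ (Adam) example from the chart, with the marker encoded as U+156F.
sample = CS_TTH + "adą"
print(normalize_name_marker(sample))
```

Text that already uses U+002A passes through unchanged, so a mapping
like this could be applied to mixed data without harm.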
> In the syllabics chart mentioned above, the final row in the chart is
> labelled “tca, tce…”, (U+1570-73) which corresponds to the modern Roman
> orthography sound /t/ (an aspirated stop). Interpreting “tca” as “tya” is a
> misunderstanding of the French description of what the c represents. The
> Chipewyan Syllabarium page has more info on this. Whether this syllabics
> series is renamed is probably not a high priority.
>
> In Naskapi, each a-type syllabic character can either be preceded by a
> colon-like character, or have an umlaut-like diacritic. Unicode has labelled
> these as having a long vowel: e.g. (U+1482) CS NASKAPI KWAA. In fact, the
> colon or umlaut does not mark vowel length (Naskapi orthography ignores
> length). Instead, the colon or umlaut simply indicates “wa”. So (U+1482)
> would be better named CS NASKAPI KWA. This is also probably not a high
> priority.
Name changes for standardized characters are not only not a high
priority -- they are forbidden. Any such niceties need to be
handled by annotations or other explanation of what the characters
are actually used for.
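Since the names are fixed, software can rely on them as stable
identifiers, whatever their linguistic accuracy. A small Python sketch
using the standard unicodedata module shows how to inspect them. The
list of code points is just the set discussed in this thread, the
describe helper is hypothetical, and what is printed depends on the
Unicode data shipped with your Python.

```python
import unicodedata

# Code points mentioned in this thread.
CODE_POINTS = [0x156F, 0x1482, 0x140E, 0x1416, 0x144A]


def describe(cp: int) -> str:
    """Format a code point as 'U+XXXX NAME' using the standardized name."""
    name = unicodedata.name(chr(cp), "<unnamed>")
    return f"U+{cp:04X} {name}"


for cp in CODE_POINTS:
    print(describe(cp))
```

Any correction of the kind discussed here would appear in annotations
to the code charts, not in the strings this lookup returns.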
> Missing Characters
>
> Naskapi
> According to the Naskapi Lexicon, there is no symbol NASKAPI WOO (U+1416),
> but there is a “wi”. This character looks similar to U+140E CS WI, but is
> different—the dot is higher up on the left side. “wi” may need to be added.
> “woo” may be on a different Naskapi chart I have not seen.
I do not concur that this WI is a missing character. There is a range
of glyphs in the sources, varying in the exact placement of the dot.
But U+140E is clearly the character intended for this.
[ Much further analysis omitted, as I don't have time at the moment to
work through every item one by one...]
> In some Dene systems, superscript F, V, r, and l are used as finals to
> indicate these sounds from European languages. Carrier Dene also uses a
> regular serif roman “r” for loan words. Should these be encoded in UCAS?
No.
> Are they still technically Roman glyphs?
Yes, they are technically Roman *characters*, displayed with their
normal Roman glyphs.
>
> That’s about all I can think of at the moment, there may be a few other
> issues I have temporarily forgotten. I would appreciate comments and
> suggestions as to how some or all of these ideas can be integrated into the
> Unicode Standard.
The correct way to proceed is to take any feedback you obtain from
discussion on this list, and then prepare a detailed proposal,
suggesting any corrections needed, and submit that to the
Unicode Technical Committee for consideration. See instructions
on the website at:
http://www.unicode.org/pending/proposals.html
If you consider that there are characters *missing* which need to
be added, then in addition to the writeup explaining the issues,
you will need to fill out a Summary Proposal Form for the
character additions.
--Ken