From: Doug Ewell (dewell@adelphia.net)
Date: Wed Oct 23 2002 - 12:02:55 EDT
The WG2 home page was updated today to add a link to document N2507,
"Draft of Proposal to add Latin characters required by Latinized
Taiwanese Holo language to ISO/IEC 10646" [1], by a group called the
Department of Language Education of National Taitung Teachers College.
The document is dated either 2002-03-11 or 2002-03-31, depending on what
part of the title page you look at.
This document proposes a COMBINING RIGHT DOT ABOVE for use in a popular
Latin-script orthography of the Taiwanese Holo language. Some time ago
(I can't look up exactly when because the unicode.org archives are
unavailable), I wrote that this combining character should be added in
lieu of a largish collection of precomposed characters. Ken Whistler
responded that the issue had already been debated, and a solution
already presented to use U+0307 COMBINING DOT ABOVE (possibly
incorporating a Taiwanese font-specific glyph variation to move the dot
to the right).
Evidently the Taiwanese teachers did not consider this satisfactory, as
they have responded with this new proposal to encode a separate
COMBINING RIGHT DOT ABOVE.
Whether or not this new combining character makes sense, the rest of
the proposal clearly does not. The group has proposed no fewer than 42
precomposed Latin characters, all of which can be formed using existing
Latin letters and combining marks (together with the proposed RIGHT DOT
ABOVE).
The 42 precomposed letters are proposed "to be added to Latin
Extended-B," which is a puzzle to me since that block has only 25
available code positions as of Unicode 4.0.
Much more troubling, however, is the fact that this group has apparently
ignored or disregarded the Unicode/10646 policy against standardizing
new precomposed letters that can be composed with existing characters.
The document says:
"The precomposed characters are proposed to ensure compatibility with
the existing font "HoloWin" in the word-processing software HOTSYS
widely employed in the user community. We have been promised composing
characters in major (Microsoft etc.) implementations since 1997. Now, 5
years later, we still have nothing."
Compatibility with 8-bit legacy fonts and software is *not* sufficient
cause for encoding new precomposed characters. The WG2 "Principles and
Procedures" document [2] specifically states that a precomposed
character should not be encoded "if solely intended to overcome
short-term deficiency of rendering technology." The Taiwanese document
does not say which "major (Microsoft etc.) implementation" fails to
support composition using combining marks, but as a previous thread on
this list has shown, there is at least some support in Internet Explorer
for such characters.
Try this experiment: One of the precomposed characters proposed by the
Taiwanese teachers is LATIN SMALL LETTER N WITH CIRCUMFLEX. Here it is,
encoded properly as U+006E U+0302:
n̂
Some of you will be able to see this character, others will not.
Rendering technology is not perfect yet. But this is the correct way to
create new accented letters in Unicode/10646, not by adding more
precomposed characters.
The proposal for a new COMBINING RIGHT DOT ABOVE may or may not have
merit -- I'm not going to commit firmly to the idea that it does, like I
did last time -- but the 42 precomposed letters have no business being
encoded and should not be debated further.
-Doug Ewell
Fullerton, California
[1] http://std.dkuug.dk/jtc1/sc2/wg2/docs/n2507.pdf
[2] http://std.dkuug.dk/JTC1/SC2/WG2/docs/n2352r.pdf