From: UList@dfa-mail.com
Date: Mon Feb 21 2005 - 06:38:52 CST
Hello,
I've been pondering the concept of using some kind of "differentiators" to
define sub-meanings for codepoints. I see some discussion in the list archives
of Variation Selectors (which I was thinking of). But it sounds like there are
some problems with using them (or at least with using them on combining codepoints). I'm
afraid the technical details are beyond me.
So what is the current status with this subject?
Is there actually any problem with using Variation Selectors as-is to
differentiate non-combining characters -- such as these applications:
- Serbian Cyrillic Small "t"
- Coptic letterforms for Greek letter codepoints
- complete Archaic Greek and Asia Minor scripts aligned to Greek letter codepoints
- several functions for two-directional case change with German Sharp S
- alternate CJK ideographs and syllabographs
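To make the question concrete, here is a minimal Python sketch of what such a pair looks like at the codepoint level. The pairing of U+0442 CYRILLIC SMALL LETTER TE with VARIATION SELECTOR-1 is purely an illustration for the Serbian "t" case; it is not assumed to be a sequence that Unicode defines, and `describe` is just a helper name made up for this example.

```python
import unicodedata

VS1 = "\uFE00"  # VARIATION SELECTOR-1


def describe(text):
    """Return (codepoint, name, general category, combining class) for each character."""
    return [
        (
            f"U+{ord(ch):04X}",
            unicodedata.name(ch),
            unicodedata.category(ch),
            unicodedata.combining(ch),
        )
        for ch in text
    ]


# Hypothetical pair: Cyrillic small te followed by VS1 (an assumption, not a defined sequence).
pair = "\u0442" + VS1

for row in describe(pair):
    print(row)
```

The selector is a separate codepoint that simply follows the letter, which is the arrangement the applications above would rely on.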
Also, is it possible to redefine the behavior of Variation Selectors so that
they could be used with combining marks, or create a new class of
"differentiator" codepoints that could be used with combining marks instead?
Some applications for this:
- umlaut vs. diaeresis
- "low acute" vs. "high acute"
- Greek circumflex (perispomeni) in Greece vs. West
- Greek capital letters with subscript (Greece) vs. adscript (West)
- alternate Indic ligatures
I can elaborate on most of these points on request, especially umlaut vs.
diaeresis, which you may think has been solved with the CGJ but which still has
serious problems.
In all cases I think it's essential that whatever is done is an official,
mandatory assignment, visually and textually documented in the main glyph
charts. The whole point of this should be that every smart font and keyboard
map in the world reliably implements the system as a standard.
As I say, I don't really understand the technical issues of decomposition and
sorting and so forth, but this seems to be a fairly straightforward concept:
- all differentiators are placed after the thing (letter, mark) they modify,
and are only a characteristic of that thing, containing no information on the
relationship of the thing to anything else
- Unicode can add a behavior definition for a specific assigned combination
of thing + differentiator which all processing systems should implement
- otherwise without a specific behavior definition for the combination, most
processing just ignores a differentiator
- a smart font though may always detect pairs of codepoint + differentiator
and take some action
- also a specialized database can choose to take a specific codepoint +
differentiator pair into account for sorting, searching or some other purpose
- for precomposed characters, specific precomposed characters +
differentiator combinations can be assigned specific decomposition rules that
define how the decomposed letter + mark(s) + differentiator(s) end up
- or if that's too complicated to implement, precomposed characters can just
be excluded from use with differentiators (we'll all be switching to pure
combining Unicode soon anyway, right?)
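As a rough illustration of the points above, here is a minimal Python sketch, under stated assumptions, of the "ignore unless specifically defined" behavior and of what happens today when a selector follows a precomposed letter. The `REGISTERED` table, its single entry, and the function names are hypothetical and invented for this example; only the two variation selector ranges and the `unicodedata.normalize` call are existing facts. The Mongolian free variation selectors are deliberately left out to keep the sketch short.

```python
import unicodedata


def is_variation_selector(ch):
    """True for VS1..VS16 (U+FE00..U+FE0F) and VS17..VS256 (U+E0100..U+E01EF)."""
    cp = ord(ch)
    return 0xFE00 <= cp <= 0xFE0F or 0xE0100 <= cp <= 0xE01EF


def process(text, registered):
    """Keep a selector only if (preceding kept character, selector) is registered.

    Any other selector is silently dropped, which models "most processing
    just ignores a differentiator".
    """
    out = []
    for ch in text:
        if is_variation_selector(ch):
            if out and (out[-1], ch) in registered:
                out.append(ch)
            continue  # unregistered selector: ignore it
        out.append(ch)
    return "".join(out)


# Hypothetical registry with one made-up pair: Cyrillic small te + VS1.
REGISTERED = {("\u0442", "\uFE00")}

# The registered pair survives; the stray selector after "a" is dropped.
kept = process("\u0442\uFE00a\uFE00", REGISTERED)

# Precomposed u-with-diaeresis followed by VS1, then canonically decomposed.
decomposed = unicodedata.normalize("NFD", "\u00FC\uFE00")
print([f"U+{ord(c):04X}" for c in decomposed])
```

After decomposition the selector sits behind the combining diaeresis rather than behind the base letter, so a pair that was meant as "letter + differentiator" no longer reads that way. That is the situation the decomposition rules in the last two points would have to address, or avoid by excluding precomposed characters.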
Thanks for your education and feedback on this subject.
Doug