We're finishing up a Yi font, and there's something I'm
uncertain about, since I have no background whatsoever in East
Asian computing issues. I expect the answer is the same as it
would be if Han characters were involved, so I figure that many
of you will have ideas even if you're not that familiar with
Yi.
I've got a couple of books in my lap that contain Yi text, and
there is a small assortment of Latin punctuation characters
that are used: , ( ) ! : ; as well as em dash and open/close
quotation marks. There is also an issue with the space
character. The problem is this: when used with Yi characters,
these need to be displayed using wide glyphs, i.e. same advance
width as the Yi glyphs. However, the users want to be able to
also create (proportional width) Latin text with the same font,
e.g. if writing in English about Yi words or for viewing
marked-up text in a one-font-for-all text editor. So, our font
needs to contain both wide and narrow glyphs for these
characters.
The question is: What is the ideal solution? What character
codes get used in text, and how do they get mapped to the
appropriate glyphs?
The quotation marks aren't a problem, I think: the CJK Symbols
and Punctuation block contains U+301D and U+301E, which can be
used for the wide quotation marks. These aren't compatibility
characters, and using them is in line with the use of U+3001
and U+3002, which also get used with Yi. Also, the em dash
isn't a problem since, by definition, it is full width.
For space and the other punctuation characters, Unicode
contains U+3000, U+FF01, U+FF08, etc., so I could simply access
the wide glyphs from these values in the cmap. But these are
all compatibility characters, so it seems that the preferred
encoding of text will use only U+0020, U+0021, U+0028 etc.
regardless of the desired width for the glyphs.
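As a quick check on which of these characters really are
compatibility characters, here is a minimal sketch using
Python's standard unicodedata module. The helper name
is_wide_compat is my own, and the test it applies (a <wide>
tag in the decomposition field) is just one reasonable way to
ask the question.

```python
import unicodedata

def is_wide_compat(ch):
    """True if ch has a <wide> compatibility decomposition in the UCD."""
    return unicodedata.decomposition(ch).startswith("<wide>")

# Characters mentioned in this message, by code point.
for cp in (0x3000, 0xFF01, 0xFF08, 0x301D, 0x301E, 0x3001, 0x3002):
    ch = chr(cp)
    kind = "wide compatibility character" if is_wide_compat(ch) else "no wide decomposition"
    print(f"U+{cp:04X} {unicodedata.name(ch)}: {kind}")
```

U+3000, U+FF01 and U+FF08 should be reported as wide
compatibility characters; U+301D, U+301E, U+3001 and U+3002
should not.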
It seems the solution must be one of the following:
1. Include both wide and narrow glyphs in a single font and
encode text using U+3000 etc. (i.e. encode using compatibility
characters).
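This seems to have a problem in that, if text is normalised to
a compatibility form (NFKC or NFKD), the wide characters are
replaced by their ordinary counterparts, which changes the
layout/presentation of a document, and it seems to me that
shouldn't happen. (Am I wrong to think that's a problem?)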
2. Do not mix wide and narrow glyphs in a font, and encode text
without any compatibility characters. Get the desired width for
glyphs by formatting text using appropriate fonts.
Seems like a valid solution, but unnecessarily limiting if it's
the only option.
3. Mix wide and narrow glyphs in a font, and encode text
without any compatibility characters. Design applications such
that, when a character such as U+0020 is encountered within a
run that is tagged for an East Asian language (or within a run
of unambiguous wide characters), the application transduces it
to U+3000 (or whatever the full-width compatibility character
is for the character in question) before calling TextOut, or
does something similar (perhaps handled in the OS) so that the
wide glyph is accessed.
This seems like it would be open to problems; e.g. what happens
if the selected font has only wide glyphs accessed from the
cmap via non-compatibility Unicode values such as U+0020? This
just doesn't seem like something we want software developers to
mess with.
4. Encode text without any compatibility characters, permit
mixing wide and narrow glyphs in a font, and use smart font
technology. The font developer can choose to include a
feature-selected substitution: "if this feature is on, replace
this default narrow glyph with this other wide glyph". The
feature might be the language of the given run, which is
supported by OpenType (but, I believe, not by Uniscribe, and
not currently by AAT/ATSUI); this would need a LangID for Yi,
which is not currently supported in Win32 or in ISO 639!
Alternatively, the feature might be some other arbitrary label
(supported by AAT/ATSUI). Of course, the run of text must be
tagged for this feature.
I think this is the solution we want to be working toward.
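To make the difference between options 3 and 4 concrete, here
is a rough Python sketch. Everything in it is hypothetical: the
function names, the glyph names and the two lookup tables are
invented for illustration and do not come from any real font
or API. Option 3 rewrites character codes before drawing;
option 4 leaves the codes alone and swaps glyphs inside the
font when a feature is switched on for the run.

```python
# The Latin punctuation listed earlier in this message.
WIDE_PUNCT = ",()!:;"

def widen_run(text):
    """Option 3: transduce character codes for a run tagged as wide."""
    out = []
    for ch in text:
        if ch == " ":
            out.append("\u3000")                # IDEOGRAPHIC SPACE
        elif ch in WIDE_PUNCT:
            out.append(chr(ord(ch) + 0xFEE0))   # fullwidth form of the same character
        else:
            out.append(ch)
    return "".join(out)

# Option 4: hypothetical cmap and feature-selected substitution table.
CMAP = {" ": "space", "!": "exclam", "(": "parenleft", ")": "parenright"}
WIDE_SUBST = {name: name + ".wide" for name in CMAP.values()}

def shape(text, wide=False):
    """Map characters to glyph names, then apply the wide substitution if on."""
    glyphs = [CMAP.get(ch, f"uni{ord(ch):04X}") for ch in text]
    if wide:
        glyphs = [WIDE_SUBST.get(g, g) for g in glyphs]
    return glyphs

yi_run = "\uA000 (\uA001)!"
print([f"U+{ord(c):04X}" for c in widen_run(yi_run)])  # codes change (option 3)
print(shape(yi_run, wide=True))                        # codes unchanged (option 4)
```

In the option 4 path the stored text never contains U+3000 or
the U+FFxx forms, so normalising it cannot change which glyphs
are chosen.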
Am I right in thinking that option 4 is the preferred choice?
How is this issue currently being handled for other East Asian
writing systems? What solutions are developers working toward?
If option 4 is the goal but can't yet be implemented, is option
1 (as well as option 2) a reasonable solution for the interim?
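For anyone weighing option 1 as an interim measure, the
following sketch shows the normalisation effect I am worried
about, again using Python's standard unicodedata module. The
sample string is made up: two Yi syllables with a wide space,
wide parentheses and a wide exclamation mark.

```python
import unicodedata

# U+A000 and U+A001 are Yi syllables; the rest are wide compatibility forms.
option1_text = "\uA000\u3000\uFF08\uA001\uFF09\uFF01"

nfc = unicodedata.normalize("NFC", option1_text)
nfkc = unicodedata.normalize("NFKC", option1_text)

print(nfc == option1_text)    # canonical normalisation leaves the text alone
print(nfkc == option1_text)   # compatibility normalisation does not
print([f"U+{ord(c):04X}" for c in nfkc])
```

After NFKC the wide space, parentheses and exclamation mark
come back as U+0020, U+0028, U+0029 and U+0021, so a font that
keys its wide glyphs on the compatibility code points would
display the narrow ones instead.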
Peter