ISO 10646-1 Level 2

From: Markus Kuhn (Markus.Kuhn@cl.cam.ac.uk)
Date: Tue Jun 22 1999 - 10:11:08 EDT


I am currently considering whether it would be feasible to extend the
ISO 10646-1 support in xterm from Level 1 to Level 2, especially because
people have been asking for Thai support.

I am curious about the difference between ISO 10646-1 Level 2 and 3.

What is the history and purpose behind Level 2?

ISO 10646-1:1993 doesn't justify its existence in any way and just
specifies it as a subset of allowed combining characters, namely those
needed for scripts such as Thai, Malayalam, Kannada, Telugu, Tamil and
Gujarati, which put vowels on top of or below consonant characters.
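A terminal implementing Level 2 first has to recognize which incoming
characters are combining, so that it can attach them to the preceding
cell instead of advancing the cursor. A minimal sketch for Thai only
follows. The function name is hypothetical, and the code point ranges
are the Thai nonspacing marks (general category Mn) from the Unicode
character database, assumed here for illustration; they are not copied
from the ISO 10646-1 Level 2 list.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if ch is a nonspacing (combining) mark in the Thai
   block U+0E00..U+0E7F. Ranges follow Unicode general category Mn;
   this is an assumption, not the ISO 10646-1 Level 2 table itself. */
bool is_thai_combining(uint32_t ch)
{
    return ch == 0x0E31                      /* MAI HAN-AKAT */
        || (ch >= 0x0E34 && ch <= 0x0E3A)    /* SARA I .. PHINTHU */
        || (ch >= 0x0E47 && ch <= 0x0E4E);   /* MAITAIKHU .. YAMAKKAN */
}
```

Covering every Level 2 script would presumably call for a sorted table
of ranges searched by bisection rather than hard-coded comparisons.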

For ISO 10646-1 Level 2 scripts, is it sufficient if only a single
combining character is allowed to follow a base character, or is there
some reasonable, very small upper limit on the number of combining
characters per base character?

For ISO 10646-1 Level 2 scripts in fixed-width fonts, is it sufficient
to provide only simple overstriking of combining characters (as opposed
to combining characters that shift position depending on the base
character)?

The idea is the following: Xterm uses a character cell matrix to store
the current screen content of the VT100 terminal that it emulates. It
would be quite feasible to allow two (or even three) UCS values to be
stored per character cell: one normal character and one or more
combining characters. The combining characters would simply overstrike
the normal character. Fonts for those scripts that use combining
characters in Level 2 (say Thai) would have to be designed such that the
combining characters are placed correctly by simply OR-ing the pixel
matrices of the base character and the combining character together.
This certainly seems feasible for scripts such as Thai. Simple
overstriking will not be adequate for combining characters on Latin
letters, because uppercase and lowercase letters differ in height, but
Latin, Greek and Cyrillic have to be written with precomposed characters
in Level 2 anyway. In addition, X11 BDF fonts do not provide any
per-glyph information on the relative positioning of combining
characters (as, for instance, TeX fonts do quite nicely).
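The cell layout and overstriking described above could look roughly like
the following sketch. It is not xterm's actual code: the names (cell_t,
glyph_t, cell_add_combining, overstrike, compose) are hypothetical, and
the glyph format (16 rows of 8 pixels, one byte per row, as in an 8x16
bitmap font) is an assumption.

```c
#include <stdint.h>

#define MAX_COMBINING 2   /* assumed small upper limit per cell */
#define GLYPH_H 16        /* assumed glyph height: 16 rows of 8 pixels */

typedef uint32_t ucs4_t;

/* One character cell of the screen matrix (hypothetical layout). */
typedef struct {
    ucs4_t base;                  /* normal (spacing) character */
    ucs4_t comb[MAX_COMBINING];   /* combining characters, 0 = unused */
} cell_t;

/* One glyph bitmap: one byte per row, most significant bit leftmost. */
typedef struct {
    uint8_t row[GLYPH_H];
} glyph_t;

/* Attach a combining character to a cell. Returns 0 on success or -1
   if all slots are already in use, in which case the mark is dropped. */
int cell_add_combining(cell_t *c, ucs4_t mark)
{
    for (int i = 0; i < MAX_COMBINING; i++) {
        if (c->comb[i] == 0) {
            c->comb[i] = mark;
            return 0;
        }
    }
    return -1;
}

/* Simple overstriking: OR the mark's pixels into the destination.
   No offset is applied, so the font itself must position the mark. */
void overstrike(glyph_t *dst, const glyph_t *mark)
{
    for (int y = 0; y < GLYPH_H; y++)
        dst->row[y] |= mark->row[y];
}

/* Build the displayed glyph of a cell from its base glyph and the
   glyphs of its n combining characters. */
glyph_t compose(const glyph_t *base, const glyph_t *const marks[], int n)
{
    glyph_t out = *base;
    for (int i = 0; i < n; i++)
        overstrike(&out, marks[i]);
    return out;
}
```

With MAX_COMBINING set to 1, this reduces to the variant with one
combining character per base character.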

It might even be feasible to allow more than one combining character per
base character, as long as the marks do not overlap (for example, one
mark above and one mark below the base character, where the order of the
combining characters does not matter), as long as no combining character
has to be repositioned, and as long as the upper limit of combining
characters per base character is small (say 2).
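Because bitwise OR is commutative, the order of the combining characters
never changes the rendered pixels; what can go wrong is that two marks
set the same pixels and obscure each other. The conditions above could
be checked, for instance when a font is loaded, with a pairwise
disjointness test like the one sketched here. The function names are
hypothetical, and glyphs are assumed to be 16 rows of 8 pixels, one byte
per row.

```c
#include <stdbool.h>
#include <stdint.h>

#define GLYPH_H 16       /* assumed glyph height in pixels */
#define MAX_COMBINING 2  /* assumed upper limit of marks per cell */

/* Two marks may share a cell only if no pixel is set in both. */
bool marks_disjoint(const uint8_t a[GLYPH_H], const uint8_t b[GLYPH_H])
{
    for (int y = 0; y < GLYPH_H; y++)
        if (a[y] & b[y])
            return false;
    return true;
}

/* Accept a set of n combining glyphs for one cell only if there are
   at most MAX_COMBINING of them and they are pairwise non-overlapping. */
bool marks_compatible(const uint8_t *const marks[], int n)
{
    if (n > MAX_COMBINING)
        return false;
    for (int i = 0; i < n; i++)
        for (int j = i + 1; j < n; j++)
            if (!marks_disjoint(marks[i], marks[j]))
                return false;
    return true;
}
```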

Does ISO 10646-1 Level 2 support with simple overstriking and one
combining character per base character sound like a sensible approach to
get support for scripts like Thai into xterm?

Markus

-- 
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org,  WWW: <http://www.cl.cam.ac.uk/~mgk25/>


