From: Karl Pentzlin (karl-pentzlin@acssoft.de)
Date: Sun Nov 23 2008 - 05:00:23 CST
On Sunday, 23 November 2008 at 05:45, Doug Ewell wrote:
>>> (Karl Pentzlin):
>>> Thus, sequences like U+04E9 U+0304 are NOT appropriate to fulfil the
>>> user's needs, as long as leading operating systems behave like this
>>> more than 10 years after Unicode has decided no longer to accept
>>> precomposed characters.
>>>
>>> Microsoft et al., PLEASE do your homework! Please do it RIGHT NOW!
>
DE> I think Karl may have expected that fonts could be developed in such a
DE> way that combining diacritical marks would be spaced properly above the
DE> base character, ...
That is exactly true, if "properly" simply means "in a way that respects the
formal combining classes and yields a result which the user can
recognize".
DE> more or less by magic.
Yes, if "magic" is colloquial for "done by a complex and well-designed
algorithm which possibly is not obvious to everybody at first glance" -
something which computer scientists (like me) sometimes do.
DE> I used to think that would be
DE> possible when I knew nothing about font design, ...
Maybe, but for myself I claim to know at least some of the basics of
font design. I appreciate it as a fine art where not everybody is gifted
enough to create a Gentium or Andron, but the technical basics are
comprehensible.
DE> I still think it would be reasonable to expect combining marks like
DE> macrons and circumflexes to be always centered over the base character,
DE> not off to the right, even if the vertical spacing is wrong.
At least, this. This can be accomplished by an algorithm; a very crude
but working starting point is this: Enclose the base character's glyph in a
rectangle. Determine its center (geometrically; possible refinement:
barycentrically). Get the diacritic glyph from the font itself, or (if it is
not available there) from a system default font, and enclose it in a
rectangle. Determine its center (geometrically). Translate the combining
class of the diacritic into a pair of positioning angle and distance, using
a fixed table made once. Place the diacritic rectangle outside the base
character rectangle, in the direction given by the positioning angle
relative to the center points, and shift it inwards until the distance is
reached. If another diacritic is to be added, enclose the combination
generated so far in a rectangle, retaining the center point of the original
base character, call this the base character rectangle, and repeat. After
finishing, take the final enclosing rectangle into account for line
positioning.
A "real working" algorithm like this may need some 100 pages to write down,
but that is what the skilled developers at Microsoft et al. are paid for.
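
The crude starting point described above can be sketched in a few lines of
Python. This is only an illustration under stated assumptions, not a real
rendering engine: the Rect type, the PLACEMENT table with its angles and gap
values, and the function names are all hypothetical, coordinates are font
units with the y axis pointing up, and only the combining classes 230 (above)
and 220 (below) are covered. The combining class itself can be looked up with
the standard library call unicodedata.combining().

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned bounding box in font units, y axis pointing up."""
    left: float
    bottom: float
    right: float
    top: float

    @property
    def center(self):
        return ((self.left + self.right) / 2, (self.bottom + self.top) / 2)

    def union(self, other):
        return Rect(min(self.left, other.left), min(self.bottom, other.bottom),
                    max(self.right, other.right), max(self.top, other.top))


# Hypothetical "fixed table made once":
# combining class -> (positioning angle in degrees, gap in font units).
PLACEMENT = {
    230: (90.0, 50.0),   # above
    220: (270.0, 50.0),  # below
}


def place_mark(base, anchor, mark, ccc):
    """Position the mark's rectangle along the angle for its combining class.

    The mark's center travels on a ray from `anchor`; the result is the
    innermost position at which the mark still keeps the required gap
    from `base`.
    """
    angle, gap = PLACEMENT[ccc]
    dx = math.cos(math.radians(angle))
    dy = math.sin(math.radians(angle))
    half_w = (mark.right - mark.left) / 2
    half_h = (mark.top - mark.bottom) / 2
    ax, ay = anchor
    eps = 1e-9  # treat near-zero direction components as zero
    candidates = []
    if dx > eps:
        candidates.append((base.right + gap + half_w - ax) / dx)
    if dx < -eps:
        candidates.append((ax + half_w + gap - base.left) / -dx)
    if dy > eps:
        candidates.append((base.top + gap + half_h - ay) / dy)
    if dy < -eps:
        candidates.append((ay + half_h + gap - base.bottom) / -dy)
    t = min(candidates)
    cx, cy = ax + t * dx, ay + t * dy
    return Rect(cx - half_w, cy - half_h, cx + half_w, cy + half_h)


def compose(base, marks):
    """Place each (mark_rect, combining_class) in turn around `base`.

    The center of the original base glyph is retained as the anchor, while
    the rectangle grows to enclose the combination generated so far.
    Returns the positioned mark rectangles and the final enclosing rectangle.
    """
    anchor = base.center
    current = base
    placed = []
    for mark_rect, ccc in marks:
        positioned = place_mark(current, anchor, mark_rect, ccc)
        placed.append(positioned)
        current = current.union(positioned)
    return placed, current
```

For example, compose(Rect(0, 0, 500, 500), [(Rect(0, 0, 300, 40), 230)])
centers a macron-sized box horizontally over the base box with the tabulated
gap. The final enclosing rectangle is what would be passed on for line
positioning. Anything a font designer would tune by eye (optical centering,
italic slant, marks that attach or overlay) is deliberately left out.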

On Sunday, 23 November 2008 at 04:29, Peter Constable wrote:
PC> How would you suggest anybody do the homework needed to discover
PC> that arbitrary & not-well-documented language X uses combining
PC> character sequence <Y, Z>?

The latter is *explicitly* no precondition for your homework.
Your task is: "For European Alphabetic Scripts, implement a solution for
any combinations of base characters and combining characters, especially
for arbitrary combinations which are *not* explicitly considered in the
available rendering system".

It should be noted that, when it was decided in 1996 no longer to encode
precomposed characters of European Alphabetic Scripts, this did not affect
all diacritics. In fact, it affected those diacritics which can
successfully be handled by an algorithm as outlined above.
For all diacritics which need special font-specific treatment, precomposed
characters are still being encoded after 1996, and have to be encoded if
new ones are encountered. Such diacritics are e.g.:
- slash overlays (horizontal and diagonal),
- other overlays (e.g. middle tilde, double bar),
- palatal hooks and retroflex hooks,
- descenders.

While no official information seems to be available, it appears that this
decision was made with care, explicitly distinguishing diacritics which can
be positioned automatically within reasonable constraints from those which
cannot. This seems to be an (implicit, as of now) part of the encoding model
for the European Alphabetic Scripts. (If this assumption is correct, I
propose to state this explicitly in the next printed version of the Unicode
Standard.)
It differs from the Arabic model (where characters which are considered
precomposed by some are encoded as single units), and it differs from the
models used for South Asian scripts (where combining marks are encoded
separately even if they affect the shape of the base character's glyph
considerably).

PC> Usage of combining marks with Cyrillic is nowhere near as
PC> widespread as it is with Latin. I think Vista does pretty well
PC> supporting arbitrary combining sequences for Latin in several
PC> fonts, as well as certain known-to-be-used sequences for Cyrillic.

At least, significant progress is visible in Vista regarding Latin
combinations. As doing this for Cyrillic as well does not require any really
new mechanism, may I expect the same level of support for Cyrillic in the
next SP for Windows Vista?

- Karl Pentzlin