Re: Representative glyphs for combining kannada signs

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Sat Mar 18 2006 - 20:38:16 CST


    ----- Original Message -----
    From: "Philippe Verdy" <verdy_p@wanadoo.fr>
    To: "Peter Constable" <petercon@microsoft.com>; <unicode@unicode.org>
    Sent: Sunday, March 19, 2006 12:56 AM
    Subject: Re: Representative glyphs for combining kannada signs

    > From: "Peter Constable" <petercon@microsoft.com>
    >>> From: Philippe Verdy [mailto:verdy_p@wanadoo.fr]
    >>> Here are some screenshots for Bengali, Oriya and Telugu...
    >>
    >> The text in your screenshots looks to me like it might be Arial Unicode MS. As I explained earlier, this font does not provide shaping support for all scripts. That is a font issue, not an issue with IE.
    >
    > "I" am not using it. Neither the page nor its CSS stylesheets specify it. IE just decides by itself to use this font instead of the fonts that I have properly set up for the Bengali, Oriya, Telugu and Malayalam scripts. For example, in IE the "Kartika" or "Akshar Unicode" fonts are manually preselected for Malayalam instead of the default. Likewise, for Bengali, the "Rupali", "Solaiman Lipi" or "Vrinda" fonts are preselected instead of the default.

    In fact the problem is even more complex: if the user has selected "Arial Unicode MS" as the font for the Latin script, then this setting overrides the separate font settings for Bengali, Oriya, Malayalam and Telugu, only because the page starts with Latin text and the current font claims to support the other scripts. So no font switching occurs when the script changes (font switching only occurs if the current font lacks glyphs for the letters used in an anonymous text element).

    Apparently, before looking up the correct font to render a script, IE performs a fast check on the text to see whether the font used in the parent element is "usable" to render the anonymous text element. It only checks the scripts claimed by this current font. Unfortunately, "Arial Unicode MS" claims to support Bengali, Oriya, Telugu and Malayalam (for Oriya it contains only the native digit characters, and for the other three it has enough glyphs but lacks the OpenType layout and shaping tables).
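    As a rough illustration, the behavior described above can be modelled in a few lines of Python. This is a hypothetical sketch, not IE's code: the Font record, the fast_check_font() function and the "LatinOnlyFont" name are invented for the example, and each font's claimed and shapeable scripts are assumed data (only the Arial Unicode MS entry follows the description in this message).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Font:
    name: str
    claimed_scripts: frozenset   # scripts the font *declares* it supports
    shaping_scripts: frozenset   # scripts it can actually shape (has layout tables)


def fast_check_font(parent_font, run_script, user_fonts):
    """Hypothetical model of the "fast check": reuse the inherited font
    whenever it claims the run's script; only otherwise use the per-script
    font chosen in the user's settings."""
    if run_script in parent_font.claimed_scripts:
        return parent_font          # no font switch, even if it cannot shape
    return user_fonts.get(run_script, parent_font)


# Illustrative data: Arial Unicode MS claims these scripts but, per the
# message above, lacks OpenType layout tables for the Indic ones.
arial_uni = Font("Arial Unicode MS",
                 frozenset({"Latin", "Bengali", "Oriya", "Telugu", "Malayalam"}),
                 frozenset({"Latin"}))
latin_only = Font("LatinOnlyFont", frozenset({"Latin"}), frozenset({"Latin"}))
vrinda = Font("Vrinda", frozenset({"Bengali"}), frozenset({"Bengali"}))
user_fonts = {"Bengali": vrinda}

# Parent font is Arial Unicode MS: the Bengali run never reaches Vrinda.
print(fast_check_font(arial_uni, "Bengali", user_fonts).name)   # Arial Unicode MS
# Parent font covers Latin only: the switch to Vrinda does happen.
print(fast_check_font(latin_only, "Bengali", user_fonts).name)  # Vrinda
```

    The first call shows the failure mode: the inherited font passes the check although "Bengali" is not among its shaping_scripts.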

    This does not only affect Indic scripts but also other "complex" scripts that need a complex text renderer like Uniscribe (Arabic, Hebrew), for which no font switching occurs to select the complex-script font specified in a stylesheet or in the user's settings in the Internet Options control panel.

    The only way to get the correct behavior for complex Asian scripts is to make sure that the Asian text is not preceded by a text element containing characters rendered with a font that has partial or defective support for these scripts.

    In other words, no font should mix simple and complex scripts unless the complex scripts are fully supported. Arial Unicode MS (and a few others, like Tahoma for the Arabic and Hebrew scripts) is such a defective font, and IE makes bad assumptions about these fonts' support of complex scripts, using a "fast check" based only on the presence of some code points or, even faster, on the script support flags bitmap in the general font header.
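    Assuming that the "script support flags bitmap" refers to the ulUnicodeRange fields of the OpenType OS/2 table (an assumption; IE's actual check is not documented here), a header-only test could look like the following sketch. The function names are hypothetical; the bit positions are those defined for ulUnicodeRange1.

```python
# Subset of the bit positions defined for the OpenType OS/2 ulUnicodeRange1 field.
UNICODE_RANGE1_BITS = {
    0: "Basic Latin",
    11: "Hebrew",
    13: "Arabic",
    15: "Devanagari",
    16: "Bengali",
    17: "Gurmukhi",
    18: "Gujarati",
    19: "Oriya",
    20: "Tamil",
    21: "Telugu",
    22: "Kannada",
    23: "Malayalam",
}


def claimed_ranges(ul_unicode_range1):
    """Return the names of the ranges whose bits are set in the field."""
    return [name for bit, name in sorted(UNICODE_RANGE1_BITS.items())
            if (ul_unicode_range1 >> bit) & 1]


def header_fast_check(ul_unicode_range1, script):
    """Hypothetical header-only check: it cannot see whether the font also
    carries the GSUB/GPOS tables needed to shape the script."""
    return script in claimed_ranges(ul_unicode_range1)


# A font that sets the Basic Latin and Bengali bits passes the check for
# Bengali, whether or not it can actually shape Bengali.
mask = (1 << 0) | (1 << 16)
print(claimed_ranges(mask))                 # ['Basic Latin', 'Bengali']
print(header_fast_check(mask, "Bengali"))   # True
print(header_fast_check(mask, "Telugu"))    # False
```

    The bitmap only records which ranges a font claims to cover; nothing in it says whether the layout tables for those ranges exist.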

    Firefox succeeds in rendering those complex scripts (even when Arial Unicode MS is chosen in its user settings for the Latin script) because it does not perform such a fast check (or its check is more accurate and does not rely on simple assumptions about the declared font properties), so it accurately selects the fonts needed according to the script property of each base character present in a text element.
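    For contrast, here is a minimal sketch of the per-character approach attributed above to Firefox: the text is split into runs by script, and each run gets the user's font for that script, whatever the parent font claims. It is not Firefox's implementation; script detection is approximated with Unicode block ranges and character names instead of the real Unicode Script property, and the function names and font settings are placeholders.

```python
import unicodedata

# Block ranges used as a rough stand-in for the Unicode Script property.
SCRIPT_BLOCKS = [
    (0x0590, 0x05FF, "Hebrew"),
    (0x0600, 0x06FF, "Arabic"),
    (0x0900, 0x097F, "Devanagari"),
    (0x0980, 0x09FF, "Bengali"),
    (0x0B00, 0x0B7F, "Oriya"),
    (0x0C00, 0x0C7F, "Telugu"),
    (0x0C80, 0x0CFF, "Kannada"),
    (0x0D00, 0x0D7F, "Malayalam"),
]


def script_of(ch):
    cp = ord(ch)
    for lo, hi, name in SCRIPT_BLOCKS:
        if lo <= cp <= hi:
            return name
    if unicodedata.name(ch, "").startswith("LATIN"):
        return "Latin"
    return "Common"  # spaces, punctuation, digits, ...


def itemize(text):
    """Split text into (script, run) pairs; Common characters join the preceding run."""
    runs = []
    for ch in text:
        script = script_of(ch)
        if runs and (script == "Common" or runs[-1][0] == script):
            runs[-1][1] += ch
        else:
            runs.append([script, ch])
    return [(script, run) for script, run in runs]


def select_fonts(text, user_fonts, default_font):
    """Pick a font per run from the run's script, ignoring the parent font."""
    return [(run, user_fonts.get(script, default_font))
            for script, run in itemize(text)]


user_fonts = {"Latin": "Arial Unicode MS", "Bengali": "Vrinda"}
# The Latin run gets Arial Unicode MS; the Bengali run (KA + vowel sign AA) gets Vrinda.
print(select_fonts("Hello \u0995\u09BE", user_fonts, "Arial Unicode MS"))
```

    Because the decision is made from each run's script, choosing Arial Unicode MS for Latin no longer affects which font renders the Bengali run.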

    For users, this means it's best to always avoid Arial Unicode MS, even for Latin, and to use another font with complete Latin support but without any Indic characters (so that it does not need to implement the Indic layout tables).

    The decision to mix simple and complex scripts in the same huge font was a font design error, IMHO, if the included complex scripts were not fully supported when the full font was built.

    And there still remains a bug in IE, within its "fast check" algorithm for determining whether font switching must occur. The bug occurs even if the anonymous text element contains ONLY characters from a single complex script. It exists in Office as well, but in a different way. It seems that in both cases it was an optimization to decide whether the complex Uniscribe rendering engine is needed (and a new text rendering context must be set up) to render some embedded text (it tries to reuse the current font rendering context for the sub-elements).

    This bug does not occur in Notepad, only because Notepad uses a single font context for the whole document and uses font lookup only as a fallback for base characters that have no glyph in the current font. So no "fast check" is performed prior to rendering the text. However, the Notepad implementation, although accurate, is very slow to render when many characters need fallback from the selected font.
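    The Notepad behavior can be sketched the same way: no check before rendering, just a per-character test against the current font's glyph coverage, with a search through other fonts only for characters that lack a glyph. This hypothetical model (function name, font names and coverage sets are invented) also counts the fallback probes, which grow with the number of fallback characters times the number of fonts scanned; that is one plausible reason such rendering gets slow, not a measurement of Notepad.

```python
def render_with_fallback(text, primary, fallbacks):
    """Hypothetical single-font-context model. `primary` and each entry of
    `fallbacks` are (font_name, set_of_code_points_with_glyphs) pairs.
    Returns the font chosen for each character and the number of fallback
    probes performed."""
    primary_name, primary_cmap = primary
    chosen = []
    probes = 0
    for ch in text:
        cp = ord(ch)
        if cp in primary_cmap:          # glyph present: no lookup at all
            chosen.append((ch, primary_name))
            continue
        font = None                     # None: no listed font has a glyph
        for name, cmap in fallbacks:    # linear scan, repeated for every character
            probes += 1
            if cp in cmap:
                font = name
                break
        chosen.append((ch, font))
    return chosen, probes


primary = ("LatinFont", set(range(0x20, 0x7F)))
fallbacks = [("ArabicFont", set(range(0x0600, 0x0700))),
             ("Vrinda", set(range(0x0980, 0x0A00)))]
chosen, probes = render_with_fallback("ab \u0995\u0996", primary, fallbacks)
print(probes)  # 4: two Bengali characters, each probing two fallback fonts
```

    A real implementation would presumably cache the result per code point or per run, which this sketch deliberately omits to keep the cost visible.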



    This archive was generated by hypermail 2.1.5 : Sat Mar 18 2006 - 20:40:22 CST