From: Peter Constable (petercon@microsoft.com)
Date: Mon Mar 20 2006 - 09:06:50 CST
> From: Philippe Verdy [mailto:verdy_p@wanadoo.fr]
> >Your statement about Oriya and only "national digit characters" is simply
> >incorrect, as evidenced by the attached screenshot.
>
> It is correct. Only Oriya digits are assigned to Unicode code point
> positions. Other glyphs are not accessible from these codepoints (and
> FireFox then fails to access any of those glyphs). See attached
> "charmap" screenshot.
Ah, interesting! You've spotted an issue in Charmap. It knows what characters are supported in AUMS, but the glyphs are being displayed from another font. That repros on my machine, but AUMS very definitely has glyphs for all the Oriya characters and has them mapped in the cmap from the appropriate codepoints; I have confirmed this.
A good clue about this bug is that the .Notdef glyph that Charmap is showing is not from AUMS, but rather from Microsoft Sans Serif. A further clue is that the Bengali glyphs shown are scaled, which implies that they also are not coming from AUMS; the glyphs shown are in fact coming from the Vrinda font. And the Telugu glyphs shown are not from AUMS but rather from the Gautami font. Similarly, the Malayalam glyphs shown are not from AUMS but rather from the Kartika font.
Here's what's happening: Charmap is reading the cmap table for AUMS to determine what characters to display in its palette. But then it proceeds to call a text-drawing API in a standard way that would be needed to draw running text in each script rather than an alternate way that avoids font fallback and ensures that each character from the font is displayed. The standard call passes through Uniscribe which then determines for each script whether OpenType Layout support is required. Of course, for all of the Indic scripts, this is required. Uniscribe then checks which scripts the font has OpenType Layout tables for: AUMS has OpenType Layout tables for Devanagari, Gujarati, Gurmukhi, Tamil and Kannada, but not for Bengali, Oriya, Telugu or Malayalam. Since running text will not display correctly for the latter four scripts without OpenType layout tables, Uniscribe selects fallback fonts for these. For Bengali, Telugu and Malayalam, which are supported in XP SP2, it falls back to the Vrinda, Gautami and Kartika fonts. For Oriya, which is *not* supported in XP SP2, it falls back to its default fallback font, which is Microsoft Sans Serif. Since that font doesn't support Oriya characters, the .Notdef glyph from that font is displayed.
The digit characters for Indic scripts do not require OpenType layout support, and are in fact handled in a separate shaping engine. Thus, font fallback does not happen for them. So, Charmap will display the actual glyphs from AUMS -- no font fallback occurs for any of these. This, together with the fallback behaviour described above, gives you the (false) impression that AUMS supports the digits but not the rest of the script.
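To make the sequence above concrete, here is a toy model in Python of the font-selection outcome as I have described it. It is only a sketch of the observable behaviour, not Uniscribe's actual algorithm: the names `Font`, `font_used`, `SCRIPT_FALLBACK` and `DEFAULT_FALLBACK` are invented for illustration, and the model assumes every script passed in is an Indic script that needs OpenType Layout for running text.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Font:
    name: str
    otl_scripts: frozenset  # scripts the font has OpenType Layout tables for

# Arial Unicode MS, with the OpenType Layout coverage listed in this message.
AUMS = Font(
    "Arial Unicode MS",
    frozenset({"Devanagari", "Gujarati", "Gurmukhi", "Tamil", "Kannada"}),
)

# Per-script fallback fonts on XP SP2, as described in this message.
SCRIPT_FALLBACK = {"Bengali": "Vrinda", "Telugu": "Gautami", "Malayalam": "Kartika"}
# Used when the system has no support for the script at all (e.g. Oriya).
DEFAULT_FALLBACK = "Microsoft Sans Serif"

def font_used(font, script, is_digit=False, ignore_language=False):
    """Return the name of the font whose glyphs end up on screen (toy model)."""
    if ignore_language:
        # Language processing bypassed: no shaping and no font fallback.
        return font.name
    if is_digit:
        # Digits do not need OpenType Layout, so no fallback is triggered.
        return font.name
    if script in font.otl_scripts:
        # The font can shape running text for this script itself.
        return font.name
    # No layout tables for the script: fall back.
    return SCRIPT_FALLBACK.get(script, DEFAULT_FALLBACK)

if __name__ == "__main__":
    for script in ("Devanagari", "Bengali", "Oriya"):
        print(script, "letters:", font_used(AUMS, script),
              "| digits:", font_used(AUMS, script, is_digit=True))
```

In this model, Oriya letters resolve to Microsoft Sans Serif (hence its .Notdef glyph), while Oriya digits, and anything drawn with language processing bypassed, resolve to AUMS itself.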
So, you see, the facts as I have reported them are true, and the symptoms you see have an explanation.
> Maybe it's a bug in the "charmap" tool, but the fact that no glyph can be
> found for Oriya letters means that the font has a compatibility problem if
> it works with Viewglyph.
The font does not have any such issues. Viewglyph gives control over how its chart window is displayed, and I have it set such that no font fallback will be used. If I change it, then indeed it will show the same .Notdef glyphs for Oriya characters as Charmap.
> Your screenshot seems to display glyphs in a font, not the codepoints to
> which they are assigned internally.
The screen shot I sent shows ViewGlyph set to display glyphs for specific characters, as mapped in the cmap. As I had it configured for that screen shot, it is calling ExtTextOut with the ETO_IGNORELANGUAGE flag, which inhibits passing through Uniscribe and therefore inhibits any font fallback or shaping. (ViewGlyph's chart window is a mess for complex scripts if you don't have it set to "language processing disabled".)
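For anyone who wants to try this kind of call themselves, the following is a rough sketch in Python using ctypes. It is not ViewGlyph's code. The helper name `draw_raw_glyphs` is made up, and it assumes you already have a valid device context (`hdc`) with the font of interest selected into it. The value of `ETO_IGNORELANGUAGE` is taken from wingdi.h.

```python
import sys

# Value of ETO_IGNORELANGUAGE as defined in the Windows SDK header wingdi.h.
ETO_IGNORELANGUAGE = 0x1000

def draw_raw_glyphs(hdc, x, y, text):
    """Draw text with ExtTextOutW, asking GDI to skip language processing.

    hdc is assumed to be a valid device context with the font already selected.
    Returns True if the call reports success.
    """
    if sys.platform != "win32":
        raise OSError("ExtTextOutW is only available on Windows")

    import ctypes
    from ctypes import wintypes

    ext_text_out = ctypes.windll.gdi32.ExtTextOutW
    ext_text_out.argtypes = [
        wintypes.HDC, ctypes.c_int, ctypes.c_int, wintypes.UINT,
        ctypes.c_void_p,   # lprect: NULL, no clipping or opaquing rectangle
        wintypes.LPCWSTR, wintypes.UINT,
        ctypes.c_void_p,   # lpDx: NULL, default inter-character spacing
    ]
    ext_text_out.restype = wintypes.BOOL

    # The character count is in UTF-16 code units, not Python code points.
    count = len(text.encode("utf-16-le")) // 2
    return bool(ext_text_out(hdc, x, y, ETO_IGNORELANGUAGE, None, text, count, None))
```

Passing 0 for the options argument instead of `ETO_IGNORELANGUAGE` gives the standard call, which is the case where fallback can occur.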
> >> In other words, no font should mix simple scripts and complex scripts,
> >> unless complex scripts are fully supported. Arial Unicode MS (and a few
> >> others like Tahoma regarding Arabic and Hebrew scripts) is such a
> >> defective font
> >
> >Tahoma supports Hebrew and Arabic - it is not a "defective font".
>
> Supports may be, but not completely.
I find it frustrating that you make these blanket statements about things not working without explaining exactly what is supposedly not working. If you mean that Tahoma in XP doesn't support all of the Arabic additions in Unicode 4.1, then that is true. Unicode 4.1 was published in March 2005, and XP SP2 shipped in August 2004, so that would be expected. If the symptom you had in mind is something different, then please explain.
> I have the same conclusion: we don't run the same Windows.
We run the same Windows; we just interpret symptoms differently. As far as Windows (not IE) is concerned -- GDI, Uniscribe and the fonts that ship with Windows (and also AUMS) -- I think I know better how to interpret the symptoms. I have not worked on IE, so I will refrain from commenting on how it does or doesn't work, other than to say that it does have its own logic about displaying text and may do things differently than (say) Notepad.
Peter Constable