Re: Question about Unicode Ranges in TrueType fonts

From: Andrew C. West (andrewcwest@alumni.princeton.edu)
Date: Thu Jun 26 2003 - 05:50:03 EDT

Next message: Jony Rosenne: "RE: Major Defect in Combining Classes of Tibetan Vowels (Hebrew)"

Previous message: Tex Texin: "IUC23 Unicode conference exhibitors' panel report"
Maybe in reply to: Elisha Berns: "Question about Unicode Ranges in TrueType fonts"
Next in thread: Philippe Verdy: "Re: Question about Unicode Ranges in TrueType fonts"
Reply: Philippe Verdy: "Re: Question about Unicode Ranges in TrueType fonts"
Reply: Elisha Berns: "RE: Question about Unicode Ranges in TrueType fonts"

On Wed, 25 Jun 2003 21:58:28 -0700, "Elisha Berns" wrote:

> Some weeks back there were a number of postings about software for
> viewing Unicode Ranges in TrueType fonts and I had a few questions about
> that. Most viewers listed seemed to only check the Unicode Range bits of
> the fonts which can be misleading in certain cases.

For W2K and XP only, Microsoft provides an API for determining exactly which
Unicode codepoints a font covers.

GetFontUnicodeRanges() in the Platform SDK fills a GLYPHSET structure with
Unicode coverage information for the currently selected font in a given device
context.

The GLYPHSET structure has these members:

cGlyphsSupported - Total number of Unicode code points supported in the font
cRanges - Total number of Unicode ranges in ranges
ranges - Array of Unicode ranges that are supported in the font

Note that "cRanges" is not the number of Unicode blocks supported, and "ranges"
is not an array of Unicode blocks. Rather "ranges" is an array of WCRANGE
structures that specify contiguous clumps of Unicode codepoints, and "cRanges"
is the number of contiguous clumps of Unicode codepoints. The WCRANGE structure
has the following members:

wcLow - Low Unicode code point in the range of supported Unicode code points
cGlyphs - Number of supported Unicode code points in this range

By looping through the "ranges" array it is possible to determine exactly which
characters in which Unicode blocks a given font covers (as long as your software
has an array of Unicode blocks and their codepoint ranges).
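The loop described above can be sketched as an interval intersection between each run in "ranges" and each block's bounds. This is a sketch under assumptions: wc_range is a hypothetical stand-in with WCRANGE's wcLow/cGlyphs fields (WCRANGE itself has the same two fields, so a typedef would adapt it on Windows), and BLOCKS is a hypothetical three-entry excerpt; a real table would list every block of the current Unicode version.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in with the same fields as the Windows WCRANGE. */
typedef struct {
    unsigned short wcLow;    /* first code point of a contiguous run */
    unsigned short cGlyphs;  /* number of code points in the run */
} wc_range;

typedef struct {
    const char *name;
    unsigned long first;     /* inclusive */
    unsigned long last;      /* inclusive */
} unicode_block;

/* Hypothetical excerpt of a block table; a real one covers every block. */
static const unicode_block BLOCKS[] = {
    { "Basic Latin",        0x0000, 0x007F },
    { "Latin-1 Supplement", 0x0080, 0x00FF },
    { "Greek and Coptic",   0x0370, 0x03FF },
};
#define NUM_BLOCKS (sizeof BLOCKS / sizeof BLOCKS[0])

/* Count the code points in the runs that fall inside block b. */
unsigned long covered_in_block(const wc_range *ranges, size_t count,
                               const unicode_block *b)
{
    unsigned long n = 0;
    assert(ranges != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (ranges[i].cGlyphs == 0)
            continue;                                   /* empty run */
        unsigned long lo = ranges[i].wcLow;
        unsigned long hi = lo + ranges[i].cGlyphs - 1;  /* inclusive end */
        unsigned long a = lo > b->first ? lo : b->first;
        unsigned long z = hi < b->last ? hi : b->last;
        if (a <= z)
            n += z - a + 1;                             /* size of overlap */
    }
    return n;
}

/* Print "block: covered/size" for every block with any coverage. */
void print_block_coverage(const wc_range *ranges, size_t count)
{
    for (size_t k = 0; k < NUM_BLOCKS; k++) {
        unsigned long n = covered_in_block(ranges, count, &BLOCKS[k]);
        if (n > 0)
            printf("%s: %lu/%lu\n", BLOCKS[k].name, n,
                   BLOCKS[k].last - BLOCKS[k].first + 1);
    }
}
```

For runs {0x0020, 95}, {0x00A0, 96}, {0x0391, 17}, print_block_coverage() prints "Basic Latin: 95/128", "Latin-1 Supplement: 96/128" and "Greek and Coptic: 17/144". A run that straddles a block boundary is split correctly between the two blocks.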

Note that unlike the Unicode Subset Bitfield (USB) that is part of the
FONTSIGNATURE structure filled by GetTextCharsetInfo() etc. (available
on W9X and NT as well as 2K/XP), which is tied to a particular version of
Unicode (3.0?) and reports supersets of Unicode blocks, the GLYPHSET structure
is version-independent. As long as your software has an up-to-date list of the
Unicode blocks and their constituent codepoints for the latest version of
Unicode, you will always be able to get up to date information about Unicode
coverage of a font.

This is the method used in my BabelMap utility, and you will note that it is
therefore able to not only list what Unicode 4.0 blocks are covered by a
particular font, but also give the exact number of codepoints that are covered
in that block. If you want to determine language coverage for a particular font,
then all you need to do is define a minimum set of codepoints that must be
covered for a particular block or set of blocks to be considered as supporting
that language. (Just the little matter of deciding what the minimum set of
codepoints would be for every language that is supported by Unicode ...)
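The per-language test suggested above could look like the following: a language counts as supported only if every code point in its minimum set falls inside some run of the font's "ranges" array. Everything here is an assumption for illustration: wc_range is a stand-in with WCRANGE's fields, and GERMAN_MIN is a hypothetical minimum set (A-Z, a-z, plus the umlauted vowels and sharp s), not a recommended one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in with the same fields as the Windows WCRANGE. */
typedef struct {
    unsigned short wcLow;    /* first code point of a contiguous run */
    unsigned short cGlyphs;  /* number of code points in the run */
} wc_range;

/* Hypothetical minimum set for German, expressed as runs as well. */
static const wc_range GERMAN_MIN[] = {
    { 0x0041, 26 },                                /* A-Z */
    { 0x0061, 26 },                                /* a-z */
    { 0x00C4, 1 }, { 0x00D6, 1 }, { 0x00DC, 1 },   /* U+00C4 U+00D6 U+00DC */
    { 0x00DF, 1 },                                 /* U+00DF sharp s */
    { 0x00E4, 1 }, { 0x00F6, 1 }, { 0x00FC, 1 },   /* U+00E4 U+00F6 U+00FC */
};
#define GERMAN_MIN_COUNT (sizeof GERMAN_MIN / sizeof GERMAN_MIN[0])

/* Linear scan, so no assumption is made about the order of the runs. */
int is_covered(const wc_range *ranges, size_t count, unsigned long cp)
{
    for (size_t i = 0; i < count; i++) {
        unsigned long lo = ranges[i].wcLow;
        if (cp >= lo && cp < lo + ranges[i].cGlyphs)
            return 1;
    }
    return 0;
}

/* 1 if every code point in the required runs is covered by the font's runs. */
int meets_minimum(const wc_range *font, size_t nfont,
                  const wc_range *req, size_t nreq)
{
    assert(font != NULL || nfont == 0);
    for (size_t i = 0; i < nreq; i++)
        for (unsigned long k = 0; k < req[i].cGlyphs; k++)
            if (!is_covered(font, nfont, req[i].wcLow + k))
                return 0;
    return 1;
}
```

Expressing the minimum set as runs keeps it in the same shape as the GLYPHSET data; the hard part, as noted above, is choosing those sets for every language.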

Now the caveat. The USB sets a Surrogates bit to indicate that the font contains
at least one codepoint beyond the Basic Multilingual Plane (BMP). Unfortunately
the "ranges" array of the GLYPHSET structure only lists contiguous clumps of
Unicode codepoints within the BMP (wcLow is a 16 bit value), and does not list
surrogate coverage. Therefore you cannot determine supra-BMP codepoint coverage
from the GLYPHSET structure. If anyone does know an easy way to do this under
Windows, please let me know.
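Reading the Surrogates bit mentioned above is a single bit test. A minimal sketch, assuming the 128-bit USB is held as four 32-bit words with the lowest word first (the layout of FONTSIGNATURE.fsUsb) and that the surrogates bit is bit 57, as in the OpenType OS/2 ulUnicodeRange numbering; the helper names are hypothetical. It can only answer "at least one code point beyond the BMP", never which ones.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed position of the surrogates / non-Plane-0 bit in the USB. */
#define USB_SURROGATES_BIT 57u

/* Test one bit of a 128-bit USB stored as four 32-bit words, lowest first. */
int usb_bit_set(const uint32_t usb[4], unsigned bit)
{
    assert(bit < 128);
    return (int)((usb[bit / 32] >> (bit % 32)) & 1u);
}

/* Says only that some supra-BMP code point exists, not which one. */
int font_claims_supra_bmp(const uint32_t usb[4])
{
    return usb_bit_set(usb, USB_SURROGATES_BIT);
}
```

On Windows the four words would come from GetTextCharsetInfo(hdc, &fs, 0), copying fs.fsUsb[0] through fs.fsUsb[3] into the uint32_t array.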

Regards,

Andrew


This archive was generated by hypermail 2.1.5 : Thu Jun 26 2003 - 06:36:43 EDT