Re: Question about Unicode Ranges in TrueType fonts

From: Andrew C. West (andrewcwest@alumni.princeton.edu)
Date: Thu Jun 26 2003 - 05:50:03 EDT


    On Wed, 25 Jun 2003 21:58:28 -0700, "Elisha Berns" wrote:

    > Some weeks back there were a number of postings about software for
    > viewing Unicode Ranges in TrueType fonts and I had a few questions about
    > that. Most viewers listed seemed to only check the Unicode Range bits of
    > the fonts which can be misleading in certain cases.

    For W2K and XP only, Microsoft provides an API for determining exactly which
    Unicode codepoints a font covers.

    GetFontUnicodeRanges() in the Platform SDK fills a GLYPHSET structure with
    Unicode coverage information for the currently selected font in a given device
    context.

    The relevant members of the GLYPHSET structure are:

    cGlyphsSupported - Total number of Unicode code points supported in the font
    cRanges - Total number of Unicode ranges in ranges
    ranges - Array of Unicode ranges that are supported in the font

    Note that "cRanges" is not the number of Unicode blocks supported, and "ranges"
    is not an array of Unicode blocks. Rather "ranges" is an array of WCRANGE
    structures that specify contiguous clumps of Unicode codepoints, and "cRanges"
    is the number of contiguous clumps of Unicode codepoints. The WCRANGE structure
    has the following members:

    wcLow - Low Unicode code point in the range of supported Unicode code points
    cGlyphs - Number of supported Unicode code points in this range
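    To make the layout concrete, here is a minimal sketch that decodes a raw
    GLYPHSET buffer. It assumes the layout declared in the Platform SDK headers:
    two further DWORD members (cbThis and flAccel, not listed above) precede
    cGlyphsSupported, and each WCRANGE is a 16-bit wcLow followed by a 16-bit
    cGlyphs. The function name parse_glyphset and the sample buffer are
    hypothetical; on Windows the bytes would come from GetFontUnicodeRanges().

```python
import struct

def parse_glyphset(buf):
    """Decode a raw GLYPHSET buffer into (cGlyphsSupported, [(wcLow, cGlyphs), ...])."""
    # Header: cbThis, flAccel, cGlyphsSupported, cRanges (four little-endian DWORDs).
    cb_this, fl_accel, c_glyphs_supported, c_ranges = struct.unpack_from("<4I", buf, 0)
    ranges = []
    offset = 16  # the WCRANGE array starts right after the 16-byte header
    for _ in range(c_ranges):
        # Each WCRANGE: 16-bit wcLow, then 16-bit cGlyphs.
        wc_low, c_glyphs = struct.unpack_from("<2H", buf, offset)
        ranges.append((wc_low, c_glyphs))
        offset += 4
    return c_glyphs_supported, ranges

# Hypothetical buffer describing two clumps: U+0020..U+007E and U+00A0..U+00FF.
sample = (struct.pack("<4I", 24, 0, 95 + 96, 2)
          + struct.pack("<4H", 0x0020, 95, 0x00A0, 96))
total, ranges = parse_glyphset(sample)
print(total, [(hex(lo), n) for lo, n in ranges])
```

    Note that each entry is a (start, count) pair rather than a (start, end)
    pair, so the last codepoint of a clump is wcLow + cGlyphs - 1.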

    By looping through the "ranges" array it is possible to determine exactly which
    characters in which Unicode blocks a given font covers (as long as your software
    has an array of Unicode blocks and their codepoint ranges).
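    A rough sketch of that loop, under some assumptions: the ranges have already
    been read into (wcLow, cGlyphs) pairs, and BLOCKS is a deliberately tiny,
    hypothetical stand-in for a full table of Unicode blocks. The name
    block_coverage is made up for illustration, and this is not necessarily how
    any particular utility does it.

```python
# Hypothetical, abbreviated block table: (name, first codepoint, last codepoint).
BLOCKS = [
    ("Basic Latin", 0x0000, 0x007F),
    ("Latin-1 Supplement", 0x0080, 0x00FF),
    ("Greek and Coptic", 0x0370, 0x03FF),
]

def block_coverage(ranges, blocks=BLOCKS):
    """Count covered codepoints per block from (wcLow, cGlyphs) pairs."""
    counts = {name: 0 for name, _, _ in blocks}
    for wc_low, c_glyphs in ranges:
        wc_high = wc_low + c_glyphs - 1  # last codepoint in this clump
        for name, first, last in blocks:
            # Size of the overlap between [wc_low, wc_high] and [first, last].
            overlap = min(wc_high, last) - max(wc_low, first) + 1
            if overlap > 0:
                counts[name] += overlap
    return counts

# Made-up sample ranges, not taken from a real font.
ranges = [(0x0020, 95), (0x00A0, 96), (0x0391, 17)]
coverage = block_coverage(ranges)
for name, count in coverage.items():
    print(name, count)
```

    A clump may straddle a block boundary, which is why the overlap is computed
    against every block rather than assigning each clump to a single block.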

    Note that unlike the Unicode Subset Bitfield (USB) that is part of the
    FONTSIGNATURE structure that is filled by GetTextCharsetInfo() etc. (available
    to W9X and NT as well as 2K/XP), which is limited to a particular version of
    Unicode (3.0?) and returns supersets of Unicode blocks, the GLYPHSET structure
    is version-independent. As long as your software has an up-to-date list of the
    Unicode blocks and their constituent codepoints for the latest version of
    Unicode, you will always be able to get up-to-date information about Unicode
    coverage of a font.

    This is the method used in my BabelMap utility, and you will note that it is
    therefore able to not only list what Unicode 4.0 blocks are covered by a
    particular font, but also give the exact number of codepoints that are covered
    in that block. If you want to determine language coverage for a particular font,
    then all you need to do is define a minimum set of codepoints that must be
    covered for a particular block or set of blocks to be considered as supporting
    that language. (Just the little matter of deciding what the minimum set of
    codepoints would be for every language that is supported by Unicode ...)
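    As a sketch of the language-coverage idea, assuming the font's ranges are
    available as (wcLow, cGlyphs) pairs: expand them into a set and test whether
    a required set is a subset of it. The required sets below are purely
    illustrative guesses, not an authoritative definition of what any language
    needs, and the names covered_codepoints and supports are hypothetical.

```python
def covered_codepoints(ranges):
    """Expand (wcLow, cGlyphs) pairs into a set of codepoints."""
    covered = set()
    for wc_low, c_glyphs in ranges:
        covered.update(range(wc_low, wc_low + c_glyphs))
    return covered

def supports(ranges, required):
    """Return (ok, missing): ok is True when every required codepoint is covered."""
    missing = sorted(required - covered_codepoints(ranges))
    return not missing, missing

# Illustrative minimum sets only; choosing the real sets is the hard part.
ASCII_LETTERS = set(range(0x41, 0x5B)) | set(range(0x61, 0x7B))
REQUIRED = {
    "German": ASCII_LETTERS | {0xC4, 0xD6, 0xDC, 0xDF, 0xE4, 0xF6, 0xFC},
    "Greek": set(range(0x391, 0x3AA)) - {0x3A2} | set(range(0x3B1, 0x3CA)),
}

# Made-up sample ranges: U+0020..U+007E and U+00A0..U+00FF.
font_ranges = [(0x0020, 95), (0x00A0, 96)]
for language, required in REQUIRED.items():
    ok, missing = supports(font_ranges, required)
    print(language, ok, len(missing))
```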

    Now the caveat. The USB sets a Surrogates bit to indicate that the font contains
    at least one codepoint beyond the Basic Multilingual Plane (BMP). Unfortunately
    the "ranges" array of the GLYPHSET structure only lists contiguous clumps of
    Unicode codepoints within the BMP (wcLow is a 16-bit value), and does not list
    surrogate coverage. Therefore you cannot determine supra-BMP codepoint coverage
    from the GLYPHSET structure. If anyone does know an easy way to do this under
    Windows, please let me know.

    Regards,

    Andrew


