Re: metric for block coverage

From: Norbert Lindenberg via Unicode <unicode_at_unicode.org>
Date: Fri, 23 Feb 2018 10:15:32 -0800

> On Feb 18, 2018, at 3:26, Khaled Hosny via Unicode <unicode_at_unicode.org> wrote:
>
> On Sun, Feb 18, 2018 at 02:14:46AM -0800, James Kass via Unicode wrote:
>> Adam Borowski wrote,
>>
>>> I'm looking for a way to determine a font's coverage of available scripts.
>>> It's probably reasonable to do this per Unicode block. Also, it's a safe
>>> assumption that a font which doesn't know a codepoint can do no complex
>>> shaping of such a glyph, thus looking at just codepoints should be adequate
>>> for our purposes.
>>
>> You probably already know that basic script coverage information is
>> stored internally in OpenType fonts in the OS/2 table.
>>
>> https://docs.microsoft.com/en-us/typography/opentype/spec/os2
>>
>> Parsing the bits in the "ulUnicodeRange..." entries may be the
>> simplest way to get basic script coverage info.
>
> Though this might not be very reliable since OpenType does not have a
> definition of what it means for a Unicode block to be supported; some
> font authoring tools use a percentage, others use the presence of any
> characters in the range, and fonts might even provide incorrect data for
> any reason.
>
> However, I don’t think script or block coverage is that useful; what
> users are usually interested in is language coverage.
>
> Regards,
> Khaled

All true. In addition, ulUnicodeRange ran out of bits around Unicode 5.1, so scripts/blocks added to Unicode after that, such as Javanese, Tangut, or Adlam, cannot be represented.
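
For anyone who wants to see what parsing those bits involves, here is a minimal Python sketch. It assumes the four 32-bit ulUnicodeRange1..4 values have already been read from the OS/2 table by some font library; the function names and the KNOWN_BITS table are hypothetical, and the table lists only three of the spec's assignments for illustration. Since four 32-bit fields give only bits 0 to 127, nothing in this scheme can flag a block that was never assigned a bit.

```python
# Sketch only: decode the four 32-bit ulUnicodeRange fields of an OS/2
# table into the bit numbers (0-127) that are set. The helper names are
# made up for this example and do not come from any font library.

# A small, deliberately incomplete subset of the bit assignments
# listed in the OpenType OS/2 specification.
KNOWN_BITS = {
    0: "Basic Latin",
    7: "Greek and Coptic",
    9: "Cyrillic",
}


def unicode_range_bits(range1, range2, range3, range4):
    """Return the sorted list of set bit numbers across all four fields."""
    bits = []
    for index, value in enumerate((range1, range2, range3, range4)):
        for bit in range(32):
            if value & (1 << bit):
                # Field N (0-based) holds bits 32*N .. 32*N + 31.
                bits.append(index * 32 + bit)
    return bits


def describe_bits(bits):
    """Map bit numbers to names, falling back to 'bit N' when unknown here."""
    return [KNOWN_BITS.get(bit, "bit %d" % bit) for bit in bits]


if __name__ == "__main__":
    # Example value with bits 0, 7 and 9 set in ulUnicodeRange1.
    set_bits = unicode_range_bits(0x00000281, 0, 0, 0)
    print(describe_bits(set_bits))
```

As Khaled notes, a set bit only tells you what the font's authoring tool chose to claim, so comparing the result against the font's actual character map is probably still worthwhile.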

Norbert
Received on Fri Feb 23 2018 - 12:17:13 CST
