From: philip chastney (philip_chastney@yahoo.com)
Date: Sun Nov 23 2008 - 15:01:44 CST
--- On Sun, 23/11/08, Don Osborn <dzo@bisharat.net> wrote:
From: Don Osborn <dzo@bisharat.net>
Subject: RE: Why people still want to encode precomposed letters
To: "'Peter Constable'" <petercon@microsoft.com>, unicode@unicode.org, philip_chastney@yahoo.com
Date: Sunday, 23 November, 2008, 1:01 PM
> A couple of quick questions. First, about how long would the list of
> combinations be?
if we take 32-ish Latin characters, 24 Greek and 36-ish Cyrillic characters, and double that for upper and lower case, we have 184 potential base characters
Combining Diacritical Marks (U+0300–U+036F) lists 112 characters
the number of combinations never yet seen (false positives) will far outweigh the number of combinations requiring more than one mark, or a mark from another block (the false negatives), so our first Wild-Assed Guess (WAG) is a maximum of 184 × 112, or about 20,000 combinations
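the figure of 112 marks is easy to check against the Unicode data that ships with Python; this is only a quick sketch, and it assumes that "lists 112 characters" means every code point in the block whose general category is a mark

```python
import unicodedata

# Combining Diacritical Marks block, U+0300 to U+036F inclusive
BLOCK_START, BLOCK_END = 0x0300, 0x036F

# Keep every code point whose general category is a mark (Mn, Mc or Me)
marks = [
    chr(cp)
    for cp in range(BLOCK_START, BLOCK_END + 1)
    if unicodedata.category(chr(cp)).startswith("M")
]

print(len(marks))  # 112
```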
we can refine that figure
Latin characters use about 40 marks, Greek perhaps half-a-dozen (if we count the cases where 2 marks are used) and Cyrillic about 12
( 32 × 40 ) + ( 24 × 6 ) + ( 32 × 12 ) = 1808 potential combinations per case, which, doubled for upper and lower case, gives us a tighter limit of about 3,600 combinations
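the same arithmetic as a small sketch, so the per-script guesses can be swapped out; the dictionary and function names are made up for illustration, the numbers are the ones used in the sum above

```python
# Per-script guesses: (base letters per case, marks in use)
ESTIMATES = {
    "Latin": (32, 40),
    "Greek": (24, 6),
    "Cyrillic": (32, 12),
}


def combinations_per_case(estimates):
    """Sum of base letters times marks over all scripts, for one case."""
    return sum(bases * marks for bases, marks in estimates.values())


per_case = combinations_per_case(ESTIMATES)
both_cases = 2 * per_case  # upper and lower case

print(per_case)    # 1808
print(both_cases)  # 3616
```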
how useful is that figure? well, a rough count of the preformed composites already defined in 6 Latin blocks is a little over 500, and certainly less than 600
Greek and Coptic, and Greek Extended, contribute another 250
Cyrillic, Cyrillic Supplement, Cyrillic Extended-A and Cyrillic Extended-B contribute a further 90
call that 900 preformed composites already specified
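a count of this kind can be roughly reproduced from the Unicode Character Database; the sketch below treats a character as a preformed composite when its canonical decomposition is a base followed by marks, which is only one possible definition and will miss letters with hooks, strokes or descenders that have no decomposition, so the totals need not match the rough counts above; the choice of the six Latin blocks is an assumption, and the results depend on the Unicode version bundled with the Python in use

```python
import unicodedata

# Block ranges; the selection of Latin blocks is an assumption
BLOCKS = {
    "Latin": [(0x0080, 0x00FF), (0x0100, 0x017F), (0x0180, 0x024F),
              (0x1E00, 0x1EFF), (0x2C60, 0x2C7F), (0xA720, 0xA7FF)],
    "Greek": [(0x0370, 0x03FF), (0x1F00, 0x1FFF)],
    "Cyrillic": [(0x0400, 0x04FF), (0x0500, 0x052F),
                 (0x2DE0, 0x2DFF), (0xA640, 0xA69F)],
}


def is_mark_composite(ch):
    """True if ch canonically decomposes into something plus combining mark(s)."""
    decomp = unicodedata.decomposition(ch)
    if not decomp or decomp.startswith("<"):
        return False  # no decomposition, or a compatibility decomposition
    parts = [chr(int(cp, 16)) for cp in decomp.split()]
    return len(parts) >= 2 and all(
        unicodedata.category(p).startswith("M") for p in parts[1:]
    )


def count_composites(ranges):
    """Number of mark composites found in the given code point ranges."""
    return sum(
        is_mark_composite(chr(cp))
        for lo, hi in ranges
        for cp in range(lo, hi + 1)
    )


counts = {script: count_composites(ranges) for script, ranges in BLOCKS.items()}
for script, n in counts.items():
    print(script, n)
print("total", sum(counts.values()))
```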
if Unicode is a job half done, then the total list of known composites could grow to 1800 combinations
add a little for Armenian, Georgian and Yiddish, and we can see the table is unlikely to require more than 2000 entries, with 3,600 entries as a worst-case scenario
my guess is that 1,200 would be enough, once Yoruba, Orok, &c, are included, but whatever -- at least we’ve got a Rough Order of Magnitude on the size of the problem
once a table like that becomes available, your average font designer will stick anchors on all possible base characters, and matching anchors on all likely markings, and import the table into his or her font, as an OpenType table
the resultant display may be a little less than perfect, but the reader will see a recognisable mark+base form
your average FontLab user is quite likely to have a similar table already set up, so that preformed composites can be generated automatically -- what your average FontLab user needs now is a table that is complete, so far as present knowledge allows
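as a rough illustration of how such a table might be consumed, the sketch below walks a few rows and decides, for each, whether Unicode already has a preformed composite or whether the form can only be shown by mark positioning; the table rows, the glyph set and the function name are all hypothetical, and nothing here touches a real font or the FontLab API

```python
import unicodedata

# Hypothetical rows of a known-combinations table: (base letter, combining marks)
TABLE = [
    ("e", "\u0301"),        # e + acute
    ("o", "\u0323\u0301"),  # o + dot below + acute
    ("m", "\u0300"),        # m + grave
    ("a", "\u0308"),        # a + diaeresis
]

# Hypothetical set of characters the font has glyphs for (note: no "a")
FONT_CHARS = set("emo\u0300\u0301\u0323\u0308")


def plan(table, font_chars):
    """Map each base+marks sequence to a note on how the font could show it."""
    result = {}
    for base, marks in table:
        text = base + marks
        if not set(text) <= font_chars:
            result[text] = "missing glyphs"
            continue
        composed = unicodedata.normalize("NFC", text)
        if len(composed) == 1:
            # Unicode has a single code point for this combination
            result[text] = "precomposed U+%04X" % ord(composed)
        else:
            # No single code point: rely on anchors on base and mark
            result[text] = "anchors"
    return result


for text, how in plan(TABLE, FONT_CHARS).items():
    print(" ".join("U+%04X" % ord(c) for c in text), "->", how)
```

the rows that come back as "anchors" are exactly the ones where a table of known combinations tells the designer something the existing preformed composites cannot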
as to what columns this table should have, I would like to see an H-flag (for Historical) meaning something like “no known usage detected in the wild since the end of the nineteenth century”, but that shouldn’t be allowed to cloud the fact that a table of known combinations would be an asset to font designers and users of minority languages, even without any ancillary information
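one possible shape for a row of such a table, with the H-flag as an optional column; the class name, field names and sample rows are assumptions made for illustration, and the flag on the last row is set only to show the filter working, not as a claim about actual usage

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Combination:
    base: str                 # base letter, e.g. "o"
    marks: str                # one or more combining marks, in order
    historical: bool = False  # the H-flag

    def text(self):
        """The base followed by its marks, ready to be typed or displayed."""
        return self.base + self.marks


ROWS = [
    Combination("o", "\u0323\u0301"),
    Combination("e", "\u0301"),
    # u + combining small letter e; flag set for illustration only
    Combination("u", "\u0364", historical=True),
]

# A designer who only cares about current usage can drop the flagged rows
current = [row for row in ROWS if not row.historical]
print(len(current))  # 2
```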
/phil