From: Jonathan Woodburn (jonathan@woodburn.cc)
Date: Thu Jul 17 2008 - 16:05:59 CDT
> David Starner wrote: First, why are you creating a new font for this
> internal project that covers Chinese? That can't be economically
> feasible. A quick and dirty font for English is possible, but
> Chinese is a bit larger. [...] you shouldn't need to worry about
> individual Latin-using languages; just toss in MES-2, which will
> cover every major Latin/Greek/Cyrillic using language in the world,
> at the cost of a measly 1000 characters.
Admittedly, Chinese is a huge character set; even so, the font is
still aimed at a low memory footprint. That said, I'm getting the
impression that my understanding of Unicode is misinformed (or
simply uninformed). Are the characters for every language not found
in one common table (i.e. Latin characters + foreign language accents +
Cyrillic + Chinese, etc.)? If so, one font in one format should
suffice for composing any document in any of the target
languages, no?
> Mark Davis wrote: [...] take a look at http://www.unicode.org/cldr/data/charts/by_type/misc.exemplarCharacters.html
> (for Latin a wide screen helps ;-)
Thank you. I've been analyzing the Latin table and, as I understand
it, the language codes are at the top of the table in ISO 639-1
format, with the characters found in those languages listed on the
left, correct? If this is an exhaustive list, it will be a little
tedious to read the HTML source, but it will certainly work. :)
> Stephane Bortzmeyer wrote: You can find a list for the French
> language here (the article is in French but the table - in the RFC
> 4290 format - is in English): http://www.bortzmeyer.org/4290.html
Many thanks. I believe French now reads as follows: (002D,
0030-0039, 0061-007A, 0153, 00E0, 00E2, 00E6-00E9, 00EA, 00EB, 00EE,
00EF, 00F4, 00F9, 00FB, 00FC, 00FF)
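
As a rough sketch of how a list like the one above could be turned
into an explicit set of code points for a font subset, here is a small
Python example. The names FRENCH_SPEC and expand_code_points are made
up for illustration, and the list is simply copied from this message,
not checked against RFC 4290 or the linked table.

```python
# Code point list copied from the message above (hex, ranges inclusive).
FRENCH_SPEC = (
    "002D, 0030-0039, 0061-007A, 0153, 00E0, 00E2, 00E6-00E9, 00EA, 00EB, "
    "00EE, 00EF, 00F4, 00F9, 00FB, 00FC, 00FF"
)


def expand_code_points(spec):
    """Turn 'XXXX' and 'XXXX-YYYY' entries into a sorted list of integers."""
    points = set()
    for entry in spec.split(","):
        entry = entry.strip()
        if not entry:
            continue
        # partition() leaves `end` empty when the entry is a single value.
        start, _, end = entry.partition("-")
        first = int(start.strip(), 16)
        last = int(end.strip(), 16) if end else first
        points.update(range(first, last + 1))
    return sorted(points)


french_points = expand_code_points(FRENCH_SPEC)
french_chars = "".join(chr(cp) for cp in french_points)

# Print the count and the expanded list in U+XXXX notation.
print(len(french_points))
print(" ".join(f"U+{cp:04X}" for cp in french_points))
```

The string french_chars then holds the actual characters, which could
be compared against the glyphs a candidate font provides.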
> Erkki I. Kolehmainen wrote: MES-2 is part of a CEN Workshop
> Agreement (CWA 13873, IT - Multilingual European Subsets in ISO/IEC
> 10646-1), never meant to become a full blown standard per se,
> available at http://www.cen.eu/cenorm/sectors/sectors/isss/cen+workshop+agreements/multilingual+eur+subsets.asp
> . In the UCS standard ISO/IEC 10646, it is defined as Collection
> 282. IMHO, MES-2 is imperfect and somewhat outdated, but WGL4 is
> much more so.
These standards are simply smaller collections of code ranges from
Unicode, yes? If not, does that imply that a custom multilingual font
which uses select characters for specific languages is not possible?
On a contributory note, I've found this site
(http://www.eki.ee/letter/) which lists the special characters needed
in addition to the basic Latin script to display a given language.
This seems to hit the issue on the head to a great degree, but a
couple of questions remain (introduced by the prior feedback):
1. Are all characters for every language found in a single Unicode
definition so that U+XXXX can express any character?
2. Would it be necessary to create individual fonts for particular
(non-coexisting) languages?
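
To make the two questions concrete, here is a minimal Python sketch.
It relies only on the standard unicodedata module; the sample
characters, the repertoire set and the helper names describe and
missing_from_repertoire are assumptions chosen for illustration. It
prints the code point of characters from several scripts, then reports
which characters of a text fall outside a chosen font repertoire.

```python
import unicodedata

# One sample each: Latin "a", Latin e-acute, Cyrillic Zhe, a CJK ideograph.
SAMPLES = ["a", "\u00e9", "\u0416", "\u4e2d"]


def describe(ch):
    """Return 'U+XXXX NAME' for a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


def missing_from_repertoire(text, repertoire):
    """Return the sorted code points in `text` that `repertoire` lacks."""
    return sorted({ord(ch) for ch in text} - set(repertoire))


for ch in SAMPLES:
    print(describe(ch))

# Hypothetical repertoire: printable ASCII plus three French accented letters.
repertoire = set(range(0x20, 0x7F)) | {0xE0, 0xE8, 0xE9}

# "cafe" with e-acute, a space, then Cyrillic Zhe; only Zhe is not covered.
gaps = missing_from_repertoire("caf\u00e9 \u0416", repertoire)
print([f"U+{cp:04X}" for cp in gaps])
```

A check of this kind says nothing about glyph design or memory use; it
only shows whether a given set of code points covers a given text.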
I hope my questions don't confuse the issue, as I do appreciate the
feedback.
Cheers,
Jonathan