Re: displaying Unicode text (was Re: Transcriptions of "Unicode")

From: Katsuhiko Momoi (momoi@netscape.com)
Date: Thu Dec 07 2000 - 04:38:27 EST


Much of what Erik discusses below is also covered in my IUC 17
presentation, "International Features of New Netscape 6 Browser"; a
paper version appears in the proceedings of IUC 17 (Section 4.4).
The PowerPoint presentation (outline only) is available at:

http://home.netscape.com/eng/intl/docs/iuc17/browser/iuc17browser.html

See section 4.4.
- Kat

Erik van der Poel wrote:

> Mark Davis wrote:
>
>> Let's take an example.
>>
>> - The page is UTF-8.
>> - It contains a mixture of German, dingbats and Hindi text.
>> - My locale is de_DE.
>>
>> From your description, it sounds like Mozilla works as follows:
>>
>> - The locale maps (I'm guessing) to 8859-1
>> - 8859-1 maps to, say, Helvetica.
>> - The dingbats and Hindi appear as boxes or question marks.
>>
>> This would be pretty lame, so I hope I misunderstand you!!
>
>
> Sorry, I've been abbreviating quite a bit, so I left out a lot. Yes,
> you've misunderstood me, but only because I abbreviated so much. Sorry.
> Let me try again, with more feeling this time.
>
> Using the example above:
>
> - The locale maps to the language group "x-western". (ja_JP would map
> to "ja"; I've prepended "x-" to the names of "language groups" that
> don't exist in RFC 1766.)
>
> - x-western and CSS' sans-serif map to Arial
>
> - The dingbats appear as dingbats if they are encoded in Unicode and at
> least one of the dingbat fonts on the system has a Unicode cmap
> subtable. (WingDings is a "symbol" font, so it doesn't have such a
> table.) The Hindi might display OK on some Windows systems that have
> Hindi support, although Mozilla itself does not support any Indic
> languages yet.
>
> We could support the WingDings font if we add an entry for WingDings to
> the following table:
>
> http://lxr.mozilla.org/seamonkey/source/gfx/src/windows/nsFontMetricsWin.cpp#872
>
> We just haven't done that yet.
>
> Basically, Mozilla will look at all the fonts on the system to find one
> that contains a glyph for the current character.
>
> The language group and user locale stuff that I mentioned earlier is
> only one part of the process -- the part that deals with the user's font
> preferences. I'll explain more of the rest of the process:
>
> Mozilla implements CSS2's font matching algorithm:
>
> http://www.w3.org/TR/REC-CSS2/fonts.html#algorithm
>
> This states that *for each character* in the element, the implementation
> is supposed to go down the list of fonts in the font-family property, to
> find a font that exists and that contains a glyph for the current
> character. Mozilla implements this algorithm to the letter, which means
> that fonts are chosen for each character without regard for neighboring
> characters (unlike MSIE). This may actually have been a bad decision,
> since we sometimes end up with text that looks odd due to font changes.
>
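To make the per-character rule concrete, here is a minimal Python sketch
(not Mozilla's code). The coverage table FONTS and the helpers pick_font
and font_runs are hypothetical names invented for illustration, and the
glyph ranges are made up. It walks the font-family list for each
character on its own, then groups neighbours that landed in the same
font, which shows how one character can split a line into several runs.

```python
from itertools import groupby

# Hypothetical coverage tables: font name -> code points it has glyphs for.
FONTS = {
    "Arial": set(range(0x20, 0x7F)) | {0xE4, 0xF6, 0xFC, 0xDF},
    "MS Gothic": set(range(0x20, 0x7F)) | set(range(0x3040, 0x30A0)),
}

def pick_font(ch, family_list):
    """Return the first listed font that exists and has a glyph for ch."""
    for name in family_list:
        glyphs = FONTS.get(name)          # None if the font is not installed
        if glyphs is not None and ord(ch) in glyphs:
            return name
    return None                           # later fallback steps would take over

def font_runs(text, family_list):
    """Choose a font per character, ignoring neighbours, then group into runs."""
    tagged = [(pick_font(ch, family_list), ch) for ch in text]
    return [
        (font, "".join(ch for _, ch in group))
        for font, group in groupby(tagged, key=lambda pair: pair[0])
    ]

runs = font_runs("a\u3042b", ["Arial", "MS Gothic"])
```

With the list Arial, "MS Gothic", a hiragana character between two Latin
letters yields three runs (Arial, MS Gothic, Arial), which is the kind of
font change mentioned above.
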
> Anyway, Mozilla's algorithm has the following steps:
>
> 1. "User-Defined" font
> 2. CSS font-family property
> 3. CSS generic font (e.g. serif)
> 4. list of all fonts on system
> 5. transliteration
> 6. question mark
>
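The six steps above can be read as a chain in which the first step that
produces a result wins. The following Python sketch is only an
illustration under stated assumptions: the Context class, its fields and
the lowercase function names are hypothetical (they merely echo the
routine names Erik gives below), fonts are modelled as sets of code
points, and steps 5 and 6 are folded into one function.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Context:                      # hypothetical container for illustration
    fonts: dict                     # installed fonts: name -> set of code points
    family_list: list               # names from the CSS font-family property
    generic_font: str               # font the preferences give for the CSS generic
    user_defined: Optional[str] = None   # set if the user chose "User-Defined"
    translit: dict = field(default_factory=dict)

def has_glyph(ctx, name, ch):
    return ord(ch) in ctx.fonts.get(name, ())

def find_user_defined(ch, ctx):     # step 1: overrides everything else
    return (ctx.user_defined, ch) if ctx.user_defined else None

def find_local(ch, ctx):            # step 2: walk the font-family list
    for name in ctx.family_list:
        if has_glyph(ctx, name, ch):
            return (name, ch)
    return None

def find_generic(ch, ctx):          # step 3: font chosen for the CSS generic
    return (ctx.generic_font, ch) if has_glyph(ctx, ctx.generic_font, ch) else None

def find_global(ch, ctx):           # step 4: every font on the system
    for name in ctx.fonts:
        if has_glyph(ctx, name, ch):
            return (name, ch)
    return None

def find_substitute(ch, ctx):       # steps 5 and 6: transliteration, then "?"
    return (None, ctx.translit.get(ch, "?"))

def find_font(ch, ctx):
    """Return (font name or None, text to draw) from the first step that succeeds."""
    for step in (find_user_defined, find_local, find_generic,
                 find_global, find_substitute):
        result = step(ch, ctx)
        if result is not None:
            return result
```

Keeping each step as a separate function that returns None on failure
makes the ordering explicit and easy to change.
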
> You can see these steps in the following pieces of code:
>
> http://lxr.mozilla.org/seamonkey/source/gfx/src/windows/nsFontMetricsWin.cpp#2642
>
> http://lxr.mozilla.org/seamonkey/source/gfx/src/gtk/nsFontMetricsGTK.cpp#3108
>
> 1. "User-Defined" font (FindUserDefinedFont)
>
> We decided to include the User-Defined font functionality in Netscape 6
> again. It works much as it did in Netscape 4.X. Basically, if the user
> selects this encoding from the View menu, then the browser passes the
> bytes through to the font, untouched. This is for charsets that we don't
> already support. This step needs to be the first step, since it
> overrides everything else.
>
> 2. CSS font-family property (FindLocalFont)
>
> If the user hasn't selected User-Defined, we invoke this routine. It
> simply goes down the font-family list to find a font that exists and
> that contains a glyph for the current character. E.g.:
>
> font-family: Arial, "MS Gothic", sans-serif;
>
> 3. CSS generic font (FindGenericFont)
>
> If the above fails, this routine tries to find a font for the CSS
> generic (e.g. sans-serif) named in the font-family property, if any;
> otherwise it falls back to the user's default (serif or sans-serif).
> This is where the font preferences come in, so this is where we try to
> determine the language group of the element: we take the LANG
> attribute of this element or of a parent element, if any; otherwise
> the language group of the document's charset, if it is not
> Unicode-based; otherwise the language group of the user's locale.
>
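The language-group lookup described above can be sketched in Python as
follows. This is an assumption-laden illustration, not the real tables:
only the de_DE -> "x-western" and ja_JP -> "ja" mappings come from the
message; the charset entries, the table names and the function names are
hypothetical, and falling through when a LANG value has no known group is
my own guess.

```python
# Hypothetical tables; the real mappings live in Mozilla's source and prefs.
LANG_TO_GROUP = {"de": "x-western", "en": "x-western", "ja": "ja"}
CHARSET_TO_GROUP = {"ISO-8859-1": "x-western", "SHIFT_JIS": "ja"}
UNICODE_CHARSETS = {"UTF-8", "UTF-16"}

def group_for_lang(tag):
    """Map a language tag or locale name ("de", "de-DE", "de_DE") to a group."""
    primary = tag.replace("_", "-").split("-")[0].lower()
    return LANG_TO_GROUP.get(primary)

def language_group(lang_attrs, charset, locale):
    """lang_attrs lists LANG values from the element outward (None if absent)."""
    # 1. LANG attribute of the element or of a parent element.
    for tag in lang_attrs:
        if tag:
            group = group_for_lang(tag)
            if group:                 # assumption: skip tags with no known group
                return group
    # 2. The document's charset, unless it is Unicode-based.
    key = charset.upper()
    if key not in UNICODE_CHARSETS:
        group = CHARSET_TO_GROUP.get(key)
        if group:
            return group
    # 3. The user's locale.
    return group_for_lang(locale)
```

For Mark's example (a UTF-8 page with no LANG attributes and a de_DE
locale) the first two rules do not apply, so the locale decides and the
result is "x-western".
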
> 4. list of all fonts on system (FindGlobalFont)
>
> If the above fails, this routine goes through all fonts on the system,
> trying to find one that contains a glyph for the current character.
>
> 5. transliteration (FindSubstituteFont)
>
> If we still can't find a font for this character, we try a
> transliteration table. For example, the euro sign is mapped to the
> three ASCII characters "EUR", which is useful on some Unix systems that
> don't have the euro glyph yet. Actually, this transliteration step isn't
> even implemented on Windows yet.
>
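Applied to a whole string, the last two steps amount to a table lookup
with a default. In this small Python sketch the TRANSLIT table holds only
the euro mapping named above; the function name substitute and the
has_glyph callback are hypothetical stand-ins for whatever the earlier
steps would report.

```python
TRANSLIT = {"\u20ac": "EUR"}   # euro sign -> "EUR", the one mapping named above

def substitute(text, has_glyph):
    """Replace characters no font can draw with a transliteration or "?"."""
    out = []
    for ch in text:
        if has_glyph(ch):
            out.append(ch)                       # some font covers it
        else:
            out.append(TRANSLIT.get(ch, "?"))    # step 5, else step 6
    return "".join(out)

# Pretend only ASCII glyphs are available, as on a bare Unix system.
ascii_only = lambda ch: ord(ch) < 0x80
price = substitute("5 \u20ac", ascii_only)
```
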
> 6. question mark (FindSubstituteFont)
>
> If we can't find a transliteration, we fall back to the last resort --
> the good ol' question mark.
>
> That's it. I hope I didn't abbreviate too much this time!
>
> Erik

-- 
Katsuhiko Momoi
Netscape International Client Products Group
momoi@netscape.com

What is expressed here is my personal opinion and does not reflect official Netscape views.


