RE: glyph selection for Unicode in browsers

From: Peter_Constable@sil.org
Date: Fri Sep 27 2002 - 09:44:12 EDT


    On 09/26/2002 07:24:08 PM "Murray Sargent" wrote:

    >I don't think the idea is that codepage equals language. Rather codepage
    >equals a writing system, which consists of one or more scripts (e.g., 6
    >scripts for ShiftJIS). As such the codepage is a useful cue in choosing
    >an appropriate font for rendering text.

    (Murray and I talked about this some at dinner a couple of weeks ago, so
    there's some history here.)

    I don't think things are quite that simple. A codepage *can* be a useful
    cue in choosing an appropriate font (or in choosing typographic preferences
    by whatever means). This certainly may be the case in some instances, such
    as Shift JIS. But it's not always the case. For instance, cp1251 doesn't
    tell you what language is involved, and isn't sufficient to determine which
    italic variants of certain Cyrillic characters are needed. Similarly,
    cp1250 doesn't tell you what cultural preferences should apply in relation
    to design and alignment of the ogonek diacritic (e.g. Polish and Lithuanian
    differ in this regard), or other diacritics (e.g. caron should have a
    distinct form for Czech); and cp1252 doesn't tell you about cultural
    preferences regarding cedilla (three different forms can be used for
    French, but only one is acceptable for Portuguese or Catalan).
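
    A minimal Python sketch of the cp1251 point, assuming Python's built-in
    "cp1251" codec. The word "тип" is only an illustrative example (the same
    spelling can be read as Russian or as Serbian), and the helper name
    cp1251_bytes is hypothetical. It shows that the encoded bytes are the same
    whichever language the text is in, so the codepage alone cannot say which
    italic letterforms are wanted:

```python
# Illustrative sketch: cp1251 bytes carry no language information.
# The word and the helper name below are assumptions made for this example.

def cp1251_bytes(text):
    """Encode text with Python's built-in cp1251 codec."""
    return text.encode("cp1251")

russian_word = "тип"  # treated here as Russian text
serbian_word = "тип"  # the same spelling, treated here as Serbian text

russian_bytes = cp1251_bytes(russian_word)
serbian_bytes = cp1251_bytes(serbian_word)

# The two byte streams are identical, so a renderer that knows only the
# codepage has nothing to distinguish the two languages by.
print(russian_bytes == serbian_bytes)
print(russian_bytes.hex())
```

    Any language preference therefore has to come from outside the encoded
    text, for example from language tagging or a user setting.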

    That's why I maintain that a codepage is a character set, but not a writing
    system. In general, a codepage does not determine a set of rules for
    writing; it just provides a vocabulary with which to work.

    >The bottom line is that if text was generated using a particular
    >codepage it's likely that the creator of that text intended the text to
    >be rendered with a font that supports that codepage.

    Of course, fonts can support multiple codepages. Arial, Tahoma and Verdana,
    for example, all support codepages 1250, 1251, 1252, 1253, 1254, 1257
    and 1258. That doesn't tell you whether they're appropriate for Polish or
    Lithuanian or Czech or whatever. Even the fact that they support cp1258
    doesn't imply that they are appropriate for Vietnamese: e.g. the default
    glyphs in Arial for U+1EA5 and U+1EA7 do not have the diacritics stacked in
    the way needed for Vietnamese.
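
    For reference, the two characters mentioned can be inspected with Python's
    standard unicodedata module. This is only a sketch (the helper name
    describe is hypothetical). It shows which two marks each character
    combines. How those marks are arranged is decided by the font's glyphs,
    and nothing in the character data or the codepage settles it:

```python
import unicodedata

def describe(code_point):
    """Return the Unicode name and canonical decomposition of a code point."""
    ch = chr(code_point)
    return unicodedata.name(ch), unicodedata.decomposition(ch)

# U+1EA5 and U+1EA7 each decompose to U+00E2 (a with circumflex)
# followed by one further combining mark.
for cp in (0x1EA5, 0x1EA7):
    name, decomposition = describe(cp)
    print(f"U+{cp:04X} {name} -> {decomposition}")
```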

    I'm not saying that codepage information isn't ever useful. Obviously, you
    have found it very useful. But the usefulness has limits.

    - Peter

    ---------------------------------------------------------------------------
    Peter Constable

    Non-Roman Script Initiative, SIL International
    7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
    Tel: +1 972 708 7485
    E-mail: <peter_constable@sil.org>


