Re: [OT] multilingual support in MS products (was Re: Kurdish ghayn)

From: Peter_Constable@sil.org
Date: Mon Apr 28 2003 - 08:59:24 EDT


    Thomas Milo wrote on 04/27/2003 04:49:26 AM:

    > Would it be possible to make the IJ/ij available at last as a single
    > character IJ/ij for Dutch users?

    If I understand the facts correctly, is this not just a digraph, comparable
    to "ch" in various languages, the only difference being that Unicode
    doesn't have a "ch" character but did include "ij" -- for backward
    compatibility purposes? In other words, ideally "ij" wouldn't have been
    included, but now that we've got it, Dutch "ij" has two alternate
    representations, < i, j > or < U+0133 >.

    Tom, I think what you should be asking Chris Pratley to do is to make the
    spelling checker for Office recognise either spelling; the best way to do
    that is probably to apply a compatibility normalisation to Dutch text.#

    As for input methods, Michael Kaplan has already pointed out that they
    can't really change what has already shipped (and that that is not an
    Office issue). There are ways to create your own input method, though: you
    can use Tavultesoft Keyman now to create your own input method, or soon (I
    presume) Microsoft will be making a tool available.

    #This brings up a general issue worth mentioning: we are familiar with the
    concept of canonical equivalence for Latin precomposed / decomposed
    representations, and the use of Unicode normalisation forms C and D to deal
    with these equivalences. In contrast, characters with compatibility
    decompositions are quite an assorted lot, and there's no simple, general rule
    to say when compatibility decompositions should or shouldn't be used.
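
    A minimal sketch of that distinction, assuming Python's standard
    unicodedata module (the helper name describe is made up for illustration).
    It compares a precomposed letter that has a canonical decomposition with
    U+0133, which has only a compatibility one:

```python
import unicodedata

def describe(ch: str) -> dict:
    """Report how one character behaves under canonical (NFD) and
    compatibility (NFKD) decomposition. Illustrative helper only."""
    return {
        # Raw decomposition mapping from the Unicode Character Database;
        # compatibility mappings carry a tag such as "<compat>".
        "mapping": unicodedata.decomposition(ch),
        "nfd": unicodedata.normalize("NFD", ch),
        "nfkd": unicodedata.normalize("NFKD", ch),
    }

e_acute = describe("\u00e9")  # LATIN SMALL LETTER E WITH ACUTE
ij_lig = describe("\u0133")   # LATIN SMALL LIGATURE IJ

print(e_acute["mapping"])  # 0065 0301
print(ij_lig["mapping"])   # <compat> 0069 006A
```

    Form D splits the accented letter but leaves U+0133 alone; only the
    compatibility form (NFKD here) turns U+0133 into < i, j >.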

    But, there is one class of Latin characters with compatibility
    decompositions that probably should generally be handled as though they
    were canonically equivalent to their decomposed counterparts: digraphs. For
    whatever reason, digraphs as a rule were given *compatibility* rather than
    *canonical* decomposition mappings. But unless I'm missing something, it
    seems to me that for most practical purposes, representations using the
    digraph characters ij, lj, dž etc. should be treated by applications as
    equivalent with their decomposed counterparts.
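
    One way an application might act on this is sketched below. It folds only
    the digraph characters and leaves other compatibility characters
    untouched, rather than applying a blanket NFKC. The DIGRAPHS list and the
    names fold_digraphs and same_spelling are assumptions for illustration,
    not anything Office or the standard prescribes:

```python
import unicodedata

# Illustrative (assumed) set of Latin digraph characters:
# U+0132..U+0133 (IJ, ij), U+01C4..U+01CC (DŽ/Dž/dž, LJ/Lj/lj, NJ/Nj/nj),
# U+01F1..U+01F3 (DZ/Dz/dz).
DIGRAPHS = (
    "\u0132\u0133"
    + "".join(chr(cp) for cp in range(0x01C4, 0x01CD))
    + "\u01F1\u01F2\u01F3"
)

# Map each digraph to its compatibility decomposition, recomposed (NFKC),
# so that e.g. U+01C6 becomes "d" followed by U+017E.
FOLD_TABLE = {ord(ch): unicodedata.normalize("NFKC", ch) for ch in DIGRAPHS}

def fold_digraphs(text: str) -> str:
    """Replace digraph characters by their letter sequences, then apply
    ordinary canonical normalisation (form C) to the result."""
    return unicodedata.normalize("NFC", text.translate(FOLD_TABLE))

def same_spelling(a: str, b: str) -> bool:
    """Compare two strings, treating digraph characters as equivalent
    to their decomposed counterparts."""
    return fold_digraphs(a) == fold_digraphs(b)

print(same_spelling("\u0133s", "ijs"))  # True
```

    A spelling checker could run both the dictionary entries and the user's
    text through such a fold before comparing. Characters outside the list,
    such as the U+FB01 "fi" ligature, pass through unchanged.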

    - Peter

    ---------------------------------------------------------------------------
    Peter Constable

    Non-Roman Script Initiative, SIL International
    7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
    Tel: +1 972 708 7485


