Re: Tajik alphabet code

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Mon Mar 01 2004 - 15:30:28 EST

    > On 01/03/2004 00:18, Asomiddin Atoev wrote:
    >
    > >I am emailing on behalf of the Tajikistani state
    > >working group on localizing software for Tajik
    > >language. Could you please kindly guide us to be in
    > >right direction. What shall be the procedure of
    > >standartization of alphabet symbols? Tajik alphabet
    > >makes use of cyrillic symbols and contains of 35
    > >letters.

    I think his question is not whether Unicode supports Tajik, but whether work
    has been done (maybe in other countries, for library purposes) to define a
    subset appropriate for publishing and working with texts in the Tajik
    language. Tajik orthography was heavily influenced by Russian during the
    Soviet era, when Tajikistan was a republic of the Union, and this may have
    affected the language to the point that some old texts of cultural
    importance have lost some of their original meaning.

    So there may be libraries in the world that still hold texts in the original
    orthography, or in adaptations of the Cyrillic-based orthography, which
    contain more letters than those we commonly see. If there are attempts to
    reform the orthography to better match the needs of the language, there may
    already exist some letter variants that would interest him.

    Also, if such sets exist, this creates an opportunity to propose an 8-bit
    encoding for Tajik: a variant of the ISO-8859 Cyrillic encoding used for
    Russian, except that it would contain all the letters needed for Tajik.

    Unicode clearly seems to support this language well, but there's still a need
    for a common framework for working with Tajik texts in an 8-bit encoding
    (which would be more compact than UTF-8, and as simple and efficient as
    ISO-8859-1 for Western European languages, or ISO-8859-5 for Russian).
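
    To make the gap concrete, here is a small Python sketch (my own
    illustration, not something from the working group). It assumes the six
    letters that Tajik adds to the basic Russian set are U+0493, U+04E3,
    U+049B, U+04EF, U+04B3 and U+04B7, and it uses Python's standard
    "iso8859_5" codec as the existing Cyrillic part of ISO-8859. The helper
    name unencodable() is hypothetical.

```python
# Lowercase letters Tajik adds to the basic Russian Cyrillic set
# (assumed list): U+0493, U+04E3, U+049B, U+04EF, U+04B3, U+04B7.
TAJIK_EXTRA = "\u0493\u04e3\u049b\u04ef\u04b3\u04b7"


def unencodable(text, codec):
    """Return the characters of `text` that `codec` cannot represent."""
    missing = []
    for ch in text:
        try:
            ch.encode(codec)
        except UnicodeEncodeError:
            missing.append(ch)
    return missing


# Which of the extra letters are absent from the ISO-8859 Cyrillic part?
missing_in_8859_5 = unencodable(TAJIK_EXTRA, "iso8859_5")

# Size of each extra letter in UTF-8, versus a plain Russian letter
# in the 8-bit encoding.
utf8_bytes_per_letter = [len(ch.encode("utf-8")) for ch in TAJIK_EXTRA]
russian_a_in_8859_5 = len("\u0430".encode("iso8859_5"))

print("not in ISO-8859-5:", " ".join(missing_in_8859_5))
print("UTF-8 bytes per extra letter:", utf8_bytes_per_letter)
print("bytes for a Russian letter in ISO-8859-5:", russian_a_in_8859_5)
```

    None of the six letters fits in the existing Cyrillic part, and each takes
    two bytes in UTF-8, which is the size argument for a dedicated 8-bit set.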

    So this question would certainly find some experts at the ISO working group
    responsible for 8-bit encodings compatible with the ISO-8859 standard (this
    is independent of the fact that this subset will be fully mapped and
    supported in Unicode). Having such a subset will certainly help unify
    various sources by agreeing on a common orthography, instead of relying on
    support for the whole of the large Unicode/ISO/IEC 10646 coded set. If such
    a subset is then approved nationally, it will help get decent support and
    mappings in many fonts, keyboard drivers, and text processing tools.

    After all, ISO-8859-15 was defined and standardized after a similar reform
    in the European Union, which needed some Latin characters not present in
    ISO-8859-1, even though all these characters were already present in
    Unicode or had been adopted recently (like the Euro code point, which was
    created instead of reusing the legacy, non-standard ECU symbol with its
    various, non-distinctive forms). So why not for Tajik too?
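
    The ISO-8859-15 precedent can be checked the same way. This is only a
    sketch using Python's standard "iso8859_1" and "iso8859_15" codecs; the
    helper name fits() is hypothetical.

```python
EURO = "\u20ac"  # EURO SIGN


def fits(ch, codec):
    """True if `ch` can be encoded in `codec`."""
    try:
        ch.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


# The Euro sign has a single-byte slot in ISO-8859-15 ...
euro_byte = EURO.encode("iso8859_15")

# ... but no slot at all in ISO-8859-1.
euro_in_latin1 = fits(EURO, "iso8859_1")

# The same byte read as ISO-8859-1 is the generic currency sign U+00A4.
same_byte_as_latin1 = euro_byte.decode("iso8859_1")

print(euro_byte, euro_in_latin1, hex(ord(same_byte_as_latin1)))
```

    ISO-8859-15 reuses the byte that ISO-8859-1 gives to the generic currency
    sign, so the variant stays an 8-bit set of the same size. A Tajik variant
    of the Cyrillic part would have to give up some existing positions in the
    same way.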


