Re: Frequent incorrect guesses by the charset autodetection in IE7

From: Philippe Verdy
Date: Sat Jul 15 2006 - 07:45:57 CDT

    From: "Sinnathurai Srivas"
    > Unicode is not functioning properly, because ISO will not let go the ISO.
    > ISO will not announce the deprecation of ISO 8859-x. This means Unicode will
    > not get through ISO systems.

    ISO does not have to endorse Unicode (for now), because the Unicode Consortium is a separate standards body with its own membership conditions and policies.

    For a long time ISO has supported a separate project, ISO/IEC 10646, which is now fully binary compatible with the Unicode Standard: for the character encoding since Unicode 1.1, and for some standard behavior, notably the UTFs, since Unicode 4.0.

    It is possible, though, that at some point the Unicode Consortium will cease its activities and agreements will be made for ISO to manage the other normative properties the consortium currently publishes, by merging all the work now done by Unicode's Technical Committee and ISO's WG2 into a single working group.

    This could happen once most of the Unicode work concerns extremely rare or academic scripts. Since the languages currently supported are now stable, the member companies of Unicode would have little interest in continuing to finance those projects alone, whereas governments participating in ISO may have cultural interests in continuing to support work on rare or extinct scripts. Some Unicode members are already reluctant to bear the development cost of complex extinct scripts such as Egyptian and Mayan hieroglyphs, and academic resources are very limited, so academics have difficulty covering the cost of their involvement in the standardization process at Unicode.

    On the other hand, the organizations working within Unicode would like to extend standardization to other I18N and linguistic issues. This is already happening: the consortium now strongly supports the CLDR project (a candidate for a future standard) and a few other standards (for which the consortium is a candidate to maintain the registries of other I18N-related ISO standards). The consortium may even propose its projects for adoption as new international standards at ISO.

    Note that it is ISO, not Unicode, that ultimately approves and standardizes extensions of the character repertoire (assignment of code points and blocks, character names, and character identity). Unicode's vote at the UTC is only consultative for ISO; Unicode has no direct right to vote at ISO, except when a government has chosen it to represent its interests in this area. Even then, representatives at ISO do not represent their company or organization but their country's interests, so they cannot always defend the organization's interests when other academic groups, or even competing organizations supported by the same government, oppose them. Their "seat" is temporary, and each government defines the rules under which candidate representatives can work.

    Unicode standardizes the other character properties and works in collaboration with ISO on repertoire extensions. Both standards are kept in sync, with nearly simultaneous publication of updates: all extensions are discussed by the technical committees and working groups of both standards bodies in order to reach agreement on the content that will be published.

    The decision to close ISO's work on 7/8-bit encoded charsets is justified. It does not end work in this area, but such work no longer requires an international standard: countries can still develop their own national 7/8-bit encodings if they wish (for example, India can continue updating its ISCII standard and map other scripts into it, and nothing prevents Morocco from developing an 8-bit encoding standard for Tamazight), and they can even register them in the IANA charset registry, but this no longer requires updating ISO 8859 (whose last updates covered the official national languages of the European Union and the Celtic languages). Asian scripts are still supported through ISO standards that have not been deprecated.

    National standards approved for use in a country by one of its official standards bodies remain an important possibility, and countries can still collaborate to develop a joint standard. In practice, however, any new 7/8-bit encoding will be supported by OS vendors and programmers only if it has a well-defined mapping to ISO/IEC 10646.
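    To make "a well-defined mapping to ISO/IEC 10646" concrete: for software, an 8-bit charset is essentially a table from each byte value to a UCS code point, with every byte either mapped or explicitly invalid. The following is a minimal Python sketch, assuming a hypothetical charset called HYPO-8 here (not ISCII, not any real, proposed, or registered national standard) whose lower half is ASCII and whose bytes 0xA0 to 0xD5 map in order onto the Tifinagh letters U+2D30 to U+2D65. All names in it are illustrative.

```python
# HYPO-8 is a HYPOTHETICAL 8-bit charset, used only for illustration.
# It is defined entirely by its mapping table to ISO/IEC 10646 code points.


def build_table() -> dict[int, int]:
    # 0x00-0x7F: identical to ASCII (byte value == code point)
    table = {b: b for b in range(0x80)}
    # 0xA0-0xD5 (54 bytes) -> U+2D30..U+2D65 (Tifinagh letters), in order
    for i, b in enumerate(range(0xA0, 0xD6)):
        table[b] = 0x2D30 + i
    # 0x80-0x9F and 0xD6-0xFF are deliberately left unmapped (invalid)
    return table


DECODE = build_table()
# The mapping is one-to-one, so the encoder is simply the inverse table.
ENCODE = {cp: b for b, cp in DECODE.items()}


def decode(data: bytes) -> str:
    """Convert HYPO-8 bytes to a Unicode string; unmapped bytes are errors."""
    out = []
    for b in data:
        if b not in DECODE:
            raise ValueError(f"byte 0x{b:02X} has no mapping to ISO/IEC 10646")
        out.append(chr(DECODE[b]))
    return "".join(out)


def encode(text: str) -> bytes:
    """Convert a Unicode string to HYPO-8; unrepresentable characters are errors."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp not in ENCODE:
            raise ValueError(f"U+{cp:04X} cannot be represented in HYPO-8")
        out.append(ENCODE[cp])
    return bytes(out)


if __name__ == "__main__":
    sample = b"Tz \xa0\xa1"
    text = decode(sample)
    assert encode(text) == sample  # round trip is lossless
    print(" ".join(f"U+{ord(c):04X}" for c in text))
    # prints: U+0054 U+007A U+0020 U+2D30 U+2D31
```

    Because the table alone fully determines both directions, any OS or library that already handles ISO/IEC 10646 can support such a charset just by shipping the table; this is the practical reason a new national encoding without such a mapping would be unlikely to be implemented.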

    Note that ISO's work on 7/8-bit charsets is merely stalled: there is currently no agenda for new work in this area, but that does not make future extensions impossible if there is enough mutual agreement to reopen the agenda for some extensions. Closing the schedule was simply a pragmatic decision to save costs and invest resources in ISO/IEC 10646 instead, leaving governments to decide for themselves how they will implement it nationally and whether they want to support a national adaptation of ISO/IEC 10646 (and optionally Unicode).

