Re: New to Unicode

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Mon Jul 24 2006 - 21:54:22 CDT

    From: "Peter Constable" <petercon@microsoft.com>
    >I know there are instances in the Wikipedia site in which language IDs have been invented and do not conform to ISO 639, but I don't recall any specific cases at present.

    Look into the new Incubator project for proposed languages: there are many errors, which I have personally commented on, because the users behind those proposals wanted to create their own codes. Some of those codes have been put into use after the initial proposal, with insufficient votes (but then, the code assignment policy was still not documented).

    There are, however, a few archives of the proposals in the test wikis that remain in Incubator and that have been put online; most errors are in Austronesian languages, but a few are in Finno-Ugric regional languages as well (in Russia or other Nordic European countries).

    The creation of wikis for regional variants of Italian is also quite controversial (in Incubator you'll find some more, as well as non-standard codes for regional variants of Spanish, some of which are really difficult to distinguish from the Castilian continuum).

    And there are languages that have been coded using the code for a family of languages plus a non-standard variant code, instead of simply using the ISO 639-3 draft (which seems to be extremely stable, despite its draft status, thanks to the long experience of the maintenance agency and its long past work on the Ethnologue Report).

    In the past, there were also two domains for Traditional and Simplified Chinese (these projects have now been merged, because the script is not relevant for Wikipedia, which wants real language distinctions, such as with Yi and Wu, that should not be considered as script variants of the same Chinese language).

    There also remain a few legacy codes that were valid in some past versions of ISO 639 and that are now deprecated (look at Serbo-Croatian, the most important case).

    Look also at Simple English (code used = simple, which is not an ISO 639 code), a controversial creation that could instead have been hosted by creating specialized navigation in the English Wikipedia.

    Maybe this last project will later be merged into "en", by using some metadata extension to wiki pages in the software and by supporting more user preferences (notably a conformant content-rating labelling system, which would be better than the existing templates for warning banners, and which would allow selecting relevant parts of articles).

    And maybe some day all the editions will be merged into a single database, with more advanced navigation tools and better handling of user preferences and of multiple article titles.


