RE: Questions re ISO-639-1,2,3

From: Donald Z. Osborn (dzo@bisharat.net)
Date: Mon Aug 22 2005 - 02:00:59 CDT

Next message: Erkki Kolehmainen: "Re: Chukchee CYRILLIC EL WITH HOOK?"

Previous message: Philippe Verdy: "Re: Historical Cyrillic in Unicode"
Maybe in reply to: Donald Z. Osborn: "RE: Questions re ISO-639-1,2,3"
Next in thread: Peter Constable: "RE: Questions re ISO-639-1,2,3"

Hi Peter,

In answer to your question, I can't speak for the person (and group) that is
interested in presenting the codes, but I imagine that:

1) It is seen as convenient to have a one-stop site for various information
relevant to localization. (For my part, when assembling information on a
language-by-language basis for African language localizers, I thought it useful
to put relevant ISO-639 codes on the various pages - here I tried to "block and
copy" to minimize the potential for typos and gave the reference sites. This is
a little different, and a "raw data feed" probably wouldn't be helpful in this
instance, but I mention it as an example of a situation where one would want
codes on one's own site rather than simply a pointer to another site. In this
case, I think that aggregating codes in this particular way may also raise some
productive questions, but that's another matter that I'll broach after
responding to your question.)

2) The presentation on the sites, notably the LOC one, is utilitarian but not
very dynamic (not that I have room to criticise on this point, but just an
observation). To SIL's credit, their site offers several different ways to
view the data, and some downloads (which coincidentally might encourage
setting up static lists of ISO-639 codes on other pages), but there are gaps
and there is no search feature for the codes.

3) ISO-639 data fed from the official sites could facilitate devising a kind of
relational database linking it to alternate names for languages and perhaps
groupings of languages.
3a) Say you were looking for the code for Pulaar. You would have to Ctrl-F
search for the term but would find nothing in ISO-639-1&2. True, a knowledgeable
user would try synonyms, but that puts the burden on the user. Next, let's say
that s/he has pulled up the entire list at the SIL site and searches there. Fine,
s/he would come up with an ISO-639-3 code for Pulaar but still be unaware of
the ISO-639-1&2 codes for "Fulah/Peul" that might actually serve the purpose
intended by the user. SIL's site does have a presentation by "macrolanguages,"
but you have to know to look for it. (One might add more to the macrolanguage
list - or better yet provide accurate raw data feed that would facilitate
presenting other configurations/combinations.)
3b) In any event, there must be a lot of examples, but a database set-up
(facilitated by a feed) could provide synonyms and more relevant info. Either
we call on LOC &/or SIL to set up more databases, or we let motivated user
communities do it - the latter is bound to happen to some degree anyway, so why
not devise a way to make sure that what they're using is not a copy of a static
list, with possible errors and inevitable datedness?
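
Points 3a and 3b describe a lookup that links alternate names to codes at more than one level. A minimal sketch, assuming a pipe-delimited feed laid out like the LOC ISO 639-2 text download (bibliographic code | terminologic code | ISO 639-1 code | English name | French name). The two sample rows, the hypothetical MACRO_MEMBERS and ALT_NAMES tables, and all function names are illustrative assumptions, not an official feed or API:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Illustrative excerpt shaped like the LOC ISO 639-2 pipe-delimited download:
# bibliographic|terminologic|ISO 639-1|English name|French name
# (assumed layout; a real set-up would fetch and periodically refresh the file)
FEED = """\
ful||ff|Fulah|peul
wol||wo|Wolof|wolof
"""

# Hypothetical tables a user community might layer on top of the feed:
# ISO 639-3 individual languages under a macrolanguage, plus extra synonyms.
MACRO_MEMBERS: Dict[str, Dict[str, str]] = {"ful": {"fuc": "Pulaar"}}
ALT_NAMES: Dict[str, str] = {"fula": "ful"}


@dataclass
class Entry:
    part2: str                   # ISO 639-2 (bibliographic) code
    part1: Optional[str]         # ISO 639-1 code, if one exists
    names: List[str]             # English and French names from the feed
    members: Dict[str, str] = field(default_factory=dict)  # 639-3 code -> name


def parse_feed(text: str) -> Dict[str, Entry]:
    """Turn pipe-delimited feed lines into entries keyed by ISO 639-2 code."""
    entries = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        bib, _term, part1, english, french = line.split("|")
        entries[bib] = Entry(bib, part1 or None, [english, french],
                             MACRO_MEMBERS.get(bib, {}))
    return entries


def build_name_index(entries: Dict[str, Entry]) -> Dict[str, str]:
    """Map every lower-cased name or synonym to its ISO 639-2 code."""
    index = {}
    for code, entry in entries.items():
        for name in entry.names + list(entry.members.values()):
            index[name.lower()] = code
    for synonym, code in ALT_NAMES.items():
        index.setdefault(synonym, code)
    return index


def lookup(name: str, entries: Dict[str, Entry],
           index: Dict[str, str]) -> Optional[dict]:
    """Return the ISO 639-1, 639-2 and any matching 639-3 codes, or None."""
    key = name.strip().lower()
    code = index.get(key)
    if code is None:
        return None
    entry = entries[code]
    part3 = [c for c, n in entry.members.items() if n.lower() == key]
    return {"iso639_1": entry.part1, "iso639_2": entry.part2, "iso639_3": part3}


entries = parse_feed(FEED)
index = build_name_index(entries)
print(lookup("Pulaar", entries, index))
# {'iso639_1': 'ff', 'iso639_2': 'ful', 'iso639_3': ['fuc']}
```

A search for "Pulaar" surfaces the ff/ful codes alongside fuc, and a search for "Peul" or "Fula" lands on ff/ful with an empty ISO 639-3 list, so the user no longer has to know which synonym or which list to try.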

I realize that a lot of this is hypothetical and that I've gone a ways out on a
limb with some remarks. So I guess I should go the distance to suggest what
others have probably observed well before (and may already be working on), that
maybe the ISO-639 lists, such as they are, will need some sort of revision at
some point with respect to what languages (dialects) are represented at the
"language" and macrolanguage levels, and what the relationship among them is.
The example of Fula/Peul and its variant forms that I mentioned above is an
interesting case in point - the fundamental unity and evident diversity of the
language(s) are such that one could imagine the utility of tagging Pulaar as
ff-fuc - that is Fula-Pulaar, using ISO-639-1 (always the preference over
ISO-639-2 where there are both, as I understand it from the W3C site) and
ISO/DIS-639-3, though I understand that such nesting of ISO-639-3 codes is not
intended. Further specification by country code would be helpful, since the
orthography in Senegal varies slightly from that in neighboring Mali and
perhaps Mauritania.
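
The nested tag imagined above can be sketched as a small helper. This is a hypothetical illustration of the proposal only (as noted, ISO-639-3 is not intended to nest this way), not a registered or standard tag form; the function names and the shape checks are assumptions, and the checks do not confirm that a code is actually assigned. It also encodes the preference for an ISO-639-1 code over ISO-639-2 where both exist:

```python
import re
from typing import Optional

# Shape checks only: they do not confirm that a code is actually assigned.
TWO_LETTER = re.compile(r"[a-z]{2}")
THREE_LETTER = re.compile(r"[a-z]{3}")
REGION = re.compile(r"[A-Z]{2}")


def _check(pattern, value: str, what: str) -> None:
    """Raise ValueError if value does not have the expected shape."""
    if not pattern.fullmatch(value):
        raise ValueError(f"malformed {what}: {value!r}")


def primary_subtag(part1: Optional[str], part2: str) -> str:
    """Use the ISO 639-1 code when one exists, otherwise the ISO 639-2 code."""
    if part1:
        _check(TWO_LETTER, part1, "ISO 639-1 code")
        return part1
    _check(THREE_LETTER, part2, "ISO 639-2 code")
    return part2


def nested_tag(part1: Optional[str], part2: str,
               part3: Optional[str] = None, region: Optional[str] = None) -> str:
    """Build the hypothetical macrolanguage-language-country tag, e.g. ff-fuc-SN."""
    parts = [primary_subtag(part1, part2)]
    if part3:
        _check(THREE_LETTER, part3, "ISO 639-3 code")
        parts.append(part3)
    if region:
        _check(REGION, region, "country code")
        parts.append(region)
    return "-".join(parts)


# Pulaar as written in Senegal vs. Mali (ISO 3166-1 codes SN and ML).
print(nested_tag("ff", "ful", "fuc", "SN"))  # ff-fuc-SN
print(nested_tag("ff", "ful", "fuc", "ML"))  # ff-fuc-ML
```

Keeping the country part optional lets the same helper produce a plain "ff" tag when orthographic differences between countries do not matter.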

Anyway, these are clearly not easy decisions and I know that in the interests of
"stability" one can't go about undoing and renaming existing codes. But these
are matters that will likely prompt (provoke?) more discussion as various
users, webmasters, and localizers come into contact with and attempt to use the
standard (lang tagging web content; localization) for languages currently
less-represented in computing and cyberspace. I could go on, but time is limited
and this is already straying off-topic, I think.

Thanks for any feedback. (One logical suggestion is that this go to the ISO-639
list - perhaps someone could forward it there and I guess I'll have to
subscribe.)

Don

Don Osborn, Ph.D. dzo@bisharat.net
*Bisharat! A language, technology & development initiative
*Bisharat! Initiative langues - technologie - développement
http://www.bisharat.net

Quoting Peter Constable <petercon@microsoft.com>:

> > From: unicode-bounce@unicode.org
> > [mailto:unicode-bounce@unicode.org] On Behalf Of Donald Z. Osborn
>
>
> > A follow-up question is whether it would be possible for these
> > agencies to provide something like a "raw database feed" (assuming
> > such a complex syndication is possible) that would permit other
> > organizations to incorporate accurate (and automatically updated,
> > though that would not be often) information on their sites with the
> > look and feel of their site.
> >
> > This question arises because someone is looking to post the lists of
> > ISO-639 codes on a new site for localization developers, and I don't
> > think the alternative of just providing a pointer to the LOC site is
> > attractive.
>
> Could you explain why a pointer would not be attractive?
>
>
>
> > > <rant>
> > > Several sites have published lists of ISO 639 language identifiers,
> > > rather than simply providing a link to the official site. While this
> > > is thought to be helpful, it is extremely unhelpful in that errors
> > > get introduced or the information gets out of date. Anyone that has
> > > done this is strongly advised to delete their private list and
> > > replace it with a pointer to the official site:
> > > http://www.loc.gov/standards/iso639-2/iso639jac.html
> > > </rant>
>
>
> Peter Constable
>
>
>


This archive was generated by hypermail 2.1.5 : Mon Aug 22 2005 - 02:02:14 CDT