From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Fri Aug 26 2005 - 08:19:02 CDT
From: "Peter Constable" <petercon@microsoft.com>
>From: Philippe Verdy [mailto:verdy_p@wanadoo.fr]
>> I also note that the file contains no entries for other reserved
>> codes:
>> * Scope=R (Reserved),
>
> Again, a completely inappropriate use of Scope. Also, I don't see why
> the data file should include entries for identifiers that have all of
> their properties defined in the standard itself.
>
>
>> On the other hand, I see that the ISO 639-3 database keeps entries for
>> special codes (which seems at odds with the ISO 639-3 policy
>> of not encoding collective languages, i.e. Scope="C" used for
>> language families):
>> * Scope=S, for example [mul] and [und] in ISO 639-2 and ISO 639-3;
>
> Here, a special value for Scope would be appropriate. (Thanks for
> bringing this to my attention.)
Please note that I did not invent the special "S" and "R" values used for
the Scope field. They are shown on SIL.org's web pages, even though they are
absent from the downloadable tab-separated text files:
* One is the list of individual languages and macrolanguages, each with
its unique ISO 639-3 code, name, scope and living status, and optionally
and informatively its ISO 639-2/T 3-letter code or ISO 639-1 2-letter
code if one exists; it contains only languages (scope="I") and
macrolanguages (scope="M"), and does not list known aliases or regional
dialects.
* The other contains a one-to-many relation table that maps macrolanguages
to languages.
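As a rough illustration of how the two tables described above fit together,
here is a minimal Python sketch. The column names (Id, Part2T, Part1, Scope,
Name, M_Id, I_Id) and the sample rows are assumptions made for this example
only; they are not the actual layout of the downloadable files.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample data: the real files' columns and order may differ.
LANGUAGES_TSV = (
    "Id\tPart2T\tPart1\tScope\tName\n"
    "fra\tfra\tfr\tI\tFrench\n"
    "ara\tara\tar\tM\tArabic\n"
    "arb\t\t\tI\tStandard Arabic\n"
    "arz\t\t\tI\tEgyptian Arabic\n"
)
MACRO_TSV = (
    "M_Id\tI_Id\n"
    "ara\tarb\n"
    "ara\tarz\n"
)


def load_languages(text):
    """Index the language table by its ISO 639-3 identifier."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return {row["Id"]: row for row in reader}


def load_macro_map(text):
    """Build the one-to-many map: macrolanguage -> member languages."""
    members = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
        members[row["M_Id"]].append(row["I_Id"])
    return dict(members)


languages = load_languages(LANGUAGES_TSV)
macro_members = load_macro_map(MACRO_TSV)

for code, row in languages.items():
    if row["Scope"] == "M":
        print(code, row["Name"], "->", macro_members.get(code, []))
```

With this layout, a code with scope "M" appears as a key in the second table,
while a code with scope "I" (such as [fra] in the sample) does not.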
It is not very clear what the difference in encoding and mapping is
between:
(1) a macrolanguage and its isolated languages;
(2) an isolated language and its dialects.
The definition is quite fuzzy. I first wrote about missing regional dialects
of French: some have been encoded as isolated languages, while others are
considered dialects of standard French and are not encoded. Yet Louisiana
French, Cajun and Acadian are really dialects of the same American French
language, which also contains Canadian French (in Quebec and Ontario), and
these also have their own variants and creoles with other native American
languages. It looks as if American French and Canadian French should then
be encoded, and "French" (alone) should be considered a macrolanguage
(at least) or even a collection.
The distinction between these cases seems to come from the legacy use of the
ISO 639-1 [fr] code in locale identifiers, which means that ISO 639-3 [fra]
(ISO 639-2/T [fra], ISO 639-2/B [fre], ISO 639-1 [fr]) could not be
considered a collection. But Arabic, for example, should be treated the same
way, yet it was encoded as a macrolanguage (Scope=M), with its variants also
encoded as isolated languages (Scope=I). So why doesn't French map in
ISO 639-1 as a macrolanguage?
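To make the code correspondence concrete, here is a small Python sketch that
resolves the legacy French and Arabic codes to their three-letter identifier
and reports the scope. The ALIASES and SCOPE tables and the describe()
helper are hypothetical names made up for this example; they list only the
codes discussed here and are not taken from any published data file.

```python
# Illustrative alias table: 2-letter, bibliographic and terminologic codes
# all resolve to one 3-letter identifier (assumed layout, for discussion).
ALIASES = {
    "fr": "fra", "fre": "fra", "fra": "fra",
    "ar": "ara", "ara": "ara",
}

# Scope as discussed in this thread: I = individual, M = macrolanguage.
SCOPE = {"fra": "I", "ara": "M"}


def describe(code):
    """Return (three-letter identifier, scope) for a known language code."""
    iso3 = ALIASES.get(code.lower())
    if iso3 is None:
        raise KeyError("unknown language code: " + code)
    return iso3, SCOPE[iso3]


for legacy in ("fr", "fre", "ar"):
    print(legacy, "->", describe(legacy))
```

The asymmetry questioned above shows up directly: the two-letter code for
Arabic resolves to a macrolanguage, while the one for French resolves to an
individual language.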