Dataset for all ISO 639 codes sorted by country/territory?
richard.wordingham at ntlworld.com
Sun Nov 20 18:50:09 CST 2016
On Sun, 20 Nov 2016 22:53:54 +0100
Philippe Verdy <verdy_p at wanadoo.fr> wrote:
> I think it just requires a minimal dataset: ask for it, submit the
> data, it will be made available for vetting, and if vetting makes it
> suitable for publication with the minimal core set of properties, it
> will be added to the published list.
The minimal data set can be difficult to collect, and may actually be
impossible. There may be technical issues - can one actually specify
that today's date is "a.d. XI Kal. Dec. a.u.c. MMDCCLXIX" in
Classical Latin? It would be good to have a proper line-breaker for
pi_TH, which is Pali written in the Thai script (as opposed to
pi_Khmr_TH and pi_Lana_TH, which are used in old documents) but has
spaces between the words, at least where crasis or similar has not occurred.
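The Latin date above shows the kind of technical issue meant. Below is a minimal Python sketch that derives such a date by counting inclusively back from the next Kalends, Nones or Ides. The names roman_date, roman_numeral and MONTHS are hypothetical and do not come from any library. The sketch rests on several assumptions. It applies the Roman reckoning directly to Gregorian dates, as the example does. It takes a.u.c. as AD + 753. It ignores the bissextile (doubled leap-day) convention in February. The day number depends on which reference day comes next, so the result does not appear to be expressible as a plain field-substitution date pattern.

```python
import calendar
from datetime import date

# March, May, July and October have Nones on the 7th and Ides on the 15th;
# all other months have Nones on the 5th and Ides on the 13th.
LONG_MONTHS = {3, 5, 7, 10}

# One conventional set of abbreviated Latin month names (index 1..12).
MONTHS = ["", "Ian.", "Feb.", "Mart.", "Apr.", "Mai.", "Iun.",
          "Iul.", "Aug.", "Sept.", "Oct.", "Nov.", "Dec."]


def roman_numeral(n: int) -> str:
    """Convert a positive integer to subtractive Roman numerals."""
    values = [(1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
              (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
              (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I")]
    out = []
    for value, symbol in values:
        count, n = divmod(n, value)
        out.append(symbol * count)
    return "".join(out)


def roman_date(d: date) -> str:
    """Format a Gregorian date in Roman style (no bissextile handling)."""
    nones = 7 if d.month in LONG_MONTHS else 5
    ides = nones + 8
    month = MONTHS[d.month]
    # Pick the next reference day and its day-of-month position.
    if d.day == 1:
        name, target, label = "Kal.", 1, month
    elif d.day <= nones:
        name, target, label = "Non.", nones, month
    elif d.day <= ides:
        name, target, label = "Id.", ides, month
    else:
        # Kalends of the following month sits one past this month's last day.
        name = "Kal."
        target = calendar.monthrange(d.year, d.month)[1] + 1
        label = MONTHS[d.month % 12 + 1]
    gap = target - d.day
    if gap == 0:
        day_part = f"{name} {label}"
    elif gap == 1:
        day_part = f"prid. {name} {label}"  # pridie: the day before
    else:
        # Inclusive counting: the ordinal is one more than the day gap.
        day_part = f"a.d. {roman_numeral(gap + 1)} {name} {label}"
    return f"{day_part} a.u.c. {roman_numeral(d.year + 753)}"


print(roman_date(date(2016, 11, 21)))
```

For 21 November 2016 this prints a.d. XI Kal. Dec. a.u.c. MMDCCLXIX, matching the string quoted above.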
I once sat down to assemble the minimum data needed for Latin - and
found I was stumped. There just isn't much call for computer user
interfaces in Latin - but support for document preparation in Latin
would be handy.
For some modern languages, some of the concepts may simply not exist -
one would use another language for them. That is probably the real case
for most language names in most languages. Even in the UK, there is a
widespread conception that Pakistani immigrants speak 'Pakistani' in
I would also ask, what is en_TH? Is it the English used in Thailand by
Thais, Britons, Australians or Americans? Currency and year number are
the primary localisation requirements for the last three groups.
Incidentally, most of the native English speakers resident in Thailand
are not officially immigrants - they are present on extensions of stay
granted by non-immigrant visas. For Britons resident in Thailand, the
relevant locale is probably just en-GB-u-rg-thzzzz. That example would
probably go for most immigrant groups.
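The -u-rg- keyword in a tag like en-GB-u-rg-thzzzz is the region override from the Unicode locale extension (UTS #35). The language and base region stay en-GB. Region-dependent defaults such as currency and the preferred calendar are instead taken from the region encoded in the rg value: "th" plus the "zzzz" whole-region suffix. Below is a minimal sketch of that lookup. It assumes a simplified tag grammar: variants are skipped and only the "zzzz" form of rg is honoured. It also uses a tiny hypothetical lookup table in place of CLDR's supplemental data. LocaleId, parse_locale, preference_region, DEFAULT_CURRENCY and DEFAULT_CALENDAR are illustrative names, not an existing API.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Hypothetical stand-ins for CLDR supplemental data, not loaded from CLDR.
DEFAULT_CURRENCY = {"GB": "GBP", "TH": "THB"}
DEFAULT_CALENDAR = {"GB": "gregorian", "TH": "buddhist"}


@dataclass
class LocaleId:
    language: str
    script: Optional[str] = None
    region: Optional[str] = None
    keywords: Dict[str, str] = field(default_factory=dict)  # -u- key -> value


def parse_locale(tag: str) -> LocaleId:
    """Parse a simplified locale ID such as 'en-GB-u-rg-thzzzz' or 'es_US'."""
    subtags = tag.replace("_", "-").lower().split("-")
    loc = LocaleId(language=subtags[0])
    i = 1
    # Optional 4-letter script subtag, e.g. 'Thai' in pi-Thai-TH.
    if i < len(subtags) and len(subtags[i]) == 4 and subtags[i].isalpha():
        loc.script = subtags[i].title()
        i += 1
    # Optional region subtag: 2 letters or 3 digits.
    if i < len(subtags) and (
        (len(subtags[i]) == 2 and subtags[i].isalpha())
        or (len(subtags[i]) == 3 and subtags[i].isdigit())
    ):
        loc.region = subtags[i].upper()
        i += 1
    # Skip variants and other extensions until the 'u' singleton;
    # stop at private use ('x'), which is not parsed here.
    while i < len(subtags) and subtags[i] not in ("u", "x"):
        i += 1
    if i < len(subtags) and subtags[i] == "u":
        i += 1
        key = None
        # Read keys and values until the next singleton or the end.
        while i < len(subtags) and len(subtags[i]) > 1:
            sub = subtags[i]
            if len(sub) == 2:
                key = sub  # a new keyword key, e.g. 'rg'
                loc.keywords[key] = ""
            elif key is not None:
                # A value may span several subtags, e.g. ca-islamic-civil.
                current = loc.keywords[key]
                loc.keywords[key] = f"{current}-{sub}" if current else sub
            i += 1
    return loc


def preference_region(loc: LocaleId) -> Optional[str]:
    """Region for region-dependent defaults; honours rg only as '<region>zzzz'."""
    rg = loc.keywords.get("rg", "")
    if rg.endswith("zzzz") and len(rg) > 4:
        return rg[:-4].upper()
    return loc.region


loc = parse_locale("en-GB-u-rg-thzzzz")
region = preference_region(loc)
print(loc.language, loc.region, region,
      DEFAULT_CURRENCY[region], DEFAULT_CALENDAR[region])
```

This prints en GB TH THB buddhist. Language-dependent data would still come from en-GB, while the currency and calendar defaults in this toy table follow Thailand.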
For that matter, how well defined is es_US?