Dataset for all ISO639 code sorted by country/territory?

Richard Wordingham richard.wordingham at
Mon Nov 21 16:36:06 CST 2016

On Mon, 21 Nov 2016 20:08:29 +0100
Philippe Verdy <verdy_p at> wrote:

> 2016-11-21 1:50 GMT+01:00 Richard Wordingham <richard.wordingham at>:

> So the question for the Latin language would be to identify which
> calendar is official, but not how we can bring relevant and accurate
> calendar translation in Latin language for the three calendars. If
> you consider the "La" locale, it should be by default bound to the
> current modern epoch, so using the Gregorian calendar by default. For
> other historic periods, you'd need at least other sublocales, one for
> the Roman Republic, another for the Roman Empire starting at Emperor
> Julius Caesar, bound to the early Julian Calendar, another after
> Emperor Augustus (introducing changes in month lengths to create the
> month of August) bound to the modern Julian Calendar, another for the
> introduction of the Gregorian Calendar: it means 4 distinct locales
> in Latin.

You can qualify a locale by the calendar in use.  I nearly referred to
en_ca_buddhist_GB in my previous post, but then discovered there was a
better way of doing it.  The cycle of days and months is
almost continuous for any region; the problems are to identify the
switchover from Julian to Gregorian in each region, and that is not
peculiar to Latin.  The use of the AD system of dates owes a lot to the
Carolingian Renaissance.
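
As a rough illustration of the per-region switchover problem, here is a
minimal Python sketch. It is not anything CLDR provides: the table name
FIRST_GREGORIAN_DAY, the function civil_date, and the use of modern
region codes as labels for historical territories are all assumptions
made for the example. Only three well-known switchovers are listed
(the 1582 papal reform, Great Britain in 1752, Russia in 1918); a real
table would be far longer and messier.

```python
from datetime import date

# First day on which the Gregorian calendar was in civil use.
# Keys are modern region codes used loosely as labels (an assumption);
# the historical territories do not map cleanly onto them.
FIRST_GREGORIAN_DAY = {
    "IT": date(1582, 10, 15),  # states that followed the papal reform
    "GB": date(1752, 9, 14),   # Great Britain
    "RU": date(1918, 2, 14),   # Russia
}


def julian_from_jdn(jdn):
    """Convert a Julian day number to a (year, month, day) Julian calendar date."""
    f = jdn + 1401
    e = 4 * f + 3
    g = (e % 1461) // 4
    h = 5 * g + 2
    day = (h % 153) // 5 + 1
    month = (h // 153 + 2) % 12 + 1
    year = e // 1461 - 4716 + (14 - month) // 12
    return year, month, day


def civil_date(d, region):
    """Return (calendar, year, month, day) as written locally on day d.

    d is a proleptic Gregorian datetime.date; region is a key of
    FIRST_GREGORIAN_DAY.
    """
    if d >= FIRST_GREGORIAN_DAY[region]:
        return ("gregorian", d.year, d.month, d.day)
    # date.toordinal() counts days in the proleptic Gregorian calendar;
    # adding 1721425 gives the Julian day number.
    return ("julian",) + julian_from_jdn(d.toordinal() + 1721425)


# The day before each switchover is still a Julian date.
print(civil_date(date(1582, 10, 14), "IT"))  # ('julian', 1582, 10, 4)
print(civil_date(date(1752, 9, 13), "GB"))   # ('julian', 1752, 9, 2)
print(civil_date(date(1752, 9, 14), "GB"))   # ('gregorian', 1752, 9, 14)
```

The day count itself never skips, which is the sense in which the cycle
is almost continuous; only the label attached to each day changes at the
regional switchover. The sketch ignores differing conventions for the
start of the year.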

Ideally, we ought to have lots of regnal lists, including lists of
consuls.  In practice, with one exception, I don't think these are
needed for real man-machine interfaces.

> And you'd probably need further distinctions at linguistic
> level for the introduction of lowercase letters in the Middle Ages
> (early Classical Latin was unicameral): 5 distinguished locale
> variants only for this language in the same script !

This could be quite relevant for detecting sentence boundaries.  Of
course, you also have the evolution of word-boundary marking, from
the interpunct to no spacing to spacing, and the disappearance of the
apex.
However, modern Classical Latin does use inter-word spaces, and editors
usually do the hard work of determining sentence boundaries.

(I think Unicode would have had a lot of trouble with the
disunification of 'u' and 'v'.)

I'm not sure of the relevance of the appearance of the macron and breve
in teaching materials.  For these, there also seems to be a switch from
the marking of syllable quantity to the marking of vowel quantity.
Perhaps these differences are outside the scope of CLDR, though they're
not irrelevant to spelling and grammar checkers.

> You could as
> well extend this to earlier periods where Latin was still not the
> language of the whole Roman Empire,

It never was, even in the West.

> and had various regional "Italic"
> variants some of them still exhibiting classical Greek features.


More information about the CLDR-Users mailing list