From: Edward H Trager (ehtrager@umich.edu)
Date: Fri Mar 21 2003 - 12:21:35 EST
OK, Frank,
It took me a little while to remember where to find this kind of
information, but now I've got it!
You need to download IBM's very thorough "International Components for
Unicode" library which is available under an Open Source license at:
http://oss.software.ibm.com/icu/download/2.4/index.html
In the "/source/data/locales" subdirectory of the distribution are text
files providing locale information for numerous locales. For each locale,
there is a list, among other things, of the names of countries, spelled
out fully, in that language/locale, referenced by the two-letter
abbreviations. For the "ar.txt" file, there is a list of 18 countries,
mainly Middle Eastern. For other languages, such as Thai ("th.txt"), the
list of country names is much more extensive, so I assume that
eventually the Arabic file will get updated too.
The strings use Java-style Unicode escapes, i.e.:
"EG { "\u0645\u0635\u0631" }". What I did to see them in a more
human-readable form was to convert the files that I wanted to look at
into UTF-8 using "uniconv" from the Yudit distribution (www.yudit.org)
and then use yudit to view the files:
%> uniconv -in ar.txt -out ar.utf8 -decode java -encode utf-8
%> yudit ar.utf8 &
(This works on UNIX and Linux, and presumably on Cygwin under Windows too.
Of course, I'm sure there are lots of other ways to view the files.)
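
As one such other way, here is a minimal Python sketch that turns the
Java-style \uXXXX escapes into real characters and writes the result as
UTF-8. It is only an illustration, not part of ICU or Yudit: the function
names decode_java_escapes and convert_file are made up for this example,
and it assumes that every escape is exactly four hex digits and that the
input file is plain ASCII apart from those escapes. It does not combine
surrogate pairs.

```python
import re

# Matches a Java-style escape such as \u0645 (exactly four hex digits).
_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})")


def decode_java_escapes(text):
    """Replace each \\uXXXX escape in text with the character it names."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


def convert_file(src_path, dst_path):
    """Read an escaped locale file and write a UTF-8 copy.

    Both paths are supplied by the caller; nothing about the ICU file
    layout is assumed beyond the escapes themselves.
    """
    with open(src_path, "r", encoding="ascii") as src:
        decoded = decode_java_escapes(src.read())
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write(decoded)
```

For example, convert_file("ar.txt", "ar.utf8") would produce a file that
any UTF-8 capable editor can display.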
Hope this helps! It looks like ICU can serve as a nice data resource in
general, even if you don't plan on using the C++ or Java libraries
directly in software.
On Thu, 20 Mar 2003, Frank da Cruz wrote:
> It would seem timely to augment the collection of native-script
> UTF-8 country names in:
>
> http://www.columbia.edu/kermit/postal.html#index
>
> with more Arabic ones. So far, Arabic is the most under-represented
> script. I have a few (Egypt, Iran, Tajikistan) cribbed from Tex's page
> but would like to fill in Afghanistan, Algeria, Djibouti, Iraq, Jordan,
> Kuwait, Lebanon, Libya, Morocco, Oman, Pakistan, Syria, etc -- any country
> whose name is written in Arabic script. Can anyone help with this?
>
> Thanks!
>
> - Frank
>