From: Tom Emerson (tree@basistech.com)
Date: Sat Jan 03 2004 - 19:23:46 EST
Frank Yung-Fong Tang writes:
> The agent probably just heard the name over a tapped phone.
> It probably does not matter who FBI store the name after
> that. It could be an Arabic to French transliteration read by
> someone familiar with Arabic to English transliteration system.
Or it could be a name read by an Egyptian or a Libyan or a Saudi ---
all of whom will sound different: Gaddhafi vs. Geddafi vs. Qathafi
vs. Kazzafi... dialectal differences between speakers can make name
matching difficult. Even advanced name matchers like
First Logic's cannot handle all of the Arabic transliterations that
are available. It requires more advanced technology than simple fuzzy
string comparisons or soundex-like algorithms.
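To make the soundex point concrete, here is a minimal sketch of classic American Soundex applied to the four spellings above. The function name and code table layout are my own, not taken from any particular product; it only illustrates why a soundex-like key is not enough on its own.

```python
# Minimal American Soundex sketch (illustrative, not a production matcher).
SOUNDEX_CODES = {
    **dict.fromkeys("bfpv", "1"),
    **dict.fromkeys("cgjkqsxz", "2"),
    **dict.fromkeys("dt", "3"),
    "l": "4",
    **dict.fromkeys("mn", "5"),
    "r": "6",
}


def soundex(name: str) -> str:
    """Return the 4-character Soundex key for a Latin-script name."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    code = first.upper()
    prev = SOUNDEX_CODES.get(first, "")
    for c in letters[1:]:
        if c in "hw":
            continue  # h and w do not separate letters with equal codes
        digit = SOUNDEX_CODES.get(c, "")
        if digit and digit != prev:
            code += digit
        prev = digit  # a vowel resets prev, so a repeated code counts again
    return (code + "000")[:4]


for spelling in ["Gaddhafi", "Geddafi", "Qathafi", "Kazzafi"]:
    print(spelling, soundex(spelling))
```

The first two spellings share a key, but the other two land on different keys, because Soundex keeps the first letter verbatim and puts "z" and "t" in different classes. One name ends up under three keys.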
You run into these problems outside of names. If you are into Arabic
music and try to find it on the web, you're in for some hard
times... the different transliterations are manifold! For
example, some of the songs from Nawal Al Zoghbi's recent album
demonstrate:
Elli Tmannayto === Elli Tmanetoh
7abeeb Dialli === Habib Dialy
Ya 7abeebi Ana === Ya Habibi Ana
Trekni Rou7 === Trikni Rouh
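A rough sketch of how one might compare such titles: map the chat-style digit "7" to "h" (the only digit mapping the pairs above actually show; the table is an assumption you would need to extend), fold case, and then score the result with a generic similarity ratio from Python's difflib. The names normalize, similarity, and CHAT_DIGITS are mine, for illustration only.

```python
from difflib import SequenceMatcher

# Digit-to-letter folding for chat-style transliteration. Only "7" -> "h"
# is attested in the titles above; anything else added here is an assumption.
CHAT_DIGITS = {"7": "h"}


def normalize(title: str) -> str:
    """Lowercase a title and replace chat-alphabet digits with letters."""
    return "".join(CHAT_DIGITS.get(ch, ch) for ch in title.lower())


def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two normalized titles."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


PAIRS = [
    ("Elli Tmannayto", "Elli Tmanetoh"),
    ("7abeeb Dialli", "Habib Dialy"),
    ("Ya 7abeebi Ana", "Ya Habibi Ana"),
    ("Trekni Rou7", "Trikni Rouh"),
]

for left, right in PAIRS:
    print(f"{left!r} vs {right!r}: {similarity(left, right):.2f}")
```

Even after folding the digit, none of the pairs becomes identical, so any lookup still has to pick a similarity threshold, and that threshold will also admit unrelated titles.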
The US Government knows about these issues and is quite willing to
take advantage of them: in a recent post on another mailing list I'm
on, an immigration lawyer requested help from Arabists. One of his
clients was going to be deported because their last passport used "el"
while their birth certificate (which contained their name in Arabic
script and Latin script) used "al". The contention was that "el" and
"al" were different words and therefore the documents did not
match. Yet the USG knows that "el" and "al" are merely orthographic
variants of the Arabic definite article alif-lam and wants them
treated the same. <sigh>
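Treating the two spellings of the article alike is easy to sketch. The following is a minimal illustration, assuming the article is written as a separate word or joined with a hyphen; the function name article_key and the choice of "al-" as the canonical form are mine.

```python
import re

# "al" or "el" as a standalone word, followed by a space or hyphen.
ARTICLE = re.compile(r"\b(?:al|el)[\s-]+", re.IGNORECASE)


def article_key(name: str) -> str:
    """Fold case and rewrite the definite article to one canonical form."""
    folded = ARTICLE.sub("al-", name.strip().lower())
    return re.sub(r"\s+", " ", folded)  # collapse any leftover runs of spaces


print(article_key("Nawal Al Zoghbi"))
print(article_key("Nawal El-Zoghbi"))
```

Both calls produce the same key. The sketch is deliberately naive: it would also rewrite a non-Arabic given name such as "Al", and it does nothing for spellings where the article is fused to the noun or assimilated to the following consonant.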
> Unicode do not solve "transliteration" issue at all. There are
> multiple Arabic transliteration system available. Even the
> ISO standard Arabic transliteration system is not 100% adopted by
> some Arabic speaking country.
I would be surprised if the ISO system is used by anyone but ISO.
> Remember, all the airline still use ASCII only for name these
> day on our boarding pass. The problem could be in the airline
> side instead of the FBI side.
FBI, CIA, NSA, DIA, Immigration, LoC... they're all having problems
with this. And the agencies rarely share information, and only now are
they starting to define a common (though non-reversible)
transliteration scheme: each agency has their own, and often different
parts of the _same_ agency will have their own. And this is true for
other languages (esp. Farsi and Pashto) as well.
-tree
-- 
Tom Emerson                                   Basis Technology Corp.
Software Architect                          http://www.basistech.com
"Beware the lollipop of mediocrity: lick it once and you suck forever"