Arabic - English translation and technology efforts (fwd)

Harold F. Schiffman haroldfs at ccat.sas.upenn.edu
Mon Dec 2 13:55:15 UTC 2002


It's All Arabic-English to Him
By Joanna Glasner
Story location: http://www.wired.com/news/culture/0,1284,48260,00.html

02:00 AM Nov. 12, 2001 PT

Ask most people what they think of free Internet translation services, and
their first associations are of bizarre sentence structures and amusing
syntactic snafus. But where others see garbled grammar, Fahad Al Sharekh
sees a new era of global communication.

As chief executive of Arabic and English portal site Ajeeb.com, Al Sharekh
believes that the error-prone technology known as machine translation has
played a key part in speeding the exchange of information between the
English-speaking world and the Middle East.

Four weeks ago, Ajeeb introduced what its founder says is the first free
online service that instantly translates Arabic websites into English. The
company, a division of Arabic-language programming firm Sakhr Software,
has been running an English-to-Arabic translation service for more than a
year.

Al Sharekh, a Kuwaiti citizen educated in the United States, admits that
machine translation -- despite momentous improvements in recent years --
is still far from perfect. Any arguments to the contrary are quickly
disproved by a glance at the website of Arabic news agency Al Jazeera,
where translations of headlines range from the humorous: "Concord returns
to the service after a year of the stop" to the not entirely intelligible:
"An Israeli incursion is near an embryo and Buch he refuses Arafat
meeting."

But given the voracious demand for news from abroad in the wake of the
Sept. 11 attacks, Al Sharekh tells Wired News that users are learning to
live with a little weird grammar.

Wired News: Why did you decide to launch an English and Arabic translation
site?

Fahad Al Sharekh: We realized there is one impediment for the Internet to
be accepted in the Arabic-speaking world. It is language. The World Wide
Web is built with English domains. Ninety percent of the content on the
Web is English.

We know a lot of people here are educated. They're computer literate. They
have Internet access. But they don't speak English, and that's what's
stopping them from using the Web and the Internet the way they should.

WN: How is it doing?

Al Sharekh: For the month of October, we just hit something like 14
million requests for English to Arabic translation. So far, we've had
about a million requests to translate Arabic to English.

WN: Which site are you getting the most requests to translate?

Al Sharekh: For Arabic to English, it's Aljazeera.net, the Arabic
satellite news agency. They're very good, very controversial, and their
website is big. So far about 90 percent of Arabic to English translations
are for this one site.

WN: What English-language sites are Arabic speakers most interested in?

Al Sharekh: Before, CNN was the most popular site. Now people are bored,
because what's going on is almost a routine: "OK, we're bombing
Afghanistan again." Since people got bored, I'm noticing they're going
back to their regular Web browsing, and Yahoo is No. 1.

WN: Isn't it difficult to set up a translation system for two languages
that are so different?

Al Sharekh: For translating Arabic to English, it's a huge challenge.
Arabic's a very old language. It doesn't have vowels. It has no
punctuation. There are no capital letters. The machine translation engine
has to tell from the context what a word means. For example, Taliban in
Arabic is literally "two students," and this was the way it was translated
on our machine translation service initially.

While the search in English text is simple and there are many tools
available for this, the search in Arabic text is very difficult. There are
many forms for Arabic words, with suffixes, prefixes and root words, and
words change completely when used in different tenses and forms.

WN: With all those complicated grammar issues to consider, how accurate
are the results?

Al Sharekh: There are problems. There are glitches and bugs. This is
software, after all.

We recognize that the aim of machine translation is to give a good idea
about the general meaning of the material, and human translation quality
cannot be reached through machine translation. It's not accurate. It
sounds weird. It has some grammatical mistakes. Some of the acronyms make
no sense. And as the content gets more and more complicated, the
translation is going to be less accurate. Accuracy requires professional
human translation.

WN: What are you doing to make things better?

Al Sharekh: We can't just depend on the machine translation engine. We had
to teach the computer all these names: Taliban, Osama bin Laden, all the
names of Afghan cities and towns. Whoever's on CNN talking, I want their
name entered in English and Arabic.

We also recently launched Johaina, a news gathering service with an Arabic
language interface. An English site will be fully functional soon. The
service monitors -- around the clock -- hundreds of Arabic and English
websites, detects any new articles and updates and categorizes them. Users
request the full human translation of an article from the site to be
delivered within hours to their e-mail inboxes.
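
As a rough illustration of the name list Al Sharekh mentions (entering
names such as Taliban in both languages so the engine stops rendering them
literally), a glossary lookup can run before the general engine. The sketch
below is hypothetical: NAME_GLOSSARY, literal_gloss and translate_tokens
are invented for illustration, the entries are examples only, and nothing
here describes Sakhr's actual software.

```python
# Hypothetical proper-name glossary; entries are illustrative only.
NAME_GLOSSARY = {
    "طالبان": "Taliban",
    "كابول": "Kabul",
}


def literal_gloss(token: str) -> str:
    """Toy stand-in for a general translation engine (an assumption).

    It mimics the literal reading mentioned in the interview, where the
    word for Taliban comes out as "two students".
    """
    toy_dictionary = {"طالبان": "two students"}
    # Unknown words are passed through in angle brackets.
    return toy_dictionary.get(token, f"<{token}>")


def translate_tokens(tokens: list[str]) -> list[str]:
    """Prefer the name glossary; fall back to the engine otherwise."""
    output = []
    for token in tokens:
        if token in NAME_GLOSSARY:
            output.append(NAME_GLOSSARY[token])
        else:
            output.append(literal_gloss(token))
    return output
```

With the glossary in place, the name is looked up first, so the literal
reading is only produced for words the glossary does not cover.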

WN: How do you think the events of Sept. 11 affected the way people are
using your site?

Al Sharekh: Here, the masses want their daily fix of news. They hear what
our government has to say, but they also want to hear what BBC and ABC
have to say.

WN: What about among English speakers?

Al Sharekh: Before Sept. 11, with all fairness, people in America didn't
want to know what's going on anywhere else. They can't even spell
Afghanistan. Now, this whole isolationist perspective won't do anymore. A
small, messed-up cult in another part of the world can affect you in your
hometown.

I never thought many people would read Arabic newspapers in English.
However, now I have a whole new crowd of people who are reading Arabic
newspapers and Arabic websites in English.

WN: Are there any sites you would recommend for English speakers looking
for an Arabic news fix?

Al Sharekh: Besides Aljazeera.net, there's Asharq Al-Awsat from Saudi
Arabia, Al Hayat from the United Kingdom and Al-Ahram from Egypt.

WN: What's next for you?

Al Sharekh: We're working on integrating our speech technologies. For
example, people who don't like using the keyboard can speak Arabic into a
machine, have the speech translated to text, have the text translated to
English, and then have it spoken aloud. That's around the corner.

We're also working on enhancing the performance of the translation engine
to improve the accuracy. I don't think it's ever going to finish.
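
The speech workflow described above chains three stages: Arabic speech to
text, Arabic text to English text, and English text to speech. A minimal
sketch of that chaining follows, assuming each stage is supplied as a
function. All names are hypothetical, and the stub stages only pass bytes
and strings around so the example can run; they are not real recognizers
or synthesizers.

```python
from typing import Callable


def speech_to_speech(
    audio: bytes,
    recognize: Callable[[bytes], str],
    translate: Callable[[str], str],
    synthesize: Callable[[str], bytes],
) -> bytes:
    """Run the three stages in order and return the synthesized audio."""
    arabic_text = recognize(audio)          # speech -> Arabic text
    english_text = translate(arabic_text)   # Arabic text -> English text
    return synthesize(english_text)         # English text -> speech


# Stub stages (assumptions): "audio" is just UTF-8 encoded text here.
def fake_recognize(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_translate(text: str) -> str:
    return {"مرحبا": "hello"}.get(text, text)


def fake_synthesize(text: str) -> bytes:
    return text.encode("utf-8")
```

Passing the stages in as arguments keeps each one replaceable, which fits
the interview's point that the translation engine itself is still being
improved.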


