[Corpora-List] Arabic-English Probabilistic Lex

amin farajian ma.farajian at gmail.com
Sat Dec 29 19:36:32 UTC 2012


Dear Mostafa,

We were working on Persian-Arabic SMT some months ago. One of our ideas was
to use Arabic-English SMT as a bridge. We found these two free parallel
Arabic-English corpora:
1. MEDAR Evaluation Package. As I remember, it is a parallel corpus
extracted automatically from parallel UN documents. Since the sentence
alignment was automatic, you will find some noise in it, but it is still
usable; how much the noise matters depends on your goal and the level of
accuracy you need. You can find this corpus here:
http://catalog.elra.info/product_info.php?products_id=1166. It is free for
both academic and commercial use, and if you send them the request form,
they will provide it immediately. You can also email Mr. Khalid Choukri
(choukri at elda.org) directly; he can help you get this corpus and any
other corpora they might have.

2. OpenSubtitles. You can find it here:
http://opus.lingfil.uu.se/OpenSubtitles2011.php. It is also aligned
automatically.

There are some other parallel English-Arabic corpora (such as Xinhua), but
they are not free, and I think it would be a bit hard for you to buy them
from Iran.
From my last discussion with Dr. Behrang Mohit, I learned that he is also
working on English-to-Arabic SMT, so I think it is worth sending him an
email and talking to him directly.
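
If you end up extracting the lexicon yourself from one of these corpora,
IBM Model 1 is one standard way to estimate word translation probabilities
from sentence-aligned data. The Python sketch below is only a minimal
illustration under my own assumptions: the function name train_ibm1, the
transliterated toy sentences and the pre-tokenized input are all
hypothetical, there is no NULL word, and for real data a dedicated aligner
such as GIZA++ would be the more usual choice.

```python
from collections import defaultdict


def train_ibm1(pairs, iterations=10):
    """Estimate t[(e, f)] = P(e | f) with IBM Model 1 EM (no NULL word).

    pairs: list of (source_tokens, target_tokens), e.g. (Arabic, English).
    """
    # Start from a uniform distribution over the target vocabulary.
    target_vocab = {e for _, tgt in pairs for e in tgt}
    uniform = 1.0 / len(target_vocab)
    t = defaultdict(lambda: uniform)

    for _ in range(iterations):
        count = defaultdict(float)  # expected count of e aligned to f
        total = defaultdict(float)  # expected count of f aligned to anything
        # E-step: distribute each target word over the source words.
        for src, tgt in pairs:
            for e in tgt:
                z = sum(t[(e, f)] for f in src)
                for f in src:
                    c = t[(e, f)] / z
                    count[(e, f)] += c
                    total[f] += c
        # M-step: renormalise so that the probabilities sum to 1 for each f.
        t = defaultdict(
            float, {(e, f): count[(e, f)] / total[f] for (e, f) in count}
        )
    return t


# Hypothetical toy corpus (transliterated Arabic, English), already tokenized.
pairs = [
    (["al", "bayt"], ["the", "house"]),
    (["al", "kitab"], ["the", "book"]),
    (["kitab"], ["book"]),
]
t = train_ibm1(pairs)

# Print the lexicon entries for one source word, most probable first.
entries = sorted(
    ((e, p) for (e, f), p in t.items() if f == "kitab"),
    key=lambda item: -item[1],
)
for e, p in entries:
    print(f"P({e} | kitab) = {p:.3f}")
```

Running the same function with the source and target sides swapped gives
the English-Arabic direction. With noisy automatic alignments like the
ones in these corpora, it would probably help to discard low-probability
entries afterwards.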

Hope this was useful for you.

Best regards,
Amin


On Sat, Dec 29, 2012 at 4:47 PM, Mostafa Dehghani <
dehghani.mostafa at gmail.com> wrote:

> Dear Corpus members,
>
> I am looking for a probabilistic Arabic-English (also English-Arabic)
> dictionary that is extracted from a parallel or comparable corpus. I cannot
> find any resources for this kind of dictionary. Although it is possible to
> use some tools to extract translations with their associated probabilities
> from a parallel corpus, it seems there is no free Arabic-English parallel
> corpus available.
>
> I really appreciate any help you can give me and look forward to your
> responses.
>
> Sincerely,
>
> --Mostafa
> --
> Mostafa Dehghani, M.Sc. Student
> Intelligent Information Systems  lab, Software Eng. Group,
> School of Electrical and Computer Eng.(ECE)
> University of Tehran,
> Tehran, Iran
> Tel: +9821-6111-9723
>
> _______________________________________________
> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
>
>