[Corpora-List] machine translation

Mahdi Mohseni mohseni48 at gmail.com
Wed Dec 19 06:59:00 UTC 2012


Dear Sadegh and Amin,

The project is now finished and will be published soon. However, I don't
know whether it will be released in full or only in part, or exactly when.

Regards,
Mahdi Mohseni

On Tue, Dec 18, 2012 at 8:44 PM, amin farajian <ma.farajian at gmail.com> wrote:

> Dear Mohammad Sadegh,
>
> Good news about the SCICT corpus. It took a long time, but I hope the
> resulting corpus is fine.
> I am now doing my PhD at FBK-IRST, Italy, so I am not in Iran and I don't
> have access to the people at SCICT. Is there any other way of obtaining
> this corpus? As far as I know, Ms Khamesi is in Bojnourd, Iran. So, if
> possible, please give her the information she needs to contact the SCICT
> people and get this corpus.
>
> Best regards,
> Amin
>
>
>
> On Tue, Dec 18, 2012 at 5:32 PM, Mohammad Sadegh Rasooli <
> rasooli.ms at gmail.com> wrote:
>
>> Thanks Amin,
>> As far as I know, the SCICT corpus is a large collection of classic
>> novels, and the project was finished in the summer. I don't think the
>> corpus is completely available, but if you live in Iran I think it's
>> easy to obtain that dataset. I think you should contact the people in
>> charge at SCICT.
>> Best
>>
>> On Tue, Dec 18, 2012 at 11:12 AM, amin farajian <ma.farajian at gmail.com> wrote:
>>
>>> Dear Karine,
>>> The corpus that you mentioned (at Payame Noor University of Yazd) is
>>> actually the one that is available from ELRA. There is also another
>>> parallel corpus, called PEN, which I developed myself. It is not yet
>>> publicly available, but I am going to publish it. The following paper
>>> gives some information about it:
>>> Mohammad Amin Farajian (2011). PEN: Parallel English-Persian News Corpus
>>> <http://world-comp.org/p2011/ICA4953.pdf>.
>>> Proceedings of the 2011 International Conference on Artificial
>>> Intelligence (ICAI'11), Nevada, USA.
>>>
>>> There are some other researchers (Dr. Khadivi at Amirkabir University,
>>> Dr. Faili at the University of Tehran, Dr. Analoui at Iran University of
>>> Science and Technology) and research centers (ITRC and SCICT) in Iran
>>> that are working on SMT and building parallel corpora, but as far as I
>>> know their corpora are not available yet.
>>>
>>> Best regards,
>>> Amin
>>>
>>> On 12/18/2012 03:33 PM, Megerdoomian, Karine wrote:
>>>
>>>  I haven’t seen any other parallel English-Persian corpora besides the
>>> ones already mentioned below. However, I have heard about a corpus being
>>> developed by the English department at Payame Noor University in Yazd,
>>> Iran. You may want to contact them. Here’s the info online:
>>> http://www.eurac.edu/it/newsevents/focus/Newsdetails.html?entryid=22181
>>>
>>> “Our developmental English-Persian parallel corpus consists of about *three
>>> million words* (more than 50,000 corresponding sentences in two
>>> languages). This is a kind of ongoing corpus, that is, an open corpus in
>>> which more material can be added as the need arises.”
>>>
>>> Karine
>>>
>>> *From:* corpora-bounces at uib.no [mailto:corpora-bounces at uib.no]
>>> *On Behalf Of *Hieu Hoang
>>> *Sent:* Tuesday, December 18, 2012 7:31 AM
>>> *To:* Khamesi Fahime
>>> *Cc:* corpora at uib.no
>>> *Subject:* Re: [Corpora-List] machine translation
>>>
>>> Hi Khamesi
>>>
>>> According to this website
>>>    http://opus.lingfil.uu.se/
>>> there are 3 freely available parallel corpora for Persian-English:
>>>   TEP
>>>   KDE
>>>   OpenSubtitles
>>>
>>> I've noticed that other people, especially in Tehran, are also working
>>> on MT and collecting data, e.g.
>>>   http://ece.ut.ac.ir/iis/resources.html
>>>
>>> Kind Regards
>>> Hieu
>>>
>>> On 12 December 2012 21:15, Khamesi Fahime <khamesi_fahime at yahoo.com>
>>> wrote:
>>>
>>> Hi,
>>> I am a student of linguistics in Iran and I am working on English to
>>> Persian statistical machine translation.
>>>
>>> Unfortunately, I haven't found any EN-PER corpus except TEP and ELRA.
>>>
>>> There are many restrictions in Iran (boycott) on ordering from ELRA.
>>> I would appreciate it if you could help me in this respect.
>>>
>>> I am looking forward to your reply.
>>>
>>> Best regards,
>>>
>>> Khamesi
>>>
>>>
>>> _______________________________________________
>>> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
>>> Corpora mailing list
>>> Corpora at uib.no
>>> http://mailman.uib.no/listinfo/corpora
>>>
>>
>>
>> --
>> Mohammad Sadegh Rasooli
>> PhD Student, Computer Science Department, Columbia University
>> Research Assistant, Center for Computational Learning Systems, Columbia
>> University
>>
>
>