<div dir="ltr"><div class="gmail_default" style="font-family:'trebuchet ms',sans-serif"><div class="gmail_default" style="font-size:13px">Hi everybody,</div><div class="gmail_default" style="font-size:13px">Thank you so much for your useful suggestions.</div><div class="gmail_default" style="font-size:13px">However, our corpora total almost 20 GB and we are running into memory problems. We have 300K unique target words and 750K alignments, and we cannot load the document-word or word-alignment matrices into memory. How can I use the tools efficiently?</div><div class="gmail_default" style="font-size:13px">Best,</div><div class="gmail_default" style="font-size:13px">Javid</div></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Oct 9, 2014 at 2:34 AM, Reinhard Rapp <span dir="ltr"><<a href="mailto:reinhardrapp@gmx.de" target="_blank">reinhardrapp@gmx.de</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Dear all,<br>

<br>

I would like to point to the work done by Tomas Mikolov, Quoc V. Le, and Ilya Sutskever:<br>

<br>

<a href="http://arxiv.org/abs/1309.4168" target="_blank">http://arxiv.org/abs/1309.4168</a><br>

<br>

It seems that there is code available for this (see footnote 1 of the paper).<br>

<br>

There is also a popular science article on this approach:<br>

<br>

<a href="http://www.technologyreview.com/view/519581/how-google-converted-language-translation-into-a-problem-of-vector-space-mathematics/" target="_blank">http://www.technologyreview.com/view/519581/how-google-converted-language-translation-into-a-problem-of-vector-space-mathematics/</a><br>

<br>

Together with Michael Zock I organized a shared task on multi-stimulus association at the COLING 2014 workshop on Cognitive Aspects of the Lexicon (CogALex-IV) and from this I know that systems using Mikolov et al.'s neural network-based language modelling approach perform extremely well in the monolingual case (see e.g. the first 4 papers in the workshop proceedings to be found at <a href="http://aclanthology.info/events/cogalex-2014#W14-47" target="_blank">http://aclanthology.info/events/cogalex-2014#W14-47</a>).<br>

<br>

Let me also mention that we (Pierre Zweigenbaum, Serge Sharoff, and I) are currently serving as guest editors for a special issue of the Journal of Natural Language Engineering (JNLE) on the topic of "Machine Translation Using Comparable Corpora": <a href="http://comparable.limsi.fr/jnle-bucc2015/" target="_blank">http://comparable.limsi.fr/jnle-bucc2015/</a> (submissions welcome, deadline Dec. 1, 2014). If you are working in this field but will not be able to submit a paper yourself, please let us know about your work (especially if it is not already mentioned in the introductory chapter of the volume "Building and Using Comparable Corpora", see Serge's previous e-mail in this thread), as we are preparing an overview article which aims to be as comprehensive as possible.<br>

<br>

Many thanks and kind regards,<br>

<br>

Reinhard<br>

<br>

-----Original Message----- From: <a href="mailto:inguna.skadina@lumii.lv" target="_blank">inguna.skadina@lumii.lv</a><br>

Sent: Tuesday, October 7, 2014 8:48 AM<br>

To: Inguna Skadiņa<br>

Cc: <a href="mailto:corpora@uib.no" target="_blank">corpora@uib.no</a> ; <a href="mailto:gate-users-request@lists.sourceforge.net" target="_blank">gate-users-request@lists.sourceforge.net</a><span class="im HOEnZb"><br>

Subject: Re: [Corpora-List] Bilingual Dictionary from Comparable Corpora<br>

<br></span><div class="HOEnZb"><div class="h5">

Dear Javid,<br>

<br>

<br>

The ACCURAT toolkit (<a href="http://accurat-project.eu/" target="_blank">http://accurat-project.eu/</a>) lets you identify<br>

semi-parallel sentences in comparable corpora and extract a<br>

dictionary/translation table from them (with the support of GIZA++).<br>

<br>

I hope you will find it useful.<br>

<br>

Best wishes,<br>

Inguna Skadiņa<br>

<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

Quoting javid dadashkarimi <<a href="mailto:javiddadashkarimi@gmail.com" target="_blank">javiddadashkarimi@gmail.com</a>>:<br>

<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

Hi,<br>

Is there any tool for extracting a probabilistic bilingual dictionary from a<br>

bilingual comparable corpus? Does Moses support such a task?<br>

Best,<br>

Javid<br>

<br>

</blockquote>

<br>

<br>

<br>

</blockquote>

<br>

<br>

<br>

<br></div></div><div class="HOEnZb"><div class="h5">

_______________________________________________<br>

UNSUBSCRIBE from this page: <a href="http://mailman.uib.no/options/corpora" target="_blank">http://mailman.uib.no/options/corpora</a><br>

Corpora mailing list<br>

<a href="mailto:Corpora@uib.no" target="_blank">Corpora@uib.no</a><br>

<a href="http://mailman.uib.no/listinfo/corpora" target="_blank">http://mailman.uib.no/listinfo/corpora</a><br>


</div></div></blockquote></div><br></div>