[Corpora-List] Bilingual Dictionary from Comparable Corpora

Reinhard Rapp reinhardrapp at gmx.de
Thu Oct 9 09:34:17 UTC 2014


Dear all,

I would like to point to the work of Tomas Mikolov, Quoc V. Le, and 
Ilya Sutskever:

http://arxiv.org/abs/1309.4168

It seems that there is code available for this (see footnote 1 of the 
paper).

There is also a popular science article on this approach:

http://www.technologyreview.com/view/519581/how-google-converted-language-translation-into-a-problem-of-vector-space-mathematics/
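
For readers unfamiliar with the approach, here is a minimal sketch of its 
core idea as I understand it from the paper: given monolingual word vectors 
for both languages and a small seed dictionary, learn a linear map W that 
sends source vectors close to the vectors of their translations, then 
translate a new word by taking the nearest target vector (by cosine) to its 
mapped vector. This is not the code referred to in the footnote. The 
two-dimensional toy vectors, the function names, and the learning rate and 
epoch count are all assumptions made for illustration; in practice the 
vectors would be trained on large monolingual corpora.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]


def train_translation_matrix(pairs, dim_src, dim_tgt, lr=0.1, epochs=200, seed=0):
    """Learn W minimising sum ||W x - z||^2 over seed pairs (x, z) by SGD.

    lr and epochs are illustrative values chosen for the toy data below.
    """
    rng = random.Random(seed)
    pairs = list(pairs)
    W = [[0.0] * dim_src for _ in range(dim_tgt)]
    for _ in range(epochs):
        rng.shuffle(pairs)
        for x, z in pairs:
            # residual = W x - z for this seed pair
            residual = [p - t for p, t in zip(matvec(W, x), z)]
            # gradient of ||residual||^2 w.r.t. W[i][j] is 2 * residual[i] * x[j]
            for i in range(dim_tgt):
                for j in range(dim_src):
                    W[i][j] -= lr * 2.0 * residual[i] * x[j]
    return W


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return sum(a * b for a, b in zip(u, v)) / norm if norm else 0.0


def translate(word, W, src_vecs, tgt_vecs):
    """Map the source vector with W and return the closest target word."""
    mapped = matvec(W, src_vecs[word])
    return max(tgt_vecs, key=lambda t: cosine(mapped, tgt_vecs[t]))


# Hypothetical toy embeddings: the target space is the source space rotated
# by 90 degrees, so an exact linear map exists.
english = {"one": [1.0, 0.0], "two": [0.0, 1.0], "three": [1.0, 1.0]}
spanish = {"uno": [0.0, 1.0], "dos": [-1.0, 0.0], "tres": [-1.0, 1.0]}

# Seed dictionary: "three" is deliberately held out.
seed_dictionary = [("one", "uno"), ("two", "dos")]
pairs = [(english[s], spanish[t]) for s, t in seed_dictionary]

W = train_translation_matrix(pairs, dim_src=2, dim_tgt=2)
result = translate("three", W, english, spanish)
print(result)  # tres
```

The held-out word is translated correctly only because the toy spaces are 
related by an exact rotation; with real vectors the map is approximate and 
one would typically inspect the top few nearest neighbours rather than just 
the first.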

Together with Michael Zock, I organized a shared task on multi-stimulus 
association at the COLING 2014 workshop on Cognitive Aspects of the Lexicon 
(CogALex-IV). From this I know that systems using Mikolov et al.'s neural 
network-based language modelling approach perform extremely well in the 
monolingual case (see e.g. the first four papers in the workshop proceedings 
at http://aclanthology.info/events/cogalex-2014#W14-47).

Let me also mention that we (Pierre Zweigenbaum, Serge Sharoff, and I) 
are currently serving as guest editors for a special issue of the Journal of 
Natural Language Engineering (JNLE) on the topic of "Machine Translation 
Using Comparable Corpora": http://comparable.limsi.fr/jnle-bucc2015/ 
(submissions welcome, deadline Dec. 1, 2014). If you are working in this 
field, but will not be able to submit a paper yourself, please let us know 
about your work (especially if it is not already mentioned in the 
introductory chapter of the volume "Building and Using Comparable Corpora", 
see Serge's previous e-mail in this thread) as we are preparing an overview 
article which aims to be as comprehensive as possible.

Many thanks and kind regards,

Reinhard

-----Original Message-----
From: inguna.skadina at lumii.lv
Sent: Tuesday, October 7, 2014 8:48 AM
To: Inguna Skadiņa
Cc: corpora at uib.no ; gate-users-request at lists.sourceforge.net
Subject: Re: [Corpora-List] Bilingual Dictionary from Comparable Corpora

Dear Javid,


The ACCURAT toolkit (http://accurat-project.eu/) can identify
semi-parallel sentences in comparable corpora and extract a
dictionary/translation table from them (with the support of GIZA++).

I hope you will find it useful.

Best wishes,
Inguna Skadiņa

> Quoting javid dadashkarimi <javiddadashkarimi at gmail.com>:
>
>> Hi,
>> Is there any tool for extracting a probabilistic bilingual dictionary from a
>> bilingual comparable corpus? Does Moses support such a task?
>> Best,
>> Javid
>>
>
>
>




_______________________________________________
UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora 

