[Corpora-List] Bilingual Dictionary from Comparable Corpora

Krishnamurthy, Ramesh r.krishnamurthy at aston.ac.uk
Mon Oct 6 13:26:42 UTC 2014


Hi Javid
yes, i am familiar with parallel corpora and comparable corpora. :)
...but for me, a 'dictionary' means something very different to 'an aligning tool
for comparable corpora'.... :)
best
ramesh
________________________________
From: javid dadashkarimi [javiddadashkarimi at gmail.com]
Sent: 06 October 2014 10:09
To: Krishnamurthy, Ramesh
Cc: Jörg Tiedemann; corpora at uib.no
Subject: Re: [Corpora-List] Bilingual Dictionary from Comparable Corpora

Hi Ramesh,
Excuse me if I did not explain clearly.
In statistical machine translation (SMT) or cross-lingual information retrieval (CLIR), parallel corpora (sentence-aligned corpora) and comparable corpora (document-aligned corpora whose documents are not exact translations of each other, as they are in parallel corpora, but are on the same topic) are useful resources for translating queries into the language of the documents. Both tasks extract target-language words that are translations of a source-language word, each with its own probability. So I have a comparable corpus in which each source-language document is on the same topic as several target-language documents:
$$(D^{s}_{0} \rightarrow D^{t}_{1}, D^{t}_{2}, \ldots, D^{t}_{k}),\ (D^{s}_{1} \rightarrow D'^{t}_{1}, D'^{t}_{2}, \ldots, D'^{t}_{k}),\ \ldots,\ (D^{s}_{m} \rightarrow D''^{t}_{1}, D''^{t}_{2}, \ldots, D''^{t}_{k}).$$
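One naive way to turn document-aligned groups like these into a probabilistic dictionary is to count, for each source word s, how often each target word t appears in the target documents aligned to a source document containing s, and then normalize those counts into p(t|s). This is a minimal sketch under stated assumptions: each group is given as the tokens of D^s_i plus a list of token lists for its k target documents, and the names `cooccurrence_dictionary` and `top_n` are hypothetical. It is a crude baseline, not what Moses or any particular tool does.

```python
from collections import Counter, defaultdict


def cooccurrence_dictionary(groups, top_n=3):
    """Naive p(t|s) from document-level co-occurrence (hypothetical helper).

    groups: list of (source_tokens, [target_tokens_1, ..., target_tokens_k]),
    one entry per source document D^s_i and its aligned target documents.
    Returns {s: [(t, p(t|s)), ...]} keeping the top_n candidates per s.
    """
    cooc = defaultdict(Counter)
    for src_tokens, tgt_docs in groups:
        src_vocab = set(src_tokens)
        # Treat the k aligned target documents as one bag of target word types.
        tgt_vocab = {tok for doc in tgt_docs for tok in doc}
        for s in src_vocab:
            cooc[s].update(tgt_vocab)

    table = {}
    for s, counts in cooc.items():
        total = sum(counts.values())
        # Sort by count (descending), then alphabetically so ties are stable.
        ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]
        # Normalize by the total over ALL candidates, so truncated lists may sum to < 1.
        table[s] = [(t, c / total) for t, c in ranked]
    return table
```

Without stop-word filtering or an association measure (for example Dice or log-likelihood), frequent target words will dominate every entry, so in practice the raw counts would need reweighting.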
Best,
Javid


On Mon, Oct 6, 2014 at 1:44 AM, Krishnamurthy, Ramesh <r.krishnamurthy at aston.ac.uk> wrote:
hi javid

i think you and i have different ideas about what a 'dictionary' is. :)

i think perhaps you just want to find 'word/phrase-equivalents' in comparable corpora in
different languages?

i don't know enough about computational linguistics, but i *suspect*
that both SketchEngine and Tshwanelex are for 'fuller' dictionaries,
eg with collocational, grammatical, semantic, phraseological info, etc
for each entry.... but they can probably be used with a bilingual lookup
(eg WordNet) to link items in the comparable corpora...?

best
ramesh



________________________________
From: Jörg Tiedemann [Jorg.Tiedemann at lingfil.uu.se]
Sent: 06 October 2014 09:02
To: javid dadashkarimi
Cc: Krishnamurthy, Ramesh; corpora at uib.no
Subject: Re: [Corpora-List] Bilingual Dictionary from Comparable Corpora


Maybe you want to have a look at alignment tools for comparable corpora such as:
- http://www.accurat-project.eu
- http://yalign.machinalis.com

I haven't used these tools myself but I would be interested to hear if they work for you.

Good luck!
Jörg

**********************************************************************************
 Jörg Tiedemann                                   jorg.tiedemann at lingfil.uu.se
 Dep. of Linguistics and Philology           http://stp.lingfil.uu.se/~joerg/
 Uppsala University                                  tel:  +46 (0)18 - 471 1412
 Box 635, SE-751 26 Uppsala/SWEDEN    fax: +46 (0)18 - 471 1094



On Oct 5, 2014, at 7:00 PM, javid dadashkarimi wrote:

Dear Ramesh,
I only want to extract a dictionary from an aligned bilingual corpus. I know that Moses can do this for a parallel, sentence-aligned corpus, but can tools like SketchEngine or Tshwanelex extract this kind of knowledge?
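For comparison, the probabilistic lexicon that Moses builds from sentence-aligned data comes from word alignment, typically produced with GIZA++, which implements the IBM models. A minimal IBM Model 1 EM sketch shows how a table t(f|e) can be estimated from sentence pairs; it assumes pre-tokenized input and omits the NULL word for brevity, and `ibm_model1` and `best_translation` are hypothetical names, not part of Moses or GIZA++.

```python
from collections import defaultdict


def ibm_model1(sentence_pairs, iterations=10):
    """Estimate t(f|e) with IBM Model 1 EM (sketch, no NULL word).

    sentence_pairs: list of (source_tokens, target_tokens).
    Returns a dict mapping (f, e) -> t(f|e).
    """
    target_vocab = {f for _, fs in sentence_pairs for f in fs}
    uniform = 1.0 / len(target_vocab)
    t = {}  # (f, e) -> t(f|e); unseen pairs fall back to the uniform start value
    for _ in range(iterations):
        count = defaultdict(float)  # expected count of f aligned to e
        total = defaultdict(float)  # expected count of e (normalizer)
        # E-step: share each target word among the source words of its sentence.
        for es, fs in sentence_pairs:
            for f in fs:
                z = sum(t.get((f, e), uniform) for e in es)
                for e in es:
                    c = t.get((f, e), uniform) / z
                    count[(f, e)] += c
                    total[e] += c
        # M-step: renormalize expected counts so that sum_f t(f|e) = 1.
        t = {(f, e): c / total[e] for (f, e), c in count.items()}
    return t


def best_translation(t, e):
    """Return the target word f that maximizes t(f|e)."""
    candidates = {f: p for (f, src), p in t.items() if src == e}
    return max(candidates, key=candidates.get)
```

On a comparable corpus the same EM could in principle be run over aligned document pairs, but the estimates would be much noisier, since most co-occurring words are not translations of each other.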
Best,
Javid

On Sun, Oct 5, 2014 at 7:23 PM, Krishnamurthy, Ramesh <r.krishnamurthy at aston.ac.uk> wrote:
hi javid
not sure quite what you want,
but i'd suggest contacting the
people at SketchEngine
http://www.sketchengine.co.uk/
and Tshwanelex
http://tshwanedje.com/tshwanelex/
best
ramesh
-------------
Date: Sat, 4 Oct 2014 15:11:02 +0330
From: javid dadashkarimi <javiddadashkarimi at gmail.com>
Subject: [Corpora-List] Bilingual Dictionary from Comparable Corpora
To: corpora at uib.no, gate-users-request at lists.sourceforge.net

Hi,
Is there any tool for extracting a probabilistic bilingual dictionary from a
bilingual comparable corpus? Does Moses support such a task?
Best,
Javid

_______________________________________________
UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora

