[Corpora-List] lexicographic tools for parallel/comparable corpora

Tue Feb 6 18:03:27 UTC 2007

Joerg Tiedemann <tiedeman at let.rug.nl> wrote:

> I'm looking for information about tools for the lexicographic use of 
> parallel and comparable corpora. 

The Finnish translation technology company Masterin has a bilingual term extractor that builds a raw bilingual translation lexicon from translation memory databases (which are, in effect, equivalent to parallel corpora). (Shameless plug: the term extraction module will be available in the forthcoming Masterin 2007 translation tool.)

Masterin's solution is language-aware and supports English, Swedish and Finnish (any pair, in either direction). Language awareness makes it possible to combine rule-based methods with more traditional statistical approaches, which in turn yields impressive results. The tool is already being used to extract domain-specific translation lexica, and very efficiently, I might add.
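The post does not describe Masterin's actual algorithm. For readers new to the statistical side, a common textbook baseline is to rank candidate word pairs by the Dice coefficient of their co-occurrence across aligned sentence pairs. The sketch below shows only that generic technique, not Masterin's method. It assumes sentence-aligned input (for example, exported translation memory segments) and naive lowercase whitespace tokenization. The function name `dice_lexicon`, its thresholds, and the toy English-Swedish data are all hypothetical.

```python
from collections import Counter
from itertools import product


def dice_lexicon(pairs, min_score=0.5, min_freq=2):
    """Rank candidate word translations by the Dice coefficient (hypothetical helper).

    pairs: iterable of (source_sentence, target_sentence) strings from a
    sentence-aligned corpus. Tokenization is naive (lowercase + whitespace split).
    """
    src_freq = Counter()  # number of segments containing each source word
    tgt_freq = Counter()  # number of segments containing each target word
    joint = Counter()     # number of segments containing both words of a pair
    for src, tgt in pairs:
        s_words = set(src.lower().split())
        t_words = set(tgt.lower().split())
        src_freq.update(s_words)
        tgt_freq.update(t_words)
        joint.update(product(s_words, t_words))

    lexicon = []
    for (s, t), c in joint.items():
        if c < min_freq:  # skip pairs seen too rarely to be reliable
            continue
        # Dice = 2 * |S and T| / (|S| + |T|), counted over segments
        score = 2 * c / (src_freq[s] + tgt_freq[t])
        if score >= min_score:
            lexicon.append((s, t, score))
    # Highest score first, then alphabetical for a stable order
    lexicon.sort(key=lambda x: (-x[2], x[0], x[1]))
    return lexicon


# Toy English-Swedish segments (invented for illustration only)
pairs = [
    ("the dog sleeps", "hunden sover"),
    ("the dog eats", "hunden äter"),
    ("the cat sleeps", "katten sover"),
    ("the cat eats", "katten äter"),
]
lexicon = dice_lexicon(pairs, min_score=0.7)
for s, t, score in lexicon:
    print(f"{s}\t{t}\t{score:.2f}")
```

On this toy data, "the" co-occurs with every Swedish word and scores only about 0.67, so the 0.7 threshold filters it out. This kind of noise from function words is one reason a purely statistical baseline benefits from the linguistic, rule-based filtering mentioned above.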

I'll be glad to do a test run for you, should you have any parallel data in the languages covered by Masterin. (Maybe some English-Swedish stuff?) Please feel free to contact me directly.

Best regards,

Mickel Grönroos

--
Mickel Grönroos
Chief Language Officer, Masterin
Tekniikantie 14, FIN-02150 Espoo, Finland, www.masterin.com