[Corpora-List] SMT models trained on EUROPARL

Joerg Tiedemann tiedeman at let.rug.nl
Wed Dec 15 14:07:20 UTC 2004


for people interested in MT and alignment:

models for statistical machine translation trained with GIZA++ and the 
EUROPARL corpus are now available from the OPUS homepage:

http://logos.uio.no/cgi-bin/opus/viewcvs.cgi/opus/EUROPARL/wordalign/

I used the standard settings of GIZA++ to produce IBM Model 4 models. so 
far you can find the models for all languages aligned to Dutch (in both 
directions). models for other language pairs will be made available as 
soon as their training is finished.

there are also files with the complete lists of token links and type links 
produced from the intersection of the source-to-target and target-to-source 
Viterbi alignments. token links are stored as XML in files called 
SRCTRG.inter.gz and type links in files called SRCTRG.dic.gz (with SRC 
and TRG replaced by the actual language codes). everything is encoded in 
Unicode (UTF-8).
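
to illustrate what "intersection" means here, a minimal Python sketch follows. it is not the script used to build the files: the function names, the in-memory representation of links as index pairs, and the toy Dutch-English sentence are all assumptions made for the example, and it does not read or write the XML format of the .inter.gz files.

```python
from collections import Counter

def intersect_alignments(src2trg, trg2src):
    """Keep only links found in both directional alignments.

    src2trg: set of (src_idx, trg_idx) links from the source-to-target run.
    trg2src: set of (trg_idx, src_idx) links from the target-to-source run.
    (Both representations are assumptions made for this sketch.)
    """
    # Flip the reverse direction so both sets use (src_idx, trg_idx).
    flipped = {(i, j) for (j, i) in trg2src}
    return sorted(set(src2trg) & flipped)

def type_links(sentence_pairs):
    """Count word-type pairs over the token links of all sentence pairs."""
    counts = Counter()
    for src_tokens, trg_tokens, links in sentence_pairs:
        for i, j in links:
            counts[(src_tokens[i], trg_tokens[j])] += 1
    return counts

# Toy example (hypothetical data, not taken from EUROPARL).
src = ["het", "huis", "is", "klein"]
trg = ["the", "house", "is", "small"]
s2t = {(0, 0), (1, 1), (2, 2), (3, 3), (3, 2)}   # (src_idx, trg_idx)
t2s = {(0, 0), (1, 1), (2, 2), (3, 3), (1, 0)}   # (trg_idx, src_idx)

token_links = intersect_alignments(s2t, t2s)
dictionary = type_links([(src, trg, token_links)])
```

links proposed by only one direction, such as (3, 2) above, are dropped. the intersection therefore favours precision over recall, which is why the resulting type links can be read as a rough bilingual dictionary.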

please let me know if this is useful for you. it would be nice to know 
that this is not just a waste of hard disk space.

best regards,


Jörg

***********/\/\/\/\/\/\/\/\/\/\/\************************************
**  Jörg Tiedemann                 tiedeman at let.rug.nl             **
**  Alfa-Informatica               http://www.let.rug.nl/~tiedeman **  
**  Rijksuniversiteit Groningen     Harmoniegebouw, room 1311-429  **
**  Oude Kijk in 't Jatstraat 26    phone: +31 (0)50-363 5935      **
**  9712 EK Groningen               fax:   +31 (0)50-363 6855      **
*************************************/\/\/\/\/\/\/\/\/\/\/\**********


