[Corpora-List] Using corpora in SMT
Paul Johnston
paul.a.johnston at manchester.ac.uk
Mon Sep 21 11:27:32 UTC 2009
Apologies for being a bit off topic, but several years ago I built a toy
Statistical Machine Translation system. I used a hand-crafted
Estonian-English corpus to train the translation model and the BNC for
the language model, with Giza, the CMU Toolkit, Perl and the ISI decoder
to actually implement the system.
I added a small amount of morphological processing, extracting case
information from the Estonian texts, which greatly improved
performance.
It was good fun and very interesting, but as it was some time ago, I
wonder what would be available if I were to repeat the exercise half a
decade later.
The computing power I have has increased a lot, especially in storage,
and I could get a much bigger parallel corpus now.
What is new to play with?
Regards, Paul
Paul Johnston
Humanities ICT (Infrastructure)
Samuel Alexander Building
Room W1.19
e-mail Paul.Johnston at manchester.ac.uk
web http://web-1.humanities.manchester.ac.uk/prjs/mcasspj/
Are the booby traps made from grenades, high explosives, or other
things?
_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora