[Corpora-List] [Moses-support] filter parallel corpus

Anthony Rousseau anthony.rousseau at lium.univ-lemans.fr
Thu Jan 23 10:04:29 UTC 2014


Hello Saeed,

I think you could also use XenC, a tool I developed and released last year.
I believe it can help you, since it was designed for needs similar to yours.

You can read about it in this paper:
https://ufal.mff.cuni.cz/pbml/100/art-rousseau.pdf

Source code of the tool can be found here:
https://github.com/rousseau-lium/XenC
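The following is a minimal, self-contained sketch of the general idea behind this kind of corpus filtering: bilingual cross-entropy difference scoring, the criterion family XenC builds on. It is not XenC's code and does not reproduce its options. For brevity, it uses add-one-smoothed unigram models instead of real n-gram language models and trains the general-domain model on the whole pool. The names `train_unigram`, `score` and `select` are hypothetical. The in-domain sample is assumed to be a development set close to the test domain, not the test references themselves, so that the evaluation stays fair.

```python
import math
from collections import Counter


def train_unigram(sentences):
    """Add-one-smoothed unigram model (a stand-in for a real n-gram LM)."""
    counts = Counter(tok for s in sentences for tok in s.split())
    total = sum(counts.values())
    vocab = len(counts) + 1  # +1 reserves probability mass for unseen tokens

    def logprob(tok):
        return math.log((counts[tok] + 1) / (total + vocab))

    return logprob


def cross_entropy(logprob, sentence):
    """Per-token cross-entropy (in nats) of a whitespace-tokenised sentence."""
    toks = sentence.split()
    return -sum(logprob(t) for t in toks) / len(toks)


def score(pair, models):
    """Bilingual cross-entropy difference: lower means 'more in-domain'."""
    src, tgt = pair
    if not src.split() or not tgt.split():
        return float("inf")  # push empty lines to the end of the ranking
    in_s, gen_s, in_t, gen_t = models
    return (cross_entropy(in_s, src) - cross_entropy(gen_s, src)
            + cross_entropy(in_t, tgt) - cross_entropy(gen_t, tgt))


def select(pairs, in_src, in_tgt, keep):
    """Return the `keep` (source, target) pairs with the lowest scores."""
    pairs = list(pairs)
    # Assumption: the general-domain models are trained on the whole pool.
    models = (train_unigram(in_src), train_unigram(s for s, _ in pairs),
              train_unigram(in_tgt), train_unigram(t for _, t in pairs))
    return sorted(pairs, key=lambda p: score(p, models))[:keep]


if __name__ == "__main__":
    in_src = ["the patient received the drug", "the doctor examined the patient"]
    in_tgt = ["le patient a recu le medicament", "le medecin a examine le patient"]
    pool = [
        ("the team won the match", "l'equipe a gagne le match"),
        ("the patient took the drug", "le patient a pris le medicament"),
        ("the striker scored a goal", "l'attaquant a marque un but"),
        ("the doctor saw the patient", "le medecin a vu le patient"),
    ]
    for src, tgt in select(pool, in_src, in_tgt, keep=2):
        print(src, "|||", tgt)
```

With this toy data, the two medical pairs are the ones kept. On a real corpus, the number of pairs to keep would need to be tuned on held-out data rather than fixed in advance.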

Best regards,

—
Anthony Rousseau, Ph.D.
LIUM, University of Le Mans
anthony.rousseau at lium.univ-lemans.fr


On 16 Jan 2014, at 16:43, Saeed Farzi <saeedfarzi at gmail.com> wrote:

> Dear all,
> 
> I am working on a translation task with a very large parallel corpus.
> Because of the computational cost of training on such a corpus, I am
> going to filter it with respect to the test set (of course, the
> evaluation must still be fair after filtering).
> 
> I am looking for a solution or a tool for filtering parallel corpus sentences.
> 
> Note that I do not need to filter the phrase table. I know that the
> filter_ moses tool reduces the phrase table size.
> 
> cheers
> -- 
>           S.Farzi, Ph.D. Student
>    Natural Language Processing Lab,
>  School of Electrical and Computer Eng.,
>               Tehran University
>             Tel: +9821-6111-9719
> _______________________________________________
> Moses-support mailing list
> Moses-support at mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support

_______________________________________________
UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora
