[Corpora-List] [Moses-support] filter parallel corpus
Anthony Rousseau
anthony.rousseau at lium.univ-lemans.fr
Thu Jan 23 10:04:29 UTC 2014
Hello Saeed,
I think you could also use XenC, a tool I developed and released last year.
I believe it can help you, since it was designed to cope with needs similar to yours.
You can read about it in this paper:
https://ufal.mff.cuni.cz/pbml/100/art-rousseau.pdf
The source code of the tool can be found here:
https://github.com/rousseau-lium/XenC
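To illustrate the kind of scoring such data-selection tools rely on, here is a minimal sketch of cross-entropy difference selection (Moore and Lewis, 2010). It is not XenC's code or interface. All function names are hypothetical, and it uses add-one smoothed unigram models where a real setup would use proper n-gram language models. `in_domain` is the text that guides the selection: the test-set source side in the setup described below, or a development set if the test set must stay unseen to keep the evaluation fair.

```python
import math
from collections import Counter


def train_unigram(sentences):
    """Build an add-one smoothed unigram model; returns a log-probability function.

    Hypothetical helper: a real selection tool would use proper n-gram LMs.
    """
    counts = Counter(tok for sent in sentences for tok in sent.split())
    total = sum(counts.values())
    vocab_size = len(counts) + 1  # one extra slot for unseen tokens

    def logprob(token):
        # Counter returns 0 for unseen tokens, so they get probability 1/(total+V)
        return math.log((counts[token] + 1) / (total + vocab_size))

    return logprob


def cross_entropy(logprob, sentence):
    """Per-token cross-entropy (in nats) of a non-empty sentence under a model."""
    tokens = sentence.split()
    return -sum(logprob(t) for t in tokens) / len(tokens)


def select_pairs(pairs, in_domain, keep):
    """Keep the `keep` (src, tgt) pairs whose source side looks most in-domain.

    Score = H_in(src) - H_gen(src); lower means closer to the in-domain text.
    """
    pairs = [p for p in pairs if p[0].split()]  # drop pairs with an empty source side
    lm_in = train_unigram(in_domain)
    lm_gen = train_unigram(src for src, _ in pairs)

    def score(pair):
        return cross_entropy(lm_in, pair[0]) - cross_entropy(lm_gen, pair[0])

    return sorted(pairs, key=score)[:keep]


# Toy usage with made-up data
general = [
    ("the patient received a dose of aspirin", "le patient a reçu une dose d'aspirine"),
    ("the football match ended in a draw", "le match de football s'est terminé par un nul"),
]
in_domain = ["the patient was given aspirin", "a dose of the drug"]
print(select_pairs(general, in_domain, keep=1))  # keeps only the medical sentence pair
```

The target side is carried along untouched here; a bilingual variant (Axelrod et al., 2011) would add the same score computed on the target side before ranking.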
Best regards,
--
Anthony Rousseau, Ph.D.
LIUM, University of Le Mans
anthony.rousseau at lium.univ-lemans.fr
On 16 Jan 2014, at 16:43, Saeed Farzi <saeedfarzi at gmail.com> wrote:
> Dear all,
>
> I am working on a translation task with a very large parallel corpus.
> Because of the computational cost of training on such a large parallel
> corpus, I am going to filter it with respect to the test set (of course,
> the evaluation must still be fair after the filtering).
>
> I am looking for a solution or a tool for filtering parallel corpus sentences.
>
> Note that I do not need to filter the phrase table. I know that the
> filter_ moses tool reduces the phrase table size.
>
> Cheers
> --
> S.Farzi, Ph.D. Student
> Natural Language Processing Lab,
> School of Electrical and Computer Eng.,
> Tehran University
> Tel: +9821-6111-9719
> _______________________________________________
> Moses-support mailing list
> Moses-support at mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support
_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora