<div dir="ltr">Dear Saeed,<div><br></div><div style>You can do the data selection with IRSTLM; I think it fits your needs. Take a look at the following link: </div><div style><a href="http://sourceforge.net/apps/mediawiki/irstlm/index.php?title=Data_selection">http://sourceforge.net/apps/mediawiki/irstlm/index.php?title=Data_selection</a><br>
</div><div style><br></div><div style>It helps you find the subset of sentences in your large training corpus that best matches your test corpus.</div><div style>Note that it was originally designed for the monolingual scenario, but if you want to filter a parallel corpus, you can do the following:</div>
<div style><br></div><div style>1. Add a line number to the beginning of each line on the source side of your training corpus.</div><div style>2. Do the data selection as described in the manual.</div><div style>3. Use the line numbers of the selected source lines to extract their corresponding translations from the target side.</div>
<div style>4. Enjoy life</div><div style><br></div><div style>Bests,</div><div style>Amin</div><div style><br></div></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Thu, Jan 16, 2014 at 4:43 PM, Saeed Farzi <span dir="ltr"><<a href="mailto:saeedfarzi@gmail.com" target="_blank">saeedfarzi@gmail.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Dear all,<br>
<br>
I am working on a translation task with a very large parallel corpus.<br>
Because of the computational cost of training on such a large corpus, I am<br>
going to filter it with respect to the test set (of course, after the<br>
filtering, the evaluation must still be fair).<br>
<br>
I am looking for a solution or a tool for filtering parallel corpus sentences.<br>
<br>
Note that I do not need to filter the phrase table. I know that the<br>
filter_ moses tool reduces the phrase table size.<br>
<br>
cheers<br>
<span class="HOEnZb"><font color="#888888">--<br>
S.Farzi, Ph.D. Student<br>
Natural Language Processing Lab,<br>
School of Electrical and Computer Eng.,<br>
Tehran University<br>
Tel: <a href="tel:%2B9821-6111-9719" value="+982161119719">+9821-6111-9719</a><br>
_______________________________________________<br>
Moses-support mailing list<br>
<a href="mailto:Moses-support@mit.edu">Moses-support@mit.edu</a><br>
<a href="http://mailman.mit.edu/mailman/listinfo/moses-support" target="_blank">http://mailman.mit.edu/mailman/listinfo/moses-support</a><br>
</font></span></blockquote></div><br></div>
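<div>The line-number bookkeeping in steps 1 and 3 of the reply above can be sketched roughly as follows. This is only a minimal Python sketch and not part of IRSTLM: the helper names <code>number_lines</code> and <code>extract_parallel</code> are hypothetical, and it assumes the selection step returns the chosen source lines with their number prefix still intact as the first space-separated token. Step 2 is faked here by picking two lines by hand.</div>

```python
def number_lines(lines):
    """Step 1: prefix each source sentence with its 1-based line number."""
    return [f"{i} {line}" for i, line in enumerate(lines, start=1)]


def extract_parallel(selected, source, target):
    """Step 3: recover (source, target) pairs for the selected numbered lines.

    Assumption: every selected line still starts with the number added in
    step 1, followed by a single space.
    """
    pairs = []
    for entry in selected:
        number, _, _ = entry.partition(" ")
        idx = int(number) - 1  # back to a 0-based list index
        pairs.append((source[idx], target[idx]))
    return pairs


if __name__ == "__main__":
    # Toy parallel corpus; real corpora would be read from two aligned files.
    source = ["das ist ein Haus", "guten Morgen", "ich bin hier"]
    target = ["this is a house", "good morning", "i am here"]

    numbered = number_lines(source)

    # Stand-in for step 2: pretend the selection tool kept lines 3 and 1.
    selected = [numbered[2], numbered[0]]

    for src, tgt in extract_parallel(selected, source, target):
        print(src, "|||", tgt)
```

<div>For a very large corpus it would probably be better to stream both files and keep only the set of selected line numbers in memory, but the idea stays the same.</div>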