[Corpora-List] Translation evaluation using word alignment

Kuzman Ganchev kuzman at cis.upenn.edu
Tue Mar 9 13:20:23 UTC 2010


It's fairly likely that you'll get similar alignment scores for "Jon
appeared on TV" and "TV appeared on Jon".  The distortion model in, e.g.,
HMM alignment is relatively weak compared to the translation model, and
re-ordering is pretty frequent even in En/Fr, so word order alone gets
penalized very little.  Depending on your task, that might not matter.
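
To make the point concrete, here is a minimal sketch of the extreme
case: an IBM Model 1 style score, which has no distortion model at all.
The translation table, its probabilities and the function name are all
made up for illustration; nothing here is estimated from real data, and
an HMM aligner would add a (weak) position-dependent term on top.

```python
import math

# Hypothetical toy table t(e | f); values are invented for illustration.
T = {
    ("Jon", "Jon"): 0.9,
    ("appeared", "est"): 0.3,
    ("appeared", "passé"): 0.6,
    ("on", "à"): 0.5,
    ("on", "la"): 0.2,
    ("TV", "TV"): 0.9,
}
FLOOR = 1e-6  # assumed smoothing value for unseen word pairs


def model1_log_score(english, french):
    """Model 1 style log p(e | f), ignoring the length term.

    Each English word sums its translation probability over all French
    words plus NULL, so word positions never enter the score.
    """
    source = ["NULL"] + french
    total = 0.0
    for e in english:
        word_prob = sum(T.get((e, f), FLOOR) for f in source) / len(source)
        total += math.log(word_prob)
    return total


french = "Jon est passé à la TV".split()
good = model1_log_score("Jon appeared on TV".split(), french)
swapped = model1_log_score("TV appeared on Jon".split(), french)
print(good, swapped)
```

Both sentences contain the same words, so the two scores agree up to
floating-point rounding; only a distortion (or phrase/syntax) component
could tell them apart.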

Googling for "machine translation forced decoding" brought up
http://cdec-decoder.org/index.php?title=Main_Page, which I hadn't heard
of before, but it looks like it's fast and can do what you want: take a
sentence pair as input and compute a score.

Kuzman 

On Tue, Mar 09, 2010 at 05:35:45PM +0800, Emmanuel Prochasson wrote:
> I actually use a similar approach to find some good candidates (but I need 
> to filter them). Instead of using a probabilistic dictionary computed from 
> a parallel corpus, I use a regular lexicon.
>
> The results are interesting, but typically, it won't be able to see a 
> difference between
> "Jon appeared on TV" and
> "TV appeared on Jon" (and any translation, say, for example in French: "Jon 
> est passé à la TV").
>
> Both sentences will perfectly match the French translation. I need to go a 
> bit deeper than lexicon level.
>
> In the first case, I wish to obtain something like:
> Jon/Jon est passé/appeared à la/on TV/TV => 100% match
> in the second case:
> Jon/NULL est passé/appeared à la/on TV/NULL => 50% match
>
> (I'm aware that in such a case, any alignment algorithm is likely to be 
> confused, but this is just an illustration).
>
> Regards,
>
> -- 
> Emmanuel
>
>
> _______________________________________________
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
>
