[Corpora-List] Translation evaluation using word alignment

Tue Mar 9 09:35:45 UTC 2010

On 03/09/2010 05:06 PM, Alberto Simões wrote:
> Dear Emmanuel
>
> Probably not good enough for your needs, but my experiment with NATools
> was, after obtaining a decent probabilistic translation dictionary
> (using any kind of parallel corpora you can find), to use those
> probabilities to measure the likelihood of two sentences being parallel.
>
> How did I measure it? By searching for each word of the S(ource)
> L(anguage) sentence, checking whether a translation is present in the
> T(arget) L(anguage) sentence, and getting the average of the
> probabilities. Then the same approach from TL to SL.
>
> Not fancy, but gave some interesting results.
>    
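A minimal Python sketch of the bidirectional scoring described in the quoted message, under stated assumptions: sentences are pre-tokenized and lowercased; the dictionary is a plain nested mapping `word -> {translation: probability}` (a hypothetical format, not NATools' actual output); a word scores the highest probability among its translations that occur in the other sentence, or 0 if none does; and the two directional averages are combined with a simple mean, since the message does not say how they are combined.

```python
from typing import Dict, List

# Hypothetical dictionary format: word -> {translation: probability}
Dictionary = Dict[str, Dict[str, float]]


def directional_score(src: List[str], tgt: List[str], dic: Dictionary) -> float:
    """Average over src words of the best probability of a translation
    that occurs in tgt; words with no translation present score 0."""
    if not src:
        return 0.0
    tgt_words = set(tgt)
    total = 0.0
    for word in src:
        present = [p for t, p in dic.get(word, {}).items() if t in tgt_words]
        total += max(present, default=0.0)
    return total / len(src)


def parallel_score(sl: List[str], tl: List[str],
                   sl_to_tl: Dictionary, tl_to_sl: Dictionary) -> float:
    """Assumed combination: plain mean of the SL->TL and TL->SL scores."""
    return (directional_score(sl, tl, sl_to_tl)
            + directional_score(tl, sl, tl_to_sl)) / 2


# Toy dictionaries with made-up probabilities, for illustration only.
en_fr = {"jon": {"jon": 1.0}, "appeared": {"passé": 0.6, "apparu": 0.4},
         "on": {"à": 0.5, "sur": 0.5}, "tv": {"tv": 1.0}}
fr_en = {"jon": {"jon": 1.0}, "est": {"is": 0.7},
         "passé": {"appeared": 0.6, "past": 0.4},
         "à": {"to": 0.5, "on": 0.3}, "la": {"the": 0.9}, "tv": {"tv": 1.0}}

fr = "jon est passé à la tv".split()
a = parallel_score("jon appeared on tv".split(), fr, en_fr, fr_en)
b = parallel_score("tv appeared on jon".split(), fr, en_fr, fr_en)
print(a == b)  # True: word order is ignored
```

Because each sentence is treated as a bag of words, reordering the source words leaves the score unchanged, which is the limitation raised in the reply.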

I actually use a similar approach to find some good candidates (but I 
need to filter them). Instead of using a probabilistic dictionary 
computed from a parallel corpus, I use a regular lexicon.

The results are interesting, but typically the method cannot tell the 
difference between
"Jon appeared on TV" and
"TV appeared on Jon" when comparing them against a translation, for 
example the French "Jon est passé à la TV".

Both sentences will match the French translation perfectly. I need to go 
a bit deeper than the lexicon level.

In the first case, I wish to obtain something like:
Jon/Jon est passé/appeared à la/on TV/TV => 100% match
In the second case:
Jon/NULL est passé/appeared à la/on TV/NULL => 50% match

(I'm aware that in such a case, any alignment algorithm is likely to be 
confused, but this is just an illustration.)
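The match percentages in the example can be reproduced by counting aligned segments. This is a hedged Python sketch that assumes some phrase aligner (not specified in the message) has already produced a list of (French phrase, English phrase or None for NULL) pairs; the function name `alignment_match` is hypothetical. The score is the share of French segments aligned to a non-NULL English phrase, counted per segment, which is what yields 100% and 50% here.

```python
from typing import List, Optional, Tuple

# One aligned segment: (target-language phrase, source phrase or None for NULL)
Segment = Tuple[str, Optional[str]]


def alignment_match(segments: List[Segment]) -> float:
    """Share of target segments aligned to a non-NULL source phrase."""
    if not segments:
        return 0.0
    aligned = sum(1 for _, src in segments if src is not None)
    return aligned / len(segments)


good = [("Jon", "Jon"), ("est passé", "appeared"), ("à la", "on"), ("TV", "TV")]
bad = [("Jon", None), ("est passé", "appeared"), ("à la", "on"), ("TV", None)]
print(f"{alignment_match(good):.0%}")  # 100%
print(f"{alignment_match(bad):.0%}")   # 50%
```

Counting per word instead of per segment is an equally plausible choice; it would give 4 of 6 French words (about 67%) for the second case.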

Regards,

-- 
Emmanuel

_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora