[Corpora-List] Translation evaluation using word alignment
Emmanuel Prochasson
emmanuel.prochasson at univ-nantes.fr
Tue Mar 9 09:35:45 UTC 2010
On 03/09/2010 05:06 PM, Alberto Simões wrote:
> Dear Emmanuel
>
> Probably not good enough for your needs, but my experiment with NATools
> was, after obtaining a decent probabilistic translation dictionary
> (using any kind of parallel corpora you can find), to use those
> probabilities to measure the likelihood of two sentences being parallel.
>
> How did I measure it? By searching for each word of the S(ource)
> L(anguage) sentence, checking whether a translation is present in the
> T(arget) L(anguage) sentence, and taking the average of the
> probabilities. Then, the same approach from TL to SL.
>
> Not fancy, but gave some interesting results.
>
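A minimal Python sketch of the bidirectional scoring Alberto describes above. It assumes the dictionary is a plain mapping from each word to its candidate translations and their probabilities (a hypothetical layout, not NATools' own format). Two further details are assumptions: when several translations of a word occur in the target, the word contributes the most probable one, and the two directional scores are combined by a simple mean.

```python
from typing import Dict, List

# Hypothetical dictionary layout (an assumption, not the NATools format):
# word -> {candidate translation: translation probability}
ProbDict = Dict[str, Dict[str, float]]


def directional_score(src: List[str], tgt: List[str], pdict: ProbDict) -> float:
    """Average over source words of the best probability of a translation
    that actually occurs in the target sentence (0.0 when none occurs)."""
    if not src:
        return 0.0
    tgt_words = set(tgt)
    total = 0.0
    for word in src:
        candidates = pdict.get(word, {})
        # Keep the most probable translation that is present in the target.
        total += max((p for t, p in candidates.items() if t in tgt_words), default=0.0)
    return total / len(src)


def parallel_score(sl: List[str], tl: List[str],
                   sl_to_tl: ProbDict, tl_to_sl: ProbDict) -> float:
    """Score SL->TL and TL->SL; averaging the two directions is an assumption."""
    forward = directional_score(sl, tl, sl_to_tl)
    backward = directional_score(tl, sl, tl_to_sl)
    return (forward + backward) / 2


# Toy dictionaries with made-up probabilities, for illustration only.
sl_to_tl = {"the": {"le": 0.5, "la": 0.25}, "cat": {"chat": 1.0}}
tl_to_sl = {"le": {"the": 0.75}, "chat": {"cat": 0.75}}
print(parallel_score(["the", "cat"], ["le", "chat"], sl_to_tl, tl_to_sl))  # 0.75
```

Words absent from the dictionary simply contribute 0.0, so sentences full of unknown words are pulled toward a low score in that direction.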
I actually use a similar approach to find some good candidates (which I
then need to filter). Instead of using a probabilistic dictionary
computed from a parallel corpus, I use a regular lexicon.
The results are interesting, but typically this approach cannot tell the
difference between
"Jon appeared on TV" and
"TV appeared on Jon" when comparing them with a translation, for example
the French "Jon est passé à la TV".
Both sentences will match the French translation perfectly. I need to go
a bit deeper than the lexical level.
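To see why a lexicon alone cannot separate the two English sentences, here is a sketch of a purely lexical coverage score against a small, hypothetical lexicon (the entries, including mapping "appeared" to the words of "est passé", are made up for this example). Because the target is treated as a bag of words, both word orders receive the same score.

```python
from typing import Dict, List, Set

# Hypothetical bilingual lexicon: English word -> French words it may translate to.
LEXICON: Dict[str, Set[str]] = {
    "Jon": {"Jon"},
    "appeared": {"est", "passé"},
    "on": {"à", "la"},
    "TV": {"TV"},
}


def lexical_coverage(src: List[str], tgt: List[str], lexicon: Dict[str, Set[str]]) -> float:
    """Fraction of source words with at least one lexicon translation in the
    target. Word order is ignored: the target is treated as a set of words."""
    if not src:
        return 0.0
    tgt_words = set(tgt)
    hits = sum(1 for w in src if lexicon.get(w, set()) & tgt_words)
    return hits / len(src)


french = "Jon est passé à la TV".split()
print(lexical_coverage("Jon appeared on TV".split(), french, LEXICON))  # 1.0
print(lexical_coverage("TV appeared on Jon".split(), french, LEXICON))  # 1.0
```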
In the first case, I wish to obtain something like:
Jon/Jon est passé/appeared à la/on TV/TV => 100% match
In the second case:
Jon/NULL est passé/appeared à la/on TV/NULL => 50% match
(I'm aware that in such a case any alignment algorithm is likely to be
confused, but this is just an illustration.)
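The email does not say how such an alignment would be obtained. One simple way to reproduce the two illustrations, offered here only as an assumption, is an order-preserving (monotone) alignment computed with longest-common-subsequence style dynamic programming over a hypothetical phrase-level lexicon, with the French side pre-segmented into units. Unaligned units are linked to NULL, and the match rate is the share of French units that received a link.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical phrase-level lexicon: English word -> French units it may translate to.
PHRASE_LEXICON: Dict[str, Set[str]] = {
    "Jon": {"Jon"},
    "appeared": {"est passé"},
    "on": {"à la"},
    "TV": {"TV"},
}

Alignment = List[Tuple[str, Optional[str]]]


def monotone_align(tgt: List[str], src: List[str],
                   lexicon: Dict[str, Set[str]]) -> Alignment:
    """Link each target unit to at most one source word, maximizing the number
    of links while keeping both sides in order (LCS-style DP)."""
    def linked(i: int, j: int) -> bool:
        return tgt[i] in lexicon.get(src[j], set())

    n, m = len(tgt), len(src)
    # dp[i][j] = max number of in-order links between tgt[:i] and src[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best = max(dp[i - 1][j], dp[i][j - 1])
            if linked(i - 1, j - 1):
                best = max(best, dp[i - 1][j - 1] + 1)
            dp[i][j] = best
    # Walk back through the table to recover which links were used.
    links: List[Optional[str]] = [None] * n
    i, j = n, m
    while i > 0 and j > 0:
        if linked(i - 1, j - 1) and dp[i][j] == dp[i - 1][j - 1] + 1:
            links[i - 1] = src[j - 1]
            i, j = i - 1, j - 1
        elif dp[i][j] == dp[i - 1][j]:
            i -= 1
        else:
            j -= 1
    return list(zip(tgt, links))


def match_rate(alignment: Alignment) -> float:
    """Share of target units linked to something other than NULL."""
    if not alignment:
        return 0.0
    return sum(1 for _, s in alignment if s is not None) / len(alignment)


def show(alignment: Alignment) -> str:
    """Render the alignment in the unit/link notation used in the email."""
    pairs = " ".join(f"{t}/{s if s is not None else 'NULL'}" for t, s in alignment)
    return f"{pairs} => {match_rate(alignment):.0%} match"


french = ["Jon", "est passé", "à la", "TV"]
# Jon/Jon est passé/appeared à la/on TV/TV => 100% match
print(show(monotone_align(french, "Jon appeared on TV".split(), PHRASE_LEXICON)))
# Jon/NULL est passé/appeared à la/on TV/NULL => 50% match
print(show(monotone_align(french, "TV appeared on Jon".split(), PHRASE_LEXICON)))
```

A monotone alignment like this would also penalize legitimate reordering between languages (French adjectives that follow the noun, for instance), so it is only a stand-in for a real word aligner.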
Regards,
--
Emmanuel
_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora