[Corpora-List] Request

RAMIREZ_CABRERA_GASPAR gasprami at siu.buap.mx
Tue May 31 17:09:45 UTC 2005


I am about to finish my Master's degree and I am interested in solving the
following problem:

Given four sentences (all translations of the same question into Spanish)
produced by four different machine translators available on the web: How can
I decide which one of the four is the best translation? We have collected a
corpus of about 500 such sets of sentences translated into Spanish. We have also
noticed that the best sentence is not always made by the same machine
translator: sometimes the best translation is produced by the first
machine translator, sometimes by the fourth, and so on. Therefore, we cannot
simply pick one machine translator as the best and discard the others. My idea
is to align the four sentences and then extract information such as the verb
and its arguments (valencies) from each of them in order to build a well-formed
sentence by using a dictionary. The dictionary will contain information
about the lexicon and the Logical Form of each verb in the language. It will
also contain information about nouns and adjectives, as they are considered
heads of phrases in sentences.  In this way, we may even be able to
construct a better sentence than any of the translations.
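
To make the alignment and selection steps concrete, here is a minimal sketch
in Python. It is only a baseline under stated assumptions, not the
dictionary-based method described above: it aligns two translations token by
token with the standard library's difflib, and it picks the candidate that is
on average most similar to the other three (a simple consensus heuristic that
I am swapping in for illustration). The function names and the example
sentences are hypothetical, and the tokenizer is deliberately naive.

```python
import re
from difflib import SequenceMatcher


def tokenize(sentence):
    # Assumption: lowercase word tokens are enough for a first pass;
    # a real system would use a proper Spanish tokenizer.
    return re.findall(r"\w+", sentence.lower())


def align(a, b):
    """Align two token lists; returns (tag, tokens_a, tokens_b) spans,
    where tag is 'equal', 'replace', 'delete' or 'insert'."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    return [(tag, a[i1:i2], b[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()]


def similarity(a, b):
    # Token-level similarity in [0, 1] as computed by difflib.
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def pick_consensus(candidates):
    """Return (index, scores): the candidate with the highest average
    similarity to the other candidates, plus all average scores."""
    tokens = [tokenize(c) for c in candidates]
    scores = []
    for i, t in enumerate(tokens):
        others = [similarity(t, u) for j, u in enumerate(tokens) if j != i]
        scores.append(sum(others) / len(others))
    best = max(range(len(candidates)), key=scores.__getitem__)
    return best, scores


if __name__ == "__main__":
    # Hypothetical outputs of four machine translators for one question.
    candidates = [
        "¿Dónde está la estación de tren?",
        "¿Dónde es la estación del tren?",
        "¿Dónde está la estación del tren?",
        "¿Donde estación de tren está?",
    ]
    best, scores = pick_consensus(candidates)
    print("consensus pick:", candidates[best])
    # Show where the first two candidates agree and where they differ.
    for span in align(tokenize(candidates[0]), tokenize(candidates[1])):
        print(span)
```

The spans tagged 'replace', 'delete' or 'insert' mark the positions where the
translators disagree, which is where the verb and argument information from
the dictionary would have to decide. Agreement among the systems is of course
no guarantee of correctness, so this can only serve as a starting point.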

I would very much appreciate any advice about the alignment process and its
pitfalls, which programs to use, etc.  Likewise for the syntactic analysis
process.

Thanks very much.


