[Corpora-List] Interlingual Machine Translation Systems (fwd)

Gilles Serasset Gilles.Serasset at imag.fr
Sun Nov 21 17:13:49 UTC 2004


Sorry Serguei, but your mail is based on naive ideas and false 
assumptions.

On 21 nov. 04, at 10:53, Sergey Protasov wrote:
> We should define the term "good" of MT systems.
>
> If we take arbitrary sentences from some very big, non-specialized
> English corpus and translate them using an expert human translator, we
> have about 80-90% correctly translated sentences.
> Let's define this as the best quality of translation.

Which would mean that current measures (BLEU, ORANGE, ...) should rank
such human translations as top "systems". Which, apparently, is not the
case.

> So a "good" translation is about 45-50% correct sentences.

THIS is naive: 100% incorrect but "understandable" sentences are
better than 50% totally unintelligible sentences (especially if the
unintelligible 50% are the sentences that are more than 7 or 8 words
long...).
Moreover, this does not take into account the purpose of the system.
For example, SYSTRAN will be considered a very bad system for the
translation of meteorological bulletins, where METEO will be considered
VERY GOOD (with your definition...). However, METEO will never be
considered a good system for wide-coverage applications, where
SYSTRAN will be considered good.

Also, we should distinguish usage, coverage, quality and potential (the 
amount of effort that is needed to raise one of the criteria).

> I think Systran and any other MT system can correctly translate no
> more than one percent of sentences arbitrarily selected from a big
> corpus.

Well, even if that were the case (which I doubt, if such an evaluation
is done on a fair basis), SYSTRAN would still be useful. The proof being
that, well, it IS used by many.

> This is not "good" in any case, IMHO.
>
Well, 2 months ago, I was going to Japan and wanted to know the 
directions to Okayama University. The "how to get there" was only 
available in Japanese... Hence, I asked Systran to translate it into
English. I'm sure that the English was bad, but well, I don't read
Japanese, and English is not my mother tongue, but still, I managed to 
get where I wanted to go.

This is not "bad" in any case, IMHO.

If you want to have a look at Russian

Finally, if you are speaking about statistical MT, forget what I said, 
as I don't know ANY statistical MT system that is used daily.


--
Gilles Sérasset
GETA-CLIPS-IMAG (UJF, INPG & CNRS)
BP 53 - F-38041 Grenoble Cedex 9
Phone: +33 4 76 51 43 80
Fax:   +33 4 76 44 66 75


