[Corpora-List] New multi-parallel corpus available (Indic Languages)

Miles Osborne miles at inf.ed.ac.uk
Tue Jan 24 15:29:49 UTC 2012


The Indic multi-parallel corpus consists of approximately 2000
Wikipedia sentences translated into the following Indic languages:

Bengali
Hindi
Malayalam
Tamil
Telugu
Urdu

The data was translated by non-expert translators hired over
Mechanical Turk and so it is of mixed quality. Every source
segment was translated redundantly by four different Turkers.
Note that we have translated paragraphs, so the data should be of
interest to researchers looking at discourse as well as machine
translation.

http://homepages.inf.ed.ac.uk/miles/babel.html

Miles Osborne (Edinburgh)
Chris Callison-Burch (JHU)


-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
