[Corpora-List] Seeking for a free comparable corpus
Darren Cook
darren at dcook.org
Sat Jun 14 14:15:42 UTC 2014
> I'm working on Cross Language Information Retrieval based on
> comparable corpora. In order to test my approach, I need a free
> comparable corpus between English and a European language.
I was just trying to understand the difference between a "parallel
corpus" and a "comparable corpus". Am I correct in thinking that if an
article is translated (by a professional human translator, or by a
machine) from one language to another, such that there is a
sentence-level correspondence, then the result is a parallel corpus?
Whereas a comparable corpus is one where the two articles were written
about the same subject and cover mostly the same knowledge, but neither
is a translation of the other, so no sentence-level mapping exists?
If so, Wikipedia sounds like an ideal source.
E.g.
http://en.wikipedia.org/wiki/Paris
http://fr.wikipedia.org/wiki/Paris
http://en.wikipedia.org/wiki/Association_football
http://fr.wikipedia.org/wiki/Football
etc.
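
As the Association_football/Football pair shows, the titles do not always
match across languages, so guessing the French URL from the English one
will not work in general. A rough sketch of one way to pair articles is to
ask the MediaWiki API for each page's interlanguage link (prop=langlinks).
The function names and the sample response below are my own illustration
(the page id is made up), and the parsing assumes the API's default JSON
layout, so treat it as a starting point rather than a tested harvester:

```python
import json
from urllib.parse import urlencode


def langlinks_url(title, source_lang="en", target_lang="fr"):
    """Build a MediaWiki API URL asking for the target-language title."""
    params = {
        "action": "query",
        "prop": "langlinks",
        "titles": title,
        "lllang": target_lang,  # only return the link for this language
        "format": "json",
    }
    return f"https://{source_lang}.wikipedia.org/w/api.php?{urlencode(params)}"


def parse_langlink(response_text):
    """Return the first linked title in the response, or None if absent."""
    data = json.loads(response_text)
    pages = data.get("query", {}).get("pages", {})
    for page in pages.values():
        for link in page.get("langlinks", []):
            return link["*"]  # default JSON layout keeps the title under "*"
    return None


# Hand-written sample in the default response layout (page id is invented).
SAMPLE = json.dumps({
    "query": {"pages": {"1": {
        "title": "Association football",
        "langlinks": [{"lang": "fr", "*": "Football"}],
    }}}
})

if __name__ == "__main__":
    print(langlinks_url("Association football"))
    print(parse_langlink(SAMPLE))
```

Fetching the URL (e.g. with urllib.request) and feeding the body to
parse_langlink would give the French title to pair with the English
article; pages with no French counterpart come back as None and can be
skipped.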
Darren
--
Darren Cook, Software Researcher/Developer
My new book: Data Push Apps with HTML5 SSE
Published by O'Reilly: (ask me for a discount code!)
http://shop.oreilly.com/product/0636920030928.do
Also on Amazon and at all good booksellers!