[Corpora-List] Seeking for a free comparable corpus

Darren Cook darren at dcook.org
Sat Jun 14 14:15:42 UTC 2014


> I'm working on Cross Language Information Retrieval based on
> comparable corpora. In order to test my approach, I need a free
> comparable corpus between English and a European language.

I was just trying to understand the difference between a "parallel corpus"
and a "comparable corpus". Am I correct in thinking that if an article is
translated (by a professional human translator, or by machine) from one
language to another, such that there is a sentence-level correspondence,
then the pair forms a parallel corpus? And that a comparable corpus is one
where the two articles were written about the same subject and cover mostly
the same knowledge, but neither is a translation of the other, so no
sentence-level mapping exists?

If so, Wikipedia sounds like an ideal source.
E.g.
  http://en.wikipedia.org/wiki/Paris
  http://fr.wikipedia.org/wiki/Paris

  http://en.wikipedia.org/wiki/Association_football
  http://fr.wikipedia.org/wiki/Football

etc.
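
Pairs like these would not need to be collected by hand. As a rough
sketch only (I am assuming the MediaWiki Action API's prop=langlinks
query with formatversion=2; the function names and the User-Agent string
are made up for illustration), something along these lines could look up
the French counterpart of an English article:

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import Request, urlopen

# Assumed endpoint pattern for each language edition of Wikipedia.
API = "https://{lang}.wikipedia.org/w/api.php"


def langlinks_url(title, source_lang="en", target_lang="fr"):
    """Build the API query asking for the target-language link of one article."""
    params = {
        "action": "query",
        "prop": "langlinks",
        "titles": title,
        "lllang": target_lang,   # restrict the answer to one language
        "format": "json",
        "formatversion": "2",    # 'pages' comes back as a list
    }
    return API.format(lang=source_lang) + "?" + urlencode(params)


def counterpart_title(response, target_lang="fr"):
    """Extract the linked article title from a decoded API response, or None."""
    for page in response.get("query", {}).get("pages", []):
        for link in page.get("langlinks", []):
            if link.get("lang") == target_lang:
                # Older response formats use "*" instead of "title".
                return link.get("title") or link.get("*")
    return None


def article_url(title, lang):
    """Turn an article title into its public URL."""
    return "https://{}.wikipedia.org/wiki/{}".format(
        lang, quote(title.replace(" ", "_")))


def fetch_pair(title, source_lang="en", target_lang="fr"):
    """Return (source_url, target_url), or None if no counterpart is linked."""
    request = Request(
        langlinks_url(title, source_lang, target_lang),
        headers={"User-Agent": "comparable-corpus-sketch/0.1"},  # hypothetical
    )
    with urlopen(request) as reply:
        data = json.load(reply)
    other = counterpart_title(data, target_lang)
    if other is None:
        return None
    return article_url(title, source_lang), article_url(other, target_lang)
```

Calling fetch_pair("Association football") needs network access. Note that
it only pairs up the articles; how comparable any given pair really is
would still need checking.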

Darren


-- 
Darren Cook, Software Researcher/Developer
