[Corpora-List] Hierarchically classified corpora?

Armin Schmidt armin.sch at gmail.com
Tue Jan 16 16:48:07 UTC 2007


Hi Daniel,

Wikipedia (http://www.wikipedia.org) applies hierarchical categorization
to its articles and provides very large corpora for German and
English. You can download the corpora in XML format here:
http://download.wikimedia.org/backup-index.html. It's all free, and you
can quite easily generate domain-specific corpora that are of interest
to you, e.g. on computer science or linguistics, by simply
extracting the articles that carry a particular category tag. Also, look here:
https://www.cs.tcd.ie/esslli2007/content/courses/id19.html
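
To give an idea of what extracting articles by tag could look like, here is
a minimal Python sketch. It assumes the dump is a MediaWiki XML export with
<page>, <title> and <text> elements, and that category tags appear in the
article text as [[Category:...]] (English) or [[Kategorie:...]] (German).
The function name pages_in_category and the tiny inline sample are made up
for illustration. Note that it only finds articles tagged directly with the
given category; it does not walk down into subcategories, and it will miss
categories that are added through templates.

```python
import io
import re
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the XML namespace, which differs between dump versions."""
    return tag.rsplit("}", 1)[-1]


def pages_in_category(source, category, prefixes=("Category", "Kategorie")):
    """Yield (title, wikitext) for pages tagged directly with `category`.

    `source` is a file name or a binary file object containing the dump.
    """
    prefix_alt = "|".join(re.escape(p) for p in prefixes)
    # Matches e.g. [[Category:Linguistics]] or [[Category:Linguistics|sort key]]
    pattern = re.compile(
        r"\[\[\s*(?:" + prefix_alt + r")\s*:\s*"
        + re.escape(category) + r"\s*(?:\|[^\]]*)?\]\]",
        re.IGNORECASE,
    )
    # iterparse streams the file, so the whole dump is never held in memory.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        title, text = None, ""
        for child in elem.iter():
            name = local_name(child.tag)
            if name == "title":
                title = child.text
            elif name == "text":
                text = child.text or ""
        if pattern.search(text):
            yield title, text
        elem.clear()  # drop the page's content once it has been handled


# Tiny made-up sample standing in for a real dump file.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page><title>Parsing</title><revision>
    <text>About parsing. [[Category:Computational linguistics]]</text>
  </revision></page>
  <page><title>Berlin</title><revision>
    <text>A city. [[Category:Capitals in Europe]]</text>
  </revision></page>
  <page><title>Syntax</title><revision>
    <text>Satzbau. [[Kategorie:Linguistik|Syntax]]</text>
  </revision></page>
</mediawiki>"""

titles = [t for t, _ in pages_in_category(io.BytesIO(SAMPLE),
                                          "Computational linguistics")]
print(titles)
```

For a real dump you would pass the path of the decompressed XML file instead
of the BytesIO object. To cover a whole branch of the hierarchy, you would
first have to collect the names of all subcategories and run the same check
for each of them.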

Best,
Armin

Daniel Beck wrote:
> Hello corpora mailing list,
> 
> I'm working on my master's thesis "Accurate Hierarchical Classification
> using NLP Techniques". I hope to improve the accuracy of hierarchical
> classification on English and German corpora by using additional
> information extracted with the aid of linguistic tools.
> 
> I would like to ask where I can obtain corpora which are already
> classified in a hierarchy. I need several English and German corpora. I
> would prefer it if the topics of the corpora were linguistics or
> computer science.
> 
> Regards & Thanks,
> 
> Daniel


