[Corpora-List] Hierarchically classified corpora?
Armin Schmidt
armin.sch at gmail.com
Tue Jan 16 16:48:07 UTC 2007
Hi Daniel,
Wikipedia (http://www.wikipedia.org) applies hierarchical categorization
to its articles and provides very large corpora for German and English.
You can download the corpora in XML format here:
http://download.wikimedia.org/backup-index.html. It's all free, and you
can quite easily generate domain-specific corpora that are of interest
to you, e.g. on computer science or linguistics, by simply extracting
the articles that carry a particular category tag. Also, look here:
https://www.cs.tcd.ie/esslli2007/content/courses/id19.html
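
To make the extraction step concrete, here is a rough Python sketch of how
one might pull articles with a given category out of a dump. It is only an
illustration under some assumptions: that the dump is a MediaWiki XML export
with <page>, <title> and <text> elements, and that categories appear in the
wikitext as [[Category:Name]] links ([[Kategorie:Name]] in the German dump).
The function name and parameters are made up for this example.

```python
import io
import re
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the XML namespace, e.g. '{http://...}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def articles_in_category(xml_source, category, prefix="Category"):
    """Yield (title, wikitext) for pages whose wikitext links to `category`.

    xml_source: a path or binary file object for a MediaWiki XML dump
    (assumed layout: <page> elements containing <title> and <text>).
    prefix: category namespace name, e.g. "Kategorie" for the German dump.
    """
    # Matches [[Category:Name]] and [[Category:Name|sort key]],
    # ignoring case (a simplification made for this sketch).
    link = re.compile(
        r"\[\[\s*" + re.escape(prefix) + r"\s*:\s*"
        + re.escape(category) + r"\s*(\|[^\]]*)?\]\]",
        re.IGNORECASE,
    )
    # iterparse streams the file, so the dump is never loaded whole.
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        title, text = None, ""
        for child in elem.iter():
            name = local_name(child.tag)
            if name == "title":
                title = child.text
            elif name == "text":
                text = child.text or ""
        if link.search(text):
            yield title, text
        elem.clear()  # drop the page's children to keep memory use low
```

Since the dumps are usually distributed compressed, one could pass in
bz2.open(path, "rb") from the standard library instead of a plain path.
Note that this only finds articles tagged directly with the category; to
include subcategories you would also have to follow the category pages
themselves, which the sketch does not attempt.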
Best,
Armin
Daniel Beck wrote:
> Hello corpora mailing list,
>
> I'm working on my master's thesis, "Accurate Hierarchical
> Classification using NLP Techniques". I hope to improve the accuracy of
> hierarchical classification on English and German corpora by using
> additional information extracted with the aid of linguistic tools.
>
> I would like to ask where I can obtain corpora that are already
> classified in a hierarchy. I need several English and German corpora. I
> would prefer corpora whose topics are linguistics or computer science.
>
> Regards & Thanks,
>
> Daniel