[Corpora-List] Hierarchically classified corpora?

Tony Abou-Assaleh taa at acm.org
Tue Jan 16 16:14:59 UTC 2007


Hi Daniel,

Some datasets that come to mind are the ACM Digital Library for CS-related
publications (but be careful about licensing issues) and dmoz.org for Web
pages. The Open Directory (dmoz.org) is available in several languages.

Cheers,

TAA

-----------------------------------------------------
Tony Abou-Assaleh
Email:    taa at acm.org
Web site: http://tony.abou-assaleh.net
----------------------[THE END]----------------------

On Tue, 16 Jan 2007, Daniel Beck wrote:

> Hello corpora mailing list,
>
> I'm working on my master's thesis, "Accurate Hierarchical Classification
> using NLP Techniques". I hope to improve the accuracy of hierarchical
> classification on English and German corpora by using additional
> information extracted with the aid of linguistic tools.
>
> I would like to ask where I can obtain corpora that are already
> classified in a hierarchy. I need several English and German corpora. I
> would prefer corpora whose topics are linguistics or computer
> science.
>
> Regards & Thanks,
>
> Daniel
>
>
>


