[Corpora-List] Hierarchically classified corpora?

Ralf Steinberger ralf.steinberger at jrc.it
Mon Jan 22 07:20:06 UTC 2007


Hello Daniel,

You may also want to consider the hierarchically classified HEP corpus. It
is in English (i.e. no German texts) and not about computer science, but it
is very well documented, has a good size, etc. You can find it at:

 

   http://sinai.ujaen.es/wiki/index.php/HepCorpus#English_version

 

Arturo Montejo Ráez (amontejo AT ujaen.es) will be happy to help you with
any questions you may have. A useful feature of this corpus is that
Arturo has already produced a number of benchmark values for categorisation
with various methods.

 

Ralf

Ralf Steinberger (Ralf.Steinberger at jrc.it)
European Commission - Joint Research Centre (JRC)
IPSC - SeS - Language Technology (http://langtech.jrc.it, http://press.jrc.it/NewsExplorer)
T.P. 267, Via Fermi 1
21020 Ispra (VA), Italy



 

-----Original Message-----

> I'm working on my master thesis "Accurate Hierarchical Classification
> using NLP Techniques". I hope to improve the accuracy of hierarchical
> classification on English and German corpora by using additional
> information extracted with aid of linguistic tools.
>
> I would like to ask where I can obtain corpora which are already
> classified in a hierarchy. I need several English and German corpora. I
> would prefer if the topics of the corpora are about linguistic or
> computer science.
>
> Regards & Thanks,
>
> Daniel

 
