[Corpora-List] Hierarchically classified corpora?

Pierre Zweigenbaum pz at limsi.fr
Thu Jan 18 14:39:26 UTC 2007


Dear Daniel,

> I'm working on my master thesis "Accurate Hierarchical Classification 
> using NLP Techniques". I hope to improve the accuracy of hierarchical 
> classification on English and German corpora by using additional 
> information extracted with aid of linguistic tools.
> 
> I would like to ask where I can obtain corpora which are already 
> classified in a hierarchy. I need several English and German corpora. I 
> would prefer if the topics of the corpora are about linguistic or 
> computer science.
> 
> Regards & Thanks,
> 
> Daniel

The Medline database of scientific publications in the
biomedical domain contains article abstracts that are
indexed with the hierarchically organized MeSH thesaurus.
It can be obtained free of charge under a license from the US National
Library of Medicine. It currently contains over 16 million records,
a majority of which have English abstracts.

http://www.nlm.nih.gov/bsd/licensee/2007_stats/baseline_doc.html

Greetings from the other side of the Rhine.

		Pierre.

-- 
Pierre Zweigenbaum
----
LIMSI - CNRS
Groupe LIR / Dépt. Communication Homme-Machine
Tel: (+33) (0)1 69 85 80 04 ; Fax: (+33) (0)1 69 85 80 88
Email: pz at limsi.fr ; Web: http://www.limsi.fr/~pz/
Location: Bâtiment 508, Université Paris XI
Postal address: LIMSI, BP 133, 91403 ORSAY Cedex, France
----
CRIM, Institut National des Langues et Civilisations Orientales
----


