[Corpora-List] Standard ontology for document classification?
Ralf Steinberger
ralf.steinberger at jrc.it
Tue Oct 3 07:09:59 UTC 2006
Xabier,
The multilingual (over 20 languages), wide-coverage Eurovoc thesaurus with
its approximately 6000 classes has a subset of about 60 science-oriented
classes, plus many related terms and classes in other domains that may also
be useful (e.g. politics, law, economics, trade, finance, social questions,
education, employment, transport, environment, agriculture, energy,
geography). The science-oriented classes provide the major science domains,
but may not be detailed enough for your purposes. Please check for
yourself.
Eurovoc is browsable at http://europa.eu/eurovoc/ and is available free for
research purposes. For details on where to get Eurovoc, see
http://langtech.jrc.it/0509_EU-Enlargement-Workshop.html#HOW_TO_GET_THE_AC_CORPUS_AND_EUROVOC.
Eurovoc was developed for manual cataloguing of mainly parliamentary
documents, but collections of multi-label classified documents such as the
JRC-Acquis (http://langtech.jrc.it/JRC-Acquis.html) have been used to train
an automatic multi-label Eurovoc classification system.
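
To make the idea of multi-label training concrete, here is a minimal sketch
in Python. It is not the JRC system: it assumes a toy setup in which each
training document carries a set of labels, each label gets a simple term
profile, and a new document receives every label whose profile it resembles
closely enough. The documents, the label names, the function names and the
0.3 threshold are all made up for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Naive whitespace tokenizer; a real system would need proper
    # language-specific preprocessing.
    return text.lower().split()


def train(docs):
    # docs: list of (text, set_of_labels) pairs.
    # Each document contributes its terms to the profile of EVERY label
    # it carries, which is what makes the training data multi-label.
    profiles = defaultdict(Counter)
    for text, labels in docs:
        tokens = tokenize(text)
        for label in labels:
            profiles[label].update(tokens)
    return dict(profiles)


def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(text, profiles, threshold=0.3):
    # Return all labels scoring at or above the (arbitrary) threshold,
    # so a document may receive zero, one or several labels.
    vec = Counter(tokenize(text))
    return sorted(
        label
        for label, profile in profiles.items()
        if cosine(vec, profile) >= threshold
    )


# Hypothetical toy training data; not taken from the JRC-Acquis.
docs = [
    ("fishing quota fish stocks", {"fisheries"}),
    ("fish export tariff trade", {"fisheries", "trade"}),
    ("import tariff customs trade", {"trade"}),
]
profiles = train(docs)

print(classify("fish quota", profiles))
print(classify("fish tariff trade", profiles))
```

With this toy data the first call assigns a single label and the second
assigns two. A thesaurus the size of Eurovoc would need far more training
material per class and a more careful weighting scheme than raw term counts.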
I hope this helps. All the best,
Ralf
Ralf Steinberger
European Commission - Joint Research Centre (JRC)
IPSC - SeS - Language Technology (http://langtech.jrc.it,
http://press.jrc.it/NewsExplorer)
T.P. 267, Via Fermi 1
21020 Ispra (VA), Italy
-----Original Message-----
From: owner-corpora at lists.uib.no On Behalf Of Xabier Saralegi Urizar
Sent: 02 October 2006 12:26
To: CORPORA at uib.no
Subject: [Corpora-List] Standard ontology for document classification?
Dear all,
I want to classify many scientific documents among different categories
based on their knowledge area, such as health, geography...
My question is whether there is a standard ontology for such a
classification.
Regards,
--
Xabier Saralegi Urizar
Elhuyar I+G+B
Zelai Haundi kalea, 3
Osinalde industrialdea
20170 Usurbil
(+34) 943 36 30 40
xabiers at elhuyar.com / www.elhuyar.org