[Corpora-List] Hierarchically classified corpora?
Ralf Steinberger
ralf.steinberger at jrc.it
Tue Jan 16 16:20:59 UTC 2007
Dear Daniel,
The JRC-Acquis parallel corpus is available in 21 languages, including
English and German. Most JRC-Acquis texts are indexed with the
hierarchically organised Eurovoc thesaurus (you need to get a licence in
order to receive Eurovoc and info on the hierarchical structure, but that's
free for research purposes). Unfortunately, it is not about linguistics or
computer science.
You can find more information about the JRC-Acquis, including the download
link, at http://langtech.jrc.it/ .
Marko Grobelnik from Jozef Stefan Institute in Ljubljana has worked on
hierarchical classification, as well, using DMOZ. Would this thesaurus and
document collection be more appropriate for you?
I hope this helps.
Greetings from the other side of the Alps.
Ralf
PS: I'd be interested in hearing about the outcome of your work, when it
becomes available. :-)
Ralf Steinberger (Ralf.Steinberger at jrc.it)
European Commission - Joint Research Centre (JRC)
IPSC - SeS - Language Technology (http://langtech.jrc.it,
http://press.jrc.it/NewsExplorer)
T.P. 267, Via Fermi 1
21020 Ispra (VA), Italy
-----Original Message-----
From: owner-corpora at lists.uib.no [mailto:owner-corpora at lists.uib.no] On
Behalf Of Daniel Beck
Sent: 16 January 2007 17:02
To: corpora at hd.uib.no
Subject: [Corpora-List] Hierarchically classified corpora?
Hello corpora mailing list,
I'm working on my master's thesis, "Accurate Hierarchical Classification
using NLP Techniques". I hope to improve the accuracy of hierarchical
classification on English and German corpora by using additional
information extracted with the aid of linguistic tools.
I would like to ask where I can obtain corpora which are already
classified in a hierarchy. I need several English and German corpora. I
would prefer corpora whose topics are linguistics or
computer science.
Regards & Thanks,
Daniel