[Corpora-List] corpus of abstracts/papers with free-form keywords

Jimmy O'Regan joregan at gmail.com
Thu Dec 2 14:53:52 UTC 2010


On 2 December 2010 14:12, Mark Johnson <mark.johnson at mq.edu.au> wrote:
> I'm trying to evaluate unsupervised algorithms for identifying topical
> collocations in document collections.  One idea I've had is: if I had a
> corpus of abstracts or papers that have been manually labelled with
> free-form keywords, I could evaluate the degree to which the topical
> collocations match the human-annotated keywords.   Can anyone point me to a
> suitable corpus -- perhaps one that has already been used for this purpose?

http://www-nlpir.nist.gov/related_projects/tipster_summac/cmp_lg.html
has something along those lines.

Timothy C. Craven, "HTML Tags as Extraction Cues for Web Page
Description Construction", Informing Science Journal (2003), uses the
META tags in HTML for this. You could also harvest sites such as
delicious.com and bibsonomy.org, which allow users to attach their own
keywords.
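
For the META tag route, something like the following might serve as a
starting point. It is only a rough sketch using Python's standard
html.parser module, and it assumes the pages carry a comma-separated
<meta name="keywords" content="..."> tag. The names MetaKeywordParser
and extract_meta_keywords are made up for illustration and are not
taken from Craven's paper.

```python
from html.parser import HTMLParser


class MetaKeywordParser(HTMLParser):
    """Collects keywords from <meta name="keywords" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.keywords = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").strip().lower() != "keywords":
            return
        # Assumption: keywords are comma-separated, as is conventional.
        for keyword in attr.get("content", "").split(","):
            keyword = keyword.strip()
            if keyword:
                self.keywords.append(keyword)


def extract_meta_keywords(html_text):
    """Return the list of META keywords found in one HTML document."""
    parser = MetaKeywordParser()
    parser.feed(html_text)
    return parser.keywords
```

Calling extract_meta_keywords() on the text of each harvested page
would give a keyword list per document, which could then be compared
against the collocations an algorithm proposes for that document.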


-- 
<Leftmost> jimregan, that's because deep inside you, you are evil.
<Leftmost> Also not-so-deep inside you.

_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora


