[Corpora-List] A corpus to evaluate Keyword Extraction techniques

Alexander Schutz goalscoringsuperstarhero at gmail.com
Mon Jan 17 10:29:59 UTC 2011


Apologies,

it appears the PubMed URL has changed; I should have checked before sending.
Now, [1] includes a number of links to downloadable articles, in the section
"XML for data mining via FTP".

[1] http://www.ncbi.nlm.nih.gov/pmc/about/ftp.html

Kind regards,
Alex


On Mon, Jan 17, 2011 at 10:26 AM, Alexander Schutz
<goalscoringsuperstarhero at gmail.com> wrote:
> Sandra,
>
> a dataset resulting from my master's thesis, 'Keyphrase Extraction
> from Single Documents in the Open Domain Exploiting Linguistic and
> Statistical Methods' [1] is available at [2].
>
> It was based on the PubMed dataset available for download [3], which
> already contains keyphrases for documents.
> My dataset basically contains a reference back to the original PubMed
> article via pmcid, the originally assigned keyphrases (gold standard),
> the keyphrases assigned by my approach including confidence, some
> indications as to which sort of match between gold standard and
> approach has occurred, and some document statistics. This is all on a
> per-document basis, covering 1323 documents from the original PubMed
> dataset (80k or so docs).
>
> In case you do not have time to read the full thesis, the procedure
> is summarised in [4] and subsequent pages.
> To gain a proper understanding of how this dataset was produced, you
> need to read at least [5] or the evaluation chapter of the thesis.
>
> Happy extracting.
> Alex
>
> P.S. There is also a dataset of qualitative evaluation results;
> however, as it comprises keyphrases from user-specified content, I
> suspect it is not useful to anyone else.
>
> P.P.S. If you have questions, don't hesitate to gimme a shout.
>
> [1] http://smile.deri.ie/sites/default/files/schutz-mappsc-2008-keyphrase-extraction_revised.pdf
> [2] http://smile.deri.ie/sites/default/files/quantitative-evaluation-dataset.zip
> [3] ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/articles.tar.gz
> [4] http://smile.deri.ie/projects/keyphrase-extraction
> [5] http://smile.deri.ie/node/204
>
> On Mon, Jan 17, 2011 at 8:29 AM, Sandra Garcia Blasco
> <sgarcia at dsic.upv.es> wrote:
>> Dear all,
>>
>> We are interested in evaluating our method for Keyword Extraction, but we are
>> having a hard time finding a corpus to evaluate it on. Do any of you know of
>> an available corpus of texts with associated keywords?
>>
>> Thank you very much for your help,
>>
>>
>> Sandra Garcia --
>>
>> Universitat Politécnica de Valencia
>>
>> _______________________________________________
>> Corpora mailing list
>> Corpora at uib.no
>> http://mailman.uib.no/listinfo/corpora
>>
>>
>



