[Corpora-List] Release of the FetchProt Corpus

Kristofer Franzén franzen at sics.se
Fri Sep 23 14:32:07 UTC 2005


Dear colleagues,

I am pleased to announce the first release of the FetchProt corpus.
It is based on 177 full-text journal articles from the biological domain,
analyzed for protein experiments that validate tyrosine kinase activity.
The 177 filled template files contain 591 experiments on wild types and 
82 different mutants of 77 proteins.
Apart from the template files, the corpus includes text versions of the
articles with the analyzed content tagged, as a reference to where in the
article the information in the template is to be found.
The proteins and experiments are, among other things, linked to UniProt
identity codes and Gene Ontology molecular function codes.

The corpus has been compiled within the FetchProt project, a
collaboration between the Swedish Institute of Computer Science (SICS),
the Center for Genomics and Bioinformatics at Karolinska Institutet
(CGB/KI), and Metamatrix AB, and has received partial funding from
VINNOVA, the Swedish Agency for Innovation Systems.
The aim of the project is to build a system that aids in populating the
EXProt database of proteins with experimentally verified functions, by
means of information extraction from full-text scientific journal papers.

More information on the corpus and its analysis can be found in the 
documentation at 
http://fetchprot.sics.se/Corpus/Release20050923/FetchProtCorpusDocumentation1.0.pdf

The corpus is free to download from the project homepage at 
http://fetchprot.sics.se/


Best regards,

Kristofer Franzén
Swedish Institute of Computer Science


