[Corpora-List] SourceForge as a corpus

Kevin B. Cohen kevin.cohen at gmail.com
Thu Jan 24 15:52:44 UTC 2008


On Jan 24, 2008 8:45 AM,  <radev at umich.edu> wrote:
> We need a public corpus repository.  Perhaps something worth starting
> a discussion about.

I agree that a discussion is a good idea.  To kick it off: one of the
things that I know about any corpus on my SourceForge site is that all
copyright issues are in order.  One of the things that you know when
LDC hosts your corpus for you is that they will make sure that all
copyright issues are in order.  What would be a mechanism for ensuring
this in a public corpus repository?  One option would be to control
deposition of data in the same way that any SourceForge project vets
its participants; the people with the responsibility for doing this
would then be tasked with exercising due diligence in verifying that
the corpus builders themselves had cleared all copyright issues.  On
this model, responsibility for dealing with copyright issues stays
with the corpus builders, not the SourceForge project coordinators.
However, that doesn't reduce the project coordinators' work to zero, and
it's not clear how that work could be funded in the long term.
Thoughts?

Kev

-- 
K. B. Cohen
Biomedical Text Mining Group Lead
Center for Computational Pharmacology
303-916-2417 (cell) 303-377-9194 (home)
http://compbio.uchsc.edu/Hunter_lab/Cohen

_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora


