[Corpora-List] SourceForge as a corpus

radev at umich.edu radev at umich.edu
Thu Jan 24 15:56:55 UTC 2008


We could start by creating a page on the ACL wiki:

http://aclweb.org/aclwiki/

with a list of candidate corpora and contact people for each of
them. Here are some examples: Google n-grams, Enron email, GENIA, etc.

Drago

> 
> On Jan 24, 2008 8:45 AM,  <radev at umich.edu> wrote:
> > We need a public corpus repository.  Perhaps something worth starting
> > a discussion about.
> 
> I agree that a discussion is a good idea.  To kick it off: one of the
> things that I know about any corpus on my SourceForge site is that all
> copyright issues are in order.  One of the things that you know when
> LDC hosts your corpus for you is that they will make sure that all
> copyright issues are in order.  What would be a mechanism for ensuring
> this in a public corpus repository?  One option would be to control
> deposition of data in the same way that any SourceForge project vets
> its participants; the people with the responsibility for doing this
> would then be tasked with exercising due diligence in verifying that
> the corpus builders themselves had cleared all copyright issues.  On
> this model, responsibility for dealing with copyright issues stays
> with the corpus builders, not the SourceForge project coordinators.
> However, that doesn't reduce the project coordinators' work to zero, and
> it's not clear how that work could be funded in the long term.
> Thoughts?
> 
> Kev
> 
> -- 
> K. B. Cohen
> Biomedical Text Mining Group Lead
> Center for Computational Pharmacology
> 303-916-2417 (cell) 303-377-9194 (home)
> http://compbio.uchsc.edu/Hunter_lab/Cohen
> 
> 


-- 
Dragomir R. Radev                    Associate Professor
SI, CSE, Ling                     U. Michigan, Ann Arbor 
http://www.eecs.umich.edu/~radev         radev at umich.edu              

_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora


