[Corpora-List] SourceForge as a corpus
radev at umich.edu
Thu Jan 24 15:56:55 UTC 2008
We could start by creating a page on the ACL wiki:
http://aclweb.org/aclwiki/
with a list of candidate corpora and contact people for each of
them. Here are some examples: Google n-grams, Enron email, GENIA, etc.
Drago
>
> On Jan 24, 2008 8:45 AM, <radev at umich.edu> wrote:
> > We need a public corpus repository. Perhaps something worth starting
> > a discussion about.
>
> I agree that a discussion is a good idea. To kick it off: one of the
> things that I know about any corpus on my SourceForge site is that all
> copyright issues are in order. One of the things that you know when
> LDC hosts your corpus for you is that they will make sure that all
> copyright issues are in order. What would be a mechanism for ensuring
> this in a public corpus repository? One option would be to control
> deposition of data in the same way that any SourceForge project vets
> its participants; the people with the responsibility for doing this
> would then be tasked with exercising due diligence in verifying that
> the corpus builders themselves had cleared all copyright issues. On
> this model, responsibility for dealing with copyright issues stays
> with the corpus builders, not the SourceForge project coordinators.
> However, that doesn't reduce the project coordinators' work to zero, and
> it's not clear how that work could be funded in the long term.
> Thoughts?
>
> Kev
>
> --
> K. B. Cohen
> Biomedical Text Mining Group Lead
> Center for Computational Pharmacology
> 303-916-2417 (cell) 303-377-9194 (home)
> http://compbio.uchsc.edu/Hunter_lab/Cohen
>
>
--
Dragomir R. Radev Associate Professor
SI, CSE, Ling U. Michigan, Ann Arbor
http://www.eecs.umich.edu/~radev radev at umich.edu
_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora