[Corpora-List] Free text corpora?

Francis Tyers ftyers at prompsit.com
Tue Mar 2 22:03:17 UTC 2010


On Tue, 02 Mar 2010 at 23:55 +0100, Martin Wynne wrote:
> Francis Tyers wrote:
> > On Tue, 02 Mar 2010 at 12:38 +0100, Xin Yan wrote:
> >   
> >> Hello,
> >>
> >> can anyone tell me, if there are some free text corpora 
> >> for commercial purpose?
> >> Thank you in advance!
> >>     
> >
> > You can download dumps of Wikipedia from http://download.wikimedia.org
> > -- they are licensed under the CC-BY-SA or GFDL -- both of which allow
> > commercial use, providing changes made are redistributed under the same
> > licence.
> >
> > Best regards,
> >
> > Fran
> >
> >
> > _______________________________________________
> > Corpora mailing list
> > Corpora at uib.no
> > http://mailman.uib.no/listinfo/corpora
> >   
> 
> Dumps of wikipedia may be an interesting electronic text collection that 
> can be used to help address various linguistic research questions, but I 
> think that the request was for a corpus...and a "dump" such as this 
> couldn't be further from qualifying as a corpus, if defined as "a 
> collection of pieces of language, selected and ordered according to 
> explicit linguistic criteria in order to be used as a sample of the 
> language."

There are a good many people who are comfortable with the definition of
a corpus as a "crapload of text" ;)

And the request was for a corpus "free for commercial use", and the bad
news is that the majority of texts which are:

  "a collection of pieces of language, selected and ordered according
   to explicit linguistic criteria in order to be used as a sample of
   the language."

are not free for commercial use -- be that "free as in speech" or "free
as in beer" -- although I'd be delighted to hear otherwise.

Best,

Fran



