[Corpora-List] Our large USENET corpus is now available on Amazon Web Services (AWS).
Cyrus Shaoul
cyrus.shaoul at ualberta.ca
Wed Nov 17 22:39:57 UTC 2010
Dear Corpora-list members,
After receiving many requests over the years for a better way to obtain
our 28-billion-word USENET corpus, I have recently submitted the corpus
to Amazon Web Services, and they have graciously made it one of their
public data sets. The data are being hosted as a public service by AWS.
To use the data, just set up an account at AWS and then mount the snapshot
listed below:
http://aws.amazon.com/datasets/1679761938200766
Please let me know if this resolves the issues you have had with
downloading the corpus. In theory, mounting and copying this dataset
should now take minutes instead of days.
NOTE: Make sure to read the license before using our corpus as we place
restrictions on its usage.
It is free to use for all academic and non-profit projects, but please
cite the corpus when you report your results!
The corpus continues to be available over BitTorrent and HTTP here:
http://www.psych.ualberta.ca/~westburylab/downloads/usenetcorpus.download.html
Finally, the USENET data for Jan-Dec 2010 should be available in January
of 2011 if all goes well.
Yours,
Cyrus
=[=]={=}=[=]={=}=[=]={=}=[=]={=}=[=]={=}
Cyrus Shaoul
http://www.psych.ualberta.ca/~westburylab/
University of Alberta
780-492-5843
=[=]={=}=[=]={=}=[=]={=}=[=]={=}=[=]={=}
_______________________________________________
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora