[Corpora-List] ask for very large, well-balanced corpus

Lushan Han lushan1 at umbc.edu
Mon Jul 16 19:10:47 UTC 2012


Dear all,

Does anyone know where or how I can get a well-balanced corpus of modern
English, similar to the BNC but much larger? Ideally it would contain at
least 1 billion words. I tried to assemble a corpus from Wikipedia articles,
but the result was not balanced: Wikipedia contains very many articles of
the same type, for example on films or birds.

A Web corpus would be fine for my purposes, as long as it was harvested
from a balanced range of domains.


Thanks,

Lushan Han