<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
<title></title>
</head>
<body text="#000000" bgcolor="#ffffff">
Irina, <br>
<br>
I am not sure if this helps you, but in April of this year I extracted
the text of the English version of Wikipedia
using the
<a target="n"
href="http://medialab.di.unipi.it/wiki/Wikipedia_Extractor">WikiExtractor</a>
toolset and created a 990-million-word corpus that is freely
available on my web site:<br>
<br>
<a class="moz-txt-link-freetext"
href="http://www.psych.ualberta.ca/%7Ewestburylab/downloads/westburylab.wikicorp.download.html">http://www.psych.ualberta.ca/~westburylab/downloads/westburylab.wikicorp.download.html</a><br>
<br>
Yours, <br>
<br>
Cyrus<br>
<br>
<pre class="moz-signature" cols="72">--
=[=]={=}=[=]={=}=[=]={=}=[=]={=}=[=]={=}
Cyrus Shaoul
<a class="moz-txt-link-freetext" href="http://www.psych.ualberta.ca/%7Ewestburylab/">http://www.psych.ualberta.ca/~westburylab/</a>
University of Alberta
=[=]={=}=[=]={=}=[=]={=}=[=]={=}=[=]={=}
</pre>
</body>
</html>