<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

<html>

  <head>

    <meta content="text/html; charset=ISO-8859-1"

      http-equiv="Content-Type">

    <title></title>

  </head>

  <body text="#000000" bgcolor="#ffffff">

    Irina, <br>

    <br>

    I am not sure if this helps you, but in April of this year I extracted

    the text of the English version of Wikipedia

    using the


    <a target="n"

      href="http://medialab.di.unipi.it/wiki/Wikipedia_Extractor">WikiExtractor</a>

    toolset and created a 990-million-word corpus that is freely

    available on my web site:<br>

    <br>

    <a class="moz-txt-link-freetext"

href="http://www.psych.ualberta.ca/%7Ewestburylab/downloads/westburylab.wikicorp.download.html">http://www.psych.ualberta.ca/~westburylab/downloads/westburylab.wikicorp.download.html</a><br>

    <br>

    Yours, <br>

    <br>

    Cyrus<br>

    <br>

    <pre class="moz-signature" cols="72">-- 

=[=]={=}=[=]={=}=[=]={=}=[=]={=}=[=]={=}

Cyrus Shaoul

<a class="moz-txt-link-freetext" href="http://www.psych.ualberta.ca/%7Ewestburylab/">http://www.psych.ualberta.ca/~westburylab/</a>

University of Alberta

=[=]={=}=[=]={=}=[=]={=}=[=]={=}=[=]={=}

</pre>

  </body>

</html>