We are working on a project that allows GATE or UIMA applications to be deployed on a cluster for very large-scale text analysis. As Linas said, what you mean by "corpus analysis" would need to be refined a bit. Our project is based on Apache resources (Hadoop, Tika, UIMA) and will be available under an open source licence. It is at an early stage of development, and we would be interested in hearing from potential users, as their use cases would help with the design of the application. Feel free to get in touch if you think that this could be relevant.<br>

<br>Julien Nioche<br>-- <br>DigitalPebble Ltd<br><a href="http://www.digitalpebble.com">http://www.digitalpebble.com</a><br><br><br><div class="gmail_quote">2009/11/4 Spruiell, William C <span dir="ltr">&lt;<a href="mailto:sprui1wc@cmich.edu">sprui1wc@cmich.edu</a>&gt;</span><br>

<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">


<div link="blue" vlink="purple" lang="EN-US">


<div>


<p class="MsoNormal">Are there any available corpus analysis tools that work by “farming” texts out to client programs on multiple computers (workstation cluster, Beowulf, or just widely distributed) and then collating the results (like the screensaver freeware that the SETI project distributed so that anyone interested could volunteer to do some of their signal analysis for them)?</p>


<p class="MsoNormal"> </p>


<p class="MsoNormal">Thanks,</p>


<p class="MsoNormal"> </p>


<p class="MsoNormal">Bill Spruiell</p>


<p class="MsoNormal">Dept. of English</p>


<p class="MsoNormal">Central Michigan University</p>


</div>


</div>


<br>_______________________________________________<br>

Corpora mailing list<br>

<a href="mailto:Corpora@uib.no">Corpora@uib.no</a><br>

<a href="http://mailman.uib.no/listinfo/corpora" target="_blank">http://mailman.uib.no/listinfo/corpora</a><br>

<br></blockquote></div><br>