Hi Siddhartha,<br>You could use Hadoop MapReduce to solve your problem.<br><br>In MapReduce, your code becomes the map step, and you can keep the default reducer. It is easy to use.<br>If you want a quicker solution, skip the Hadoop Java API and use Python with Hadoop Streaming, which pipes each input record through your script's stdin and stdout.<br>
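A minimal sketch of the map step with Hadoop Streaming. It assumes (hypothetically) that the Java application is packaged as `nlp-app.jar`, reads one sentence per line on stdin, and prints exactly one result line per sentence on stdout. The input format (a sentence ID, a tab, then the sentence) and the `map` argument are also assumptions. Each mapper sends its whole input split to a single JVM, so the start-up cost is paid once per split rather than once per sentence.

```python
#!/usr/bin/env python3
"""Sketch of a Hadoop Streaming mapper that wraps an existing Java NLP app.

Assumptions (hypothetical, not from the original post):
  * each input line is a sentence ID, a tab, then the sentence text;
  * nlp-app.jar reads one sentence per line on stdin and prints exactly
    one result line per sentence on stdout.
"""
import subprocess
import sys
from typing import Callable, Iterable, List

# Hypothetical command line for the existing Java application.
JAVA_CMD = ["java", "-jar", "nlp-app.jar"]


def run_java_batch(sentences: List[str]) -> List[str]:
    """Send a whole batch to ONE JVM so start-up cost is paid once."""
    result = subprocess.run(
        JAVA_CMD,
        input="".join(s + "\n" for s in sentences),
        capture_output=True,
        text=True,
        check=True,  # fail the map task if the Java app exits non-zero
    )
    return result.stdout.splitlines()


def map_split(lines: Iterable[str],
              process_batch: Callable[[List[str]], List[str]]) -> List[str]:
    """Turn tab-separated (id, sentence) lines into (id, result) lines."""
    ids: List[str] = []
    sentences: List[str] = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        sent_id, _, sentence = line.partition("\t")
        ids.append(sent_id)
        sentences.append(sentence)
    results = process_batch(sentences) if sentences else []
    if len(results) != len(ids):
        raise ValueError("expected exactly one result per sentence")
    return [f"{i}\t{r}" for i, r in zip(ids, results)]


# Only read stdin when invoked as "mapper.py map", so importing or testing
# this file never blocks waiting for input.
if __name__ == "__main__" and sys.argv[1:] == ["map"]:
    for out_line in map_split(sys.stdin, run_java_batch):
        print(out_line)
```

On the cluster, the script would be given to Hadoop Streaming as the mapper (for example `-mapper "python3 mapper.py map"`, with the script and the jar shipped via `-file`). Passing `-numReduceTasks 0` makes the job map-only; keeping the default reducer also works, but the output is then sorted by key as text.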

<br>Please refer to this tutorial:<br><a href="http://www.michael-noll.com/tutorials/writing-an-hadoop-mapreduce-program-in-python/">http://www.michael-noll.com/tutorials/writing-an-hadoop-mapreduce-program-in-python/</a><br>
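For comparison, the do-it-yourself plan in the quoted question below (split a million sentences across 100 servers, then concatenate their outputs) comes down to two small steps plus some arithmetic. This is a minimal Python sketch that assumes contiguous shards and perfectly linear scaling. The function names are hypothetical, and copying the workspace to each server and launching the runs are left out.

```python
"""Split the input into per-server shards and merge the outputs afterwards.

Hypothetical helper names; assumes one sentence per line and that each
server returns its results in the same order as its shard.
"""
from typing import List, Sequence


def make_shards(lines: Sequence[str], n_shards: int) -> List[List[str]]:
    """Split lines into n_shards contiguous chunks whose sizes differ by at most 1."""
    if n_shards < 1:
        raise ValueError("n_shards must be at least 1")
    base, extra = divmod(len(lines), n_shards)
    shards: List[List[str]] = []
    start = 0
    for i in range(n_shards):
        # The first `extra` shards each take one leftover line.
        end = start + base + (1 if i < extra else 0)
        shards.append(list(lines[start:end]))
        start = end
    return shards


def concatenate(outputs: Sequence[Sequence[str]]) -> List[str]:
    """Merge per-shard outputs in shard order (restores the input order)."""
    return [line for shard in outputs for line in shard]


def estimated_hours(n_sentences: int, per_hour: float, n_servers: int) -> float:
    """Wall-clock estimate assuming perfect linear scaling and no overhead."""
    return n_sentences / (per_hour * n_servers)
```

With the figures from the question, `estimated_hours(1_000_000, 1000, 100)` returns 10.0, i.e. about 10 hours instead of 1,000 on a single machine, before any copying or start-up overhead. Contiguous shards are used rather than round-robin ones so that plain concatenation keeps the original sentence order.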

<br><br><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">

Date: Sat, 21 May 2011 15:14:15 -0700<br>

From: Siddhartha Jonnalagadda <<a href="mailto:sid.kgp@gmail.com">sid.kgp@gmail.com</a>><br>

Subject: [Corpora-List] Simple instructions to scale a java application?<br>

To: corpora <<a href="mailto:corpora@uib.no">corpora@uib.no</a>><br>

<br>

I have a single-threaded Java (NLP) application that processes 1,000 sentences in 1 hour. I obviously can't wait 1,000 hours to process a million sentences. Are there any simple instructions for making my program run on 100 servers at a time? This involves migrating the project workspace onto each of them (or creating them from a snapshot that contains it) and concatenating the output that each server produces.<br>

<br>

Any quick pointers, please? I spent a couple of hours browsing through the Amazon MapReduce documentation, but that didn't get me very far...<br>

<br>

Since I don't own shares in Amazon, I am open to non-Amazon solutions too.<br>

<br>

Sincerely,<br>

Siddhartha Jonnalagadda,<br>

Text mining Researcher, Lnx Research, LLC, Orange, CA<br>

<a href="http://sjonnalagadda.wordpress.com" target="_blank">sjonnalagadda.wordpress.com</a><br>

<br>



</blockquote></div><br><br clear="all"><br>-- <br>Ashish Almeida<br>---------------------------------<br>