[Corpora-List] Amazon MapReduce instructions for a simple java functionality

Wed May 25 21:58:18 UTC 2011

Hi All,

Thanks for confirming that MapReduce is the way to go, and for the tutorials! I
was trying to go through some of the tutorials, but they lack specific
details about using a Java project, so I have changed my question. Please excuse
me if you consider this discussion inappropriate for this list, and ignore
the rest. I thought this was a problem that most of us would be facing. Java
is the most popular language for NLP (
http://nlpers.blogspot.com/2009/03/programming-language-of-choice.html) and
we all need to map to clusters and reduce our processing. Further, Amazon
servers are the way to go for many who don't have access to personal HPC
clusters.

I am wondering if someone could help me with precise instructions for using
Amazon MapReduce with the simple Java program below. It has one class that
takes an input, has a dictionary, and produces an output. (Basically, it
outputs whatever is in the input if it is also present in the dictionary.) I
would use that as a template for my Java application. I need MapReduce because
I want to decrease the time taken by a complex application n-fold.
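
To make the description concrete, here is a minimal sketch of the kind of
program I mean. It is not the code in the zip linked below: the class name
DictionaryFilter, the method name map, and the sample words are all
hypothetical, and I am assuming the input is plain text with
whitespace-separated tokens. It uses no Hadoop or Amazon APIs; it only shows
the per-line filtering step in isolation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for the program described above; not the code in Simple.zip.
public class DictionaryFilter {
    private final Set<String> dictionary;

    public DictionaryFilter(Set<String> dictionary) {
        // Copy the set so later changes by the caller do not affect the filter.
        this.dictionary = new HashSet<>(dictionary);
    }

    // "Map" step: handles one input line independently of all other lines,
    // keeping only the whitespace-separated tokens found in the dictionary.
    public List<String> map(String line) {
        List<String> kept = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            if (dictionary.contains(token)) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Sample dictionary and sentence are made up for illustration.
        DictionaryFilter filter = new DictionaryFilter(
                new HashSet<>(Arrays.asList("protein", "gene")));
        System.out.println(filter.map("the gene encodes a protein"));
    }
}
```

Since each line is processed without reference to any other line, the input
could in principle be split across machines and the outputs concatenated
afterwards, which is the part I need help wiring up.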

I'm kind of lost trying to learn different things. It is easier to do it the
other way, I guess. Someone, please?

Here is the tested code:
http://dl.dropbox.com/u/6777654/Simple.zip

I would greatly appreciate you spending 5-10 minutes giving simple instructions
that a Java programmer with knowledge of MapReduce and familiarity with
Amazon servers could use.

Thanks.

Sincerely,
Siddhartha Jonnalagadda,
Text mining Researcher, Lnx Research, LLC, Orange, CA
sjonnalagadda.wordpress.com

Confidentiality Notice:

This e-mail message, including any attachments, is for the sole use of the
intended recipient(s) and may contain confidential and privileged
information. Any unauthorized review, use, disclosure or distribution is
prohibited. If you are not the intended recipient, please contact the sender
by reply e-mail and destroy all copies of the original message.

On Sat, May 21, 2011 at 3:14 PM, Siddhartha Jonnalagadda
<sid.kgp at gmail.com> wrote:

> I have a single-threaded Java (NLP) application that processes 1000
> sentences in 1 hour. I obviously can't wait 1000 hours to process a
> million sentences. Are there any simple instructions to make my program run
> on 100 servers at a time? This involves migrating the project workspace onto
> each of them (or creating them from a snapshot that contains it) and
> concatenating the output that each server produces.
>
> Any quick pointers, please? I spent a couple of hours browsing through the
> Amazon MapReduce documentation, but that didn't take me very far...
>
> Since I don't own shares in Amazon, I am open to non-Amazon solutions too.
>
> Sincerely,
> Siddhartha Jonnalagadda,
> Text mining Researcher, Lnx Research, LLC, Orange, CA
> sjonnalagadda.wordpress.com
_______________________________________________
UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
Corpora mailing list
Corpora at uib.no
http://mailman.uib.no/listinfo/corpora