[Corpora-List] Simple instructions to scale a java application?

Siddhartha Jonnalagadda sid.kgp at gmail.com
Sun Jun 5 20:26:07 UTC 2011


I believe this issue is now resolved; it came down to trial and error.
I had to replace -file dummy.jar with -files dummy.jar.
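For reference, a sketch of what the working invocation would look like with that
change applied (paths, jar name, and class names are taken from the thread below;
as far as I can tell, -files is a generic Hadoop option and should come before
the streaming-specific options):

```shell
# Sketch of the corrected streaming command, assuming the single-node
# setup described below. -files (plural) distributes dummy.jar to each
# task's working directory; the older -file form did not work here.
hadoop jar contrib/streaming/hadoop-streaming-0.20.203.0.jar \
  -files dummy.jar \
  -mapper "java -cp dummy.jar WCMapper" \
  -reducer "java -cp dummy.jar WCReducer" \
  -input gutenberg/* \
  -output gutenberg-output
```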

Sincerely,
Siddhartha Jonnalagadda,
sjonnalagadda.wordpress.com


Confidentiality Notice:

This e-mail message, including any attachments, is for the sole use of the
intended recipient(s) and may contain confidential and privileged
information. Any unauthorized review, use, disclosure or distribution is
prohibited. If you are not the intended recipient, please contact the sender
by reply e-mail and destroy all copies of the original message.





On Sat, Jun 4, 2011 at 8:54 PM, Siddhartha Jonnalagadda
<sid.kgp at gmail.com>wrote:

> Thanks Ashish, Ted, Miles and others for the instructions and suggestions
> on this topic.
>
> I finished reading the Hadoop book and read the tutorial by Michael Noll.
>
> I was able to reproduce the results in Python (
> http://www.michael-noll.com/tutorials/writing-an-hadoop-mapreduce-program-in-python/#run-the-mapreduce-job),
> but wrote a Mapper and Reducer in Java, since all my existing code is in
> Java.
> I first tried this:
> echo "foo foo quux labs foo bar quux" | java -cp ~/dummy.jar WCMapper | sort
> | java -cp ~/dummy.jar WCReducer
>
> It gave the correct output:
> labs 1
> foo 3
> bar 1
> quux 2
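The WCMapper and WCReducer sources are not shown in this thread; a minimal
sketch of what a streaming-compatible pair might look like (hypothetical class
and method names; the streaming contract is stdin in, tab-separated key/value
lines out) is:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for the WCMapper/WCReducer pair from the thread:
// one class, with the role chosen by a command-line argument.
public class WordCountStreaming {

    // Mapper: emit "word<TAB>1" for every whitespace-separated token.
    public static List<String> map(List<String> lines) {
        List<String> out = new ArrayList<>();
        for (String line : lines)
            for (String tok : line.trim().split("\\s+"))
                if (!tok.isEmpty())
                    out.add(tok.toLowerCase(Locale.ROOT) + "\t1");
        return out;
    }

    // Reducer: input arrives sorted by key (the shuffle, or `sort` in the
    // local pipeline), so summing runs of consecutive equal keys suffices.
    public static List<String> reduce(List<String> lines) {
        List<String> out = new ArrayList<>();
        String current = null;
        long sum = 0;
        for (String line : lines) {
            String[] kv = line.split("\t", 2);
            if (current != null && !current.equals(kv[0])) {
                out.add(current + "\t" + sum);
                sum = 0;
            }
            current = kv[0];
            sum += Long.parseLong(kv[1].trim());
        }
        if (current != null) out.add(current + "\t" + sum);
        return out;
    }

    // Streaming entry point: read stdin, write stdout; "reduce" picks the role.
    public static void main(String[] args) throws Exception {
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        List<String> lines = new ArrayList<>();
        String line;
        while ((line = in.readLine()) != null) lines.add(line);
        boolean reducing = args.length > 0 && args[0].equals("reduce");
        for (String outLine : (reducing ? reduce(lines) : map(lines)))
            System.out.println(outLine);
    }
}
```

Wired together the same way as the local pipeline above:
echo "foo foo quux labs foo bar quux" | java WordCountStreaming | sort | java WordCountStreaming reduce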
>
> Then, I installed a single-node Hadoop cluster and tried this (adapting the
> Python streaming command): hadoop jar
> contrib/streaming/hadoop-streaming-0.20.203.0.jar -mapper "java -cp
> ~/dummy.jar WCMapper" -reducer "java -cp ~/dummy.jar WCReducer" -input
> gutenberg/* -output gutenberg-output -file dummy.jar
>
> This is the error:
> hadoop at siddhartha-laptop:/usr/local/hadoop$ hadoop jar
> contrib/streaming/hadoop-streaming-0.20.203.0.jar -mapper "java -cp
> ~/dummy.jar WCMapper" -reducer "java -cp ~/dummy.jar WCReducer" -input
> gutenberg/* -output gutenberg-output -file dummy.jar
> packageJobJar: [dummy.jar,
> /app/hadoop/tmp/hadoop-unjar5573454211442575176/] []
> /tmp/streamjob6721719460213928092.jar tmpDir=null
> 11/06/04 20:47:15 INFO mapred.FileInputFormat: Total input paths to process
> : 3
> 11/06/04 20:47:15 INFO streaming.StreamJob: getLocalDirs():
> [/app/hadoop/tmp/mapred/local]
> 11/06/04 20:47:15 INFO streaming.StreamJob: Running job:
> job_201106031901_0039
> 11/06/04 20:47:15 INFO streaming.StreamJob: To kill this job, run:
> 11/06/04 20:47:15 INFO streaming.StreamJob:
> /usr/local/hadoop/bin/../bin/hadoop job -Dmapred.job.tracker=localhost:54311
> -kill job_201106031901_0039
> 11/06/04 20:47:15 INFO streaming.StreamJob: Tracking URL:
> http://localhost:50030/jobdetails.jsp?jobid=job_201106031901_0039
> 11/06/04 20:47:16 INFO streaming.StreamJob: map 0% reduce 0%
> 11/06/04 20:48:00 INFO streaming.StreamJob: map 100% reduce 100%
> 11/06/04 20:48:00 INFO streaming.StreamJob: To kill this job, run:
> 11/06/04 20:48:00 INFO streaming.StreamJob:
> /usr/local/hadoop/bin/../bin/hadoop job -Dmapred.job.tracker=localhost:54311
> -kill job_201106031901_0039
> 11/06/04 20:48:00 INFO streaming.StreamJob: Tracking URL:
> http://localhost:50030/jobdetails.jsp?jobid=job_201106031901_0039
> 11/06/04 20:48:00 ERROR streaming.StreamJob: Job not successful. Error: NA
> 11/06/04 20:48:00 INFO streaming.StreamJob: killJob…
> Streaming Job Failed!
>
> *Any advice?*
> I also tried the following, in vain: hadoop jar
> contrib/streaming/hadoop-streaming-0.20.203.0.jar -file dummy.jar -mapper
> "java -cp dummy.jar WCMapper" -reducer "java -cp dummy.jar WCReducer" -input
> gutenberg/* -output gutenberg-output
>
>
> Sincerely,
> Siddhartha Jonnalagadda,
> sjonnalagadda.wordpress.com
>
>
> On Sun, May 22, 2011 at 3:33 AM, Miles Osborne <miles at inf.ed.ac.uk> wrote:
>
>> There really is no need to reinvent the wheel.  If you want to easily
>> scale your task then just use Hadoop.  Installing it is easy.  Look at
>> the "streaming" interface which will allow you to call your code
>> directly, without any special libraries etc.
>>
>> To give you a feel for how easy it is, this would be the command
>> (assuming your job is called "parser" and you have loaded your data
>> onto Hadoop already)
>>
>> hadoop jar
>> /usr/local/share/hadoop/contrib/streaming/hadoop-*-streaming.jar
>> -mapper parser -input myData/* -output myDataOut -file parser
>> -numReduceTasks 0
>>
>> and that is it.  Not hard, as you can see.
>>
>> (Home-brew approaches are not robust; handling failures for you is the
>> real magic behind MapReduce.)
>>
>> Miles
>>
>> --
>> The University of Edinburgh is a charitable body, registered in
>> Scotland, with registration number SC005336.
>>
>> _______________________________________________
>> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
>> Corpora mailing list
>> Corpora at uib.no
>> http://mailman.uib.no/listinfo/corpora
>>
>
>