<font face="verdana,sans-serif">Thanks Ashish, Ted, Miles and others for the instructions and suggestions on this topic.<br><br>I finished reading the Hadoop book and read the tutorial by Michael Noll.<br><br></font>I was able to reproduce the results in Python (<a href="http://www.michael-noll.com/tutorials/writing-an-hadoop-mapreduce-program-in-python/#run-the-mapreduce-job" rel="nofollow" target="_blank">http://www.michael-noll.com/tutorials/writing-an-hadoop-mapreduce-program-in-python/#run-the-mapreduce-job</a>), but wrote my own Mapper and Reducer in Java, since all my code is currently in Java.<br>
I first tried this:<br>
echo "foo foo quux labs foo bar quux" | java -cp ~/dummy.jar WCMapper | sort | java -cp ~/dummy.jar WCReducer
<p>It gave the correct output:<br>
labs 1<br>
foo 3<br>
bar 1<br>
quux 2</p>
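<p>For readers following along, here is a minimal sketch of what a streaming-style word-count mapper and reducer can look like in Java. It is not the code packaged in dummy.jar: the class name StreamingWordCountSketch and its methods are hypothetical, and it assumes tab-separated "word, count" lines as in the Python tutorial. The reducer relies on its input being sorted, so it prints counts in sorted key order rather than the order listed above.</p>

```java
import java.io.PrintWriter;
import java.util.Scanner;

// Hypothetical sketch; NOT the WCMapper/WCReducer classes from dummy.jar.
class StreamingWordCountSketch {

    // Mapper: emit "word<TAB>1" for every whitespace-separated token.
    static void map(Scanner in, PrintWriter out) {
        while (in.hasNext()) {
            out.print(in.next() + "\t1\n");
        }
        out.flush();
    }

    // Reducer: assumes input sorted by key, so equal words are adjacent.
    // A non-numeric count would throw NumberFormatException.
    static void reduce(Scanner in, PrintWriter out) {
        String currentWord = null;
        int currentCount = 0;
        while (in.hasNextLine()) {
            String line = in.nextLine();
            int tab = line.indexOf('\t');
            if (tab == -1) {
                continue; // skip lines without a key/value separator
            }
            String word = line.substring(0, tab);
            int count = Integer.parseInt(line.substring(tab + 1).trim());
            if (word.equals(currentWord)) {
                currentCount += count;
            } else {
                // Key changed: flush the previous word before starting a new one.
                if (currentWord != null) {
                    out.print(currentWord + "\t" + currentCount + "\n");
                }
                currentWord = word;
                currentCount = count;
            }
        }
        if (currentWord != null) {
            out.print(currentWord + "\t" + currentCount + "\n");
        }
        out.flush();
    }

    // Reads stdin and writes stdout; pass "reduce" to run the reducer,
    // anything else (or nothing) to run the mapper.
    public static void main(String[] args) {
        String mode = args.length == 0 ? "map" : args[0];
        Scanner stdin = new Scanner(System.in);
        PrintWriter stdout = new PrintWriter(System.out);
        if (mode.equals("reduce")) {
            reduce(stdin, stdout);
        } else {
            map(stdin, stdout);
        }
    }
}
```

<p>Once compiled, the sketch can be tried locally in the same way as the pipeline above: echo "foo foo quux labs foo bar quux" | java StreamingWordCountSketch map | sort | java StreamingWordCountSketch reduce</p>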
<p>Then, I installed a single-node Hadoop cluster and tried this:<br>
hadoop jar contrib/streaming/hadoop-streaming-0.20.203.0.jar -mapper
"java -cp ~/dummy.jar WCMapper" -reducer "java -cp ~/dummy.jar
WCReducer" -input gutenberg/* -output gutenberg-output -file dummy.jar<br>
(adapted from the Python command in the tutorial)</p>
<p>This is the error:<br>
hadoop@siddhartha-laptop:/usr/local/hadoop$ hadoop jar
contrib/streaming/hadoop-streaming-0.20.203.0.jar -mapper "java -cp
~/dummy.jar WCMapper" -reducer "java -cp ~/dummy.jar WCReducer" -input
gutenberg/* -output gutenberg-output -file dummy.jar<br>
packageJobJar: [dummy.jar,
/app/hadoop/tmp/hadoop-unjar5573454211442575176/] []
/tmp/streamjob6721719460213928092.jar tmpDir=null<br>
11/06/04 20:47:15 INFO mapred.FileInputFormat: Total input paths to process : 3<br>
11/06/04 20:47:15 INFO streaming.StreamJob: getLocalDirs(): [/app/hadoop/tmp/mapred/local]<br>
11/06/04 20:47:15 INFO streaming.StreamJob: Running job: job_201106031901_0039<br>
11/06/04 20:47:15 INFO streaming.StreamJob: To kill this job, run:<br>
11/06/04 20:47:15 INFO streaming.StreamJob:
/usr/local/hadoop/bin/../bin/hadoop job
-Dmapred.job.tracker=localhost:54311 -kill job_201106031901_0039<br>
11/06/04 20:47:15 INFO streaming.StreamJob: Tracking URL: <a href="http://localhost:50030/jobdetails.jsp?jobid=job_201106031901_0039" rel="nofollow" target="_blank">http://localhost:50030/jobdetails.jsp?jobid=job_201106031901_0039</a><br>
11/06/04 20:47:16 INFO streaming.StreamJob: map 0% reduce 0%<br>
11/06/04 20:48:00 INFO streaming.StreamJob: map 100% reduce 100%<br>
11/06/04 20:48:00 INFO streaming.StreamJob: To kill this job, run:<br>
11/06/04 20:48:00 INFO streaming.StreamJob:
/usr/local/hadoop/bin/../bin/hadoop job
-Dmapred.job.tracker=localhost:54311 -kill job_201106031901_0039<br>
11/06/04 20:48:00 INFO streaming.StreamJob: Tracking URL: <a href="http://localhost:50030/jobdetails.jsp?jobid=job_201106031901_0039" rel="nofollow" target="_blank">http://localhost:50030/jobdetails.jsp?jobid=job_201106031901_0039</a><br>
11/06/04 20:48:00 ERROR streaming.StreamJob: Job not successful. Error: NA<br>
11/06/04 20:48:00 INFO streaming.StreamJob: killJob…<br>
Streaming Job Failed!</p>
<p><b>Any advice?</b></p><font face="verdana,sans-serif">I also tried the following, in vain:<br>hadoop jar contrib/streaming/hadoop-streaming-0.20.203.0.jar -file dummy.jar -mapper "java -cp dummy.jar WCMapper" -reducer "java -cp dummy.jar WCReducer" -input gutenberg/* -output gutenberg-output<br>
<br><br clear="all"></font><span style="font-family: verdana,sans-serif;">Sincerely,</span><br style="font-family: verdana,sans-serif;"><span style="font-family: verdana,sans-serif;">Siddhartha Jonnalagadda, </span><br style="font-family: verdana,sans-serif;">
<span style="font-family: verdana,sans-serif;"></span><span style="font-family: verdana,sans-serif;"></span><a style="font-family: verdana,sans-serif;" href="http://sjonnalagadda.wordpress.com" target="_blank">sjonnalagadda.wordpress.com</a><br style="font-family: verdana,sans-serif;">
<br><br><div class="gmail_quote">On Sun, May 22, 2011 at 3:33 AM, Miles Osborne <span dir="ltr"><<a href="mailto:miles@inf.ed.ac.uk" target="_blank">miles@inf.ed.ac.uk</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">
There really is no need to reinvent the wheel. If you want to easily<br>
scale your task then just use Hadoop. Installing it is easy. Look at<br>
the "streaming" interface which will allow you to call your code<br>
directly, without any special libraries etc.<br>
<br>
To give you a feel for how easy it is, this would be the command<br>
(assuming your job is called "parser" and you have loaded your data<br>
onto Hadoop already)<br>
<br>
hadoop jar /usr/local/share/hadoop/contrib/streaming/hadoop-*-streaming.jar<br>
-mapper parser -input myData/* -output myDataOut -file parser<br>
-numReduceTasks 0<br>
<br>
and that is it. not hard as you can see.<br>
<br>
(home brew approaches are not robust and that is the real magic behind<br>
map Reduce)<br>
<font color="#888888"><br>
Miles<br>
</font><div><br>
--<br>
The University of Edinburgh is a charitable body, registered in<br>
Scotland, with registration number SC005336.<br>
<br>
</div></blockquote></div><br>