[Corpora-List] Simple instructions to scale a java application?
Siddhartha Jonnalagadda
sid.kgp at gmail.com
Sun Jun 5 03:54:37 UTC 2011
Thanks Ashish, Ted, Miles and others for the instructions and suggestions on
this topic.
I finished reading the Hadoop book and read the tutorial by Michael Noll.
I was able to reproduce the results in Python (
http://www.michael-noll.com/tutorials/writing-an-hadoop-mapreduce-program-in-python/#run-the-mapreduce-job),
but I wrote my own Mapper and Reducer in Java, since all my code is currently
in Java.
I first tried this:
echo "foo foo quux labs foo bar quux" | java -cp ~/dummy.jar WCMapper | sort \
  | java -cp ~/dummy.jar WCReducer
It gave the correct output:
labs 1
foo 3
bar 1
quux 2
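For anyone following along, here is a minimal sketch of the kind of stdin/stdout word-count mapper and reducer that Hadoop Streaming can drive. It is not the WCMapper/WCReducer code from dummy.jar, which is not shown in this message. The class name StreamingWordCount, the "map"/"reduce" argument, and the tab-separated "word<TAB>count" record format are assumptions made for illustration, and the code assumes Java 8 or later.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for WCMapper/WCReducer; not the code from dummy.jar.
final class StreamingWordCount {

    // Map step: emit one "word<TAB>1" record per whitespace-separated token.
    static List<String> mapLine(String line) {
        List<String> out = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                out.add(token + "\t1");
            }
        }
        return out;
    }

    // Reduce step: sum the counts for each word. A TreeMap keeps the
    // result sorted by word; records without a tab are skipped.
    static Map<String, Integer> reduce(Iterable<String> records) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String record : records) {
            int tab = record.lastIndexOf('\t');
            if (tab < 0) {
                continue;
            }
            String word = record.substring(0, tab);
            int n = Integer.parseInt(record.substring(tab + 1).trim());
            counts.merge(word, n, Integer::sum);
        }
        return counts;
    }

    // Reads stdin and writes stdout, as Hadoop Streaming requires.
    // Run with the argument "map" for the mapper; anything else reduces.
    public static void main(String[] args) throws IOException {
        boolean mapMode = args.length > 0 && args[0].equals("map");
        BufferedReader in = new BufferedReader(
                new InputStreamReader(System.in, StandardCharsets.UTF_8));
        List<String> records = new ArrayList<>();
        String line;
        while ((line = in.readLine()) != null) {
            if (mapMode) {
                for (String record : mapLine(line)) {
                    System.out.println(record);
                }
            } else {
                records.add(line);
            }
        }
        if (!mapMode) {
            for (Map.Entry<String, Integer> e : reduce(records).entrySet()) {
                System.out.println(e.getKey() + "\t" + e.getValue());
            }
        }
    }
}
```

Compiled into the current directory, the sketch can be checked locally with the same kind of pipe as above, using "java -cp . StreamingWordCount map" as the mapper and "java -cp . StreamingWordCount reduce" as the reducer. Because it collects counts in a sorted map, it prints words in alphabetical order, unlike the output listed above.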
Then I installed a single-node Hadoop cluster and tried this (adapted from the
tutorial's Python command):

hadoop jar contrib/streaming/hadoop-streaming-0.20.203.0.jar \
  -mapper "java -cp ~/dummy.jar WCMapper" \
  -reducer "java -cp ~/dummy.jar WCReducer" \
  -input gutenberg/* -output gutenberg-output -file dummy.jar
This is the error:
hadoop@siddhartha-laptop:/usr/local/hadoop$ hadoop jar \
  contrib/streaming/hadoop-streaming-0.20.203.0.jar \
  -mapper "java -cp ~/dummy.jar WCMapper" \
  -reducer "java -cp ~/dummy.jar WCReducer" \
  -input gutenberg/* -output gutenberg-output -file dummy.jar
packageJobJar: [dummy.jar, /app/hadoop/tmp/hadoop-unjar5573454211442575176/]
[] /tmp/streamjob6721719460213928092.jar tmpDir=null
11/06/04 20:47:15 INFO mapred.FileInputFormat: Total input paths to process
: 3
11/06/04 20:47:15 INFO streaming.StreamJob: getLocalDirs():
[/app/hadoop/tmp/mapred/local]
11/06/04 20:47:15 INFO streaming.StreamJob: Running job:
job_201106031901_0039
11/06/04 20:47:15 INFO streaming.StreamJob: To kill this job, run:
11/06/04 20:47:15 INFO streaming.StreamJob:
/usr/local/hadoop/bin/../bin/hadoop job -Dmapred.job.tracker=localhost:54311
-kill job_201106031901_0039
11/06/04 20:47:15 INFO streaming.StreamJob: Tracking URL:
http://localhost:50030/jobdetails.jsp?jobid=job_201106031901_0039
11/06/04 20:47:16 INFO streaming.StreamJob: map 0% reduce 0%
11/06/04 20:48:00 INFO streaming.StreamJob: map 100% reduce 100%
11/06/04 20:48:00 INFO streaming.StreamJob: To kill this job, run:
11/06/04 20:48:00 INFO streaming.StreamJob:
/usr/local/hadoop/bin/../bin/hadoop job -Dmapred.job.tracker=localhost:54311
-kill job_201106031901_0039
11/06/04 20:48:00 INFO streaming.StreamJob: Tracking URL:
http://localhost:50030/jobdetails.jsp?jobid=job_201106031901_0039
11/06/04 20:48:00 ERROR streaming.StreamJob: Job not successful. Error: NA
11/06/04 20:48:00 INFO streaming.StreamJob: killJob…
Streaming Job Failed!
*Any advice?*
I also tried this, without success:

hadoop jar contrib/streaming/hadoop-streaming-0.20.203.0.jar \
  -file dummy.jar \
  -mapper "java -cp dummy.jar WCMapper" \
  -reducer "java -cp dummy.jar WCReducer" \
  -input gutenberg/* -output gutenberg-output
Sincerely,
Siddhartha Jonnalagadda,
sjonnalagadda.wordpress.com
On Sun, May 22, 2011 at 3:33 AM, Miles Osborne <miles at inf.ed.ac.uk> wrote:
> There really is no need to reinvent the wheel. If you want to easily
> scale your task then just use Hadoop. Installing it is easy. Look at
> the "streaming" interface which will allow you to call your code
> directly, without any special libraries etc.
>
> To give you a feel for how easy it is, this would be the command
> (assuming your job is called "parser" and you have loaded your data
> onto Hadoop already)
>
> hadoop jar /usr/local/share/hadoop/contrib/streaming/hadoop-*-streaming.jar
> -mapper parser -input myData/* -output myDataOut -file parser
> -numReduceTasks 0
>
> and that is it. not hard as you can see.
>
> (home brew approaches are not robust and that is the real magic behind
> map Reduce)
>
> Miles
>
> --
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
>
> _______________________________________________
> UNSUBSCRIBE from this page: http://mailman.uib.no/options/corpora
> Corpora mailing list
> Corpora at uib.no
> http://mailman.uib.no/listinfo/corpora
>