Wednesday 4 January 2012

Hadoop: Example Program
This example uses Hadoop Streaming to interact with Hadoop. Hadoop Streaming is a utility that lets you write the mapper and reducer in any language that can read standard input and write standard output, using a simple text-based record format for the input and output <key, value> pairs.
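To make the text-based record format concrete, here is a minimal sketch of a streaming-style mapper and reducer in Python. It is a generic word count, not the tutorial's program; the function names map_lines and reduce_lines and the "map"/"reduce" command-line argument are made up for illustration. It assumes the default convention of one record per line with a tab between key and value, and that the reducer receives its input sorted by key.

```python
import sys
from itertools import groupby


def map_lines(lines):
    """Emit one 'word<TAB>1' record per word, as a streaming mapper would."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reduce_lines(lines):
    """Sum the counts for each key; assumes the input is already sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{key}\t{sum(int(count) for _, count in group)}"


# Hypothetical command-line convention: pass "map" or "reduce" to pick the step.
if __name__ == "__main__" and sys.argv[1:] in (["map"], ["reduce"]):
    step = map_lines if sys.argv[1] == "map" else reduce_lines
    for record in step(sys.stdin):
        print(record)
```

Because both steps only read lines and write lines, you can try them locally with an ordinary shell pipeline (map, then sort, then reduce) before handing them to Hadoop.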

Hadoop example program.

The link above leads to a tutorial that shows how to write a program in Python and then run it through Hadoop.

When you reach the part where you need to put your files in HDFS, don't follow the instructions in the tutorial. Instead, do the following:

$ bin/hadoop fs -mkdir urls
$ bin/hadoop fs -put url1 urls/
$ bin/hadoop fs -put url2 urls/
$ bin/hadoop jar contrib/streaming/hadoop-0.20.2-streaming.jar \
      -mapper $HOME/proj/hadoop/multifetch.py \
      -reducer $HOME/proj/hadoop/reducer.py \
      -input urls/* \
      -output titles
