Wednesday 4 January 2012

Hadoop: How to set up a single node
This post covers the simple operations needed to use Hadoop MapReduce and the Hadoop Distributed File System (HDFS) on a single machine.

Prerequisites
GNU/Linux is supported as a development and production platform. Hadoop has been demonstrated on GNU/Linux clusters with 2000 nodes.
Win32 is supported as a development platform. Distributed operation has not been well tested on Win32, so it is not supported as a production platform.

ssh must be installed and sshd must be running (the Hadoop scripts use it to manage the daemons), and rsync is also required. On Debian/Ubuntu you can install both with:

$ sudo apt-get install ssh
$ sudo apt-get install rsync

Installation
Download a recent stable release from one of the Apache Download Mirrors.
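
For example, assuming the hadoop-1.0.0 release (the version number here is only illustrative, use whichever stable release you downloaded), unpack the tarball and change into the resulting directory:

$ tar xzf hadoop-1.0.0.tar.gz
$ cd hadoop-1.0.0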

Edit the file conf/hadoop-env.sh and point JAVA_HOME to the root of your Java installation. If you are not sure where Java is installed, check the current value with $ echo $JAVA_HOME.
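
For example, on a typical Ubuntu machine with the Sun JDK 6 (the path below is only an example, adjust it to your own system), the relevant line in conf/hadoop-env.sh would look like:

# conf/hadoop-env.sh -- example path, point it at your own JDK
export JAVA_HOME=/usr/lib/jvm/java-6-sun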

Pseudo-Distributed Operation
Hadoop can also be run on a single node in a pseudo-distributed mode where each Hadoop daemon runs in a separate Java process. Follow the official Single Node Setup guide; a minimal configuration is sketched below.
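
As a sketch, the minimal pseudo-distributed configuration from the Hadoop documentation of this era (0.20.x/1.0.x releases) puts one property in each of the three conf files. Newer releases rename some of these properties (e.g. fs.defaultFS), so treat the names below as version-dependent.

conf/core-site.xml:

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

conf/hdfs-site.xml:

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

conf/mapred-site.xml:

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>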

In one of the steps you will need to ssh to the localhost:
$ ssh localhost

If you get the following error:

ssh: connect to host localhost port 22: Connection refused

then the ssh server is probably not installed. Try:
$ sudo apt-get install openssh-server
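
Once the server is installed, $ ssh localhost should connect. Hadoop also needs to log in without a passphrase; if ssh prompts for a password, the usual fix from the setup guide is to create an empty-passphrase key:

$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

With ssh working, the remaining steps from the guide are, roughly (commands assume a 0.20.x/1.0.x layout): format the HDFS namenode, start the daemons, and check that they are running:

$ bin/hadoop namenode -format
$ bin/start-all.sh
$ jps    # should list NameNode, DataNode, SecondaryNameNode, JobTracker, TaskTracker

When you are done, $ bin/stop-all.sh shuts everything down.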
