Monday 4 June 2012

Hadoop installation in pseudo-distributed mode: a tutorial



This document covers the steps to:
1) Configure SSH
2) Install JDK
3) Install Hadoop

Update your package lists first
#sudo apt-get update

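Hadoop's start-up scripts use SSH to log in to each node (here, localhost) and launch the daemons, so passwordless login must work.
Let's install SSH and configure it with a passphrase-less key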
#sudo apt-get install openssh-server openssh-client
#ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
#cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
#sudo chmod go-w $HOME $HOME/.ssh
#sudo chmod 600 $HOME/.ssh/authorized_keys
#sudo chown `whoami` $HOME/.ssh/authorized_keys

Testing your SSH
#ssh localhost
Answer yes when asked to confirm the host key.
It should open an SSH session without asking for a password.
#exit
This closes the SSH session.
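
Once the first interactive login has stored the host key, the same test can be repeated without any prompts. Here is a minimal sketch, assuming a stock OpenSSH client; the function name check_passwordless_ssh is made up for this example.

```shell
# Hypothetical helper: succeeds only if key-based login works without any prompt.
check_passwordless_ssh() {
    host=${1:-localhost}
    # BatchMode=yes makes ssh fail instead of asking for a password or passphrase.
    ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true
}

# Usage:
#   check_passwordless_ssh localhost && echo "passwordless SSH works"
```

If the call fails, recheck the key and the permissions set in the commands above.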

Java 1.6 is required to run Hadoop 1.0.x.
Let's download and install the JDK
#sudo mkdir /usr/java
#cd /usr/java
#sudo wget http://download.oracle.com/otn-pub/java/jdk/6u31-b04/jdk-6u31-linux-i586.bin

Wait until the JDK download completes.
Install Java
#sudo chmod o+w jdk-6u31-linux-i586.bin
#sudo chmod +x jdk-6u31-linux-i586.bin
#sudo ./jdk-6u31-linux-i586.bin
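
It is worth confirming that the installed JDK really reports version 1.6 before moving on. A small sketch follows; the function name is_java_16 is made up for this example, and the JDK path in the usage comment assumes the installer unpacked into /usr/java/jdk1.6.0_31.

```shell
# Hypothetical helper: reads `java -version` output on stdin
# and succeeds if it reports a 1.6.x version.
is_java_16() {
    grep -q 'version "1\.6\.'
}

# Usage (java -version prints to stderr, so redirect it into the pipe):
#   /usr/java/jdk1.6.0_31/bin/java -version 2>&1 | is_java_16 && echo "Java 1.6 found"
```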

Now comes Hadoop :)
Let's download and configure Hadoop in pseudo-distributed mode. You can read more about the other modes (standalone and fully distributed) on the Hadoop website.
Download the latest Hadoop 1.0.x tar.gz release from its website
http://hadoop.apache.org/common/releases.html

Extract it into some folder (say /home/hadoop/software/20/)

All the software has now been downloaded to that location.
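
The extraction step can be done from the terminal as well. This is only a sketch: the function name extract_release is made up, and the tarball name in the usage comment is an assumption about what you downloaded.

```shell
# Hypothetical helper: unpack a release tarball into a destination folder.
extract_release() {
    # $1 = tarball, $2 = destination directory (created if missing)
    mkdir -p "$2" && tar -xzf "$1" -C "$2"
}

# Usage (file name is an assumption, use the one you actually downloaded):
#   extract_release hadoop-1.0.1.tar.gz /home/hadoop/software
```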


Go to the conf directory in the Hadoop folder, open core-site.xml, and add the following property between the empty configuration tags

<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost</value>
</property>
</configuration>

Do the same for the following files

conf/hdfs-site.xml:

<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>


conf/mapred-site.xml:

<configuration>
<property>
<name>mapred.job.tracker</name>
<value>localhost:8021</value>
</property>
</configuration>
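
A typo in these three files is easy to miss, so here is a rough way to read a value back from the command line. It is a sketch, not a real XML parser: the function name get_hadoop_prop is made up, and it assumes each name and value element sits on its own line, as in the listings above.

```shell
# Hypothetical helper: print the value of one property from a Hadoop config file.
# Assumes <name> and <value> are on separate lines, with the value after the name.
get_hadoop_prop() {
    # $1 = config file, $2 = property name
    awk -v name="$2" '
        index($0, "<name>" name "</name>") { found = 1; next }
        found && match($0, /<value>[^<]*<\/value>/) {
            # Strip the 7-character <value> and the 8-character </value>
            print substr($0, RSTART + 7, RLENGTH - 15)
            exit
        }
    ' "$1"
}

# Usage (run from the Hadoop folder):
#   get_hadoop_prop conf/core-site.xml fs.default.name
#   get_hadoop_prop conf/hdfs-site.xml dfs.replication
#   get_hadoop_prop conf/mapred-site.xml mapred.job.tracker
```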

Environment variables
In the conf/hadoop-env.sh file, set JAVA_HOME to the location where you installed Java
e.g.

export JAVA_HOME=/usr/java/jdk1.6.0_31

Configure the environment variables for the JDK and Hadoop as follows.
Open the ~/.profile file in the current user's home directory.
Add the following lines, then log in again (or run . ~/.profile) so they take effect.
Change the paths if you have installed Hadoop or Java somewhere else.

export JAVA_HOME="/usr/java/jdk1.6.0_31"
export PATH=$PATH:$JAVA_HOME/bin
export HADOOP_INSTALL="/home/hadoop/software/hadoop-1.0.1"
export PATH=$PATH:$HADOOP_INSTALL/bin
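
If you script this step instead of editing the file by hand, it helps to avoid adding the same line twice when the script is run again. A possible sketch; the function name add_line_once is made up for this example.

```shell
# Hypothetical helper: append a line to a file only if that exact line is not there yet.
add_line_once() {
    # $1 = file, $2 = line
    touch "$1"
    # -x matches whole lines, -F treats the line as fixed text rather than a regex.
    grep -qxF -e "$2" "$1" || printf '%s\n' "$2" >> "$1"
}

# Usage:
#   add_line_once ~/.profile 'export JAVA_HOME="/usr/java/jdk1.6.0_31"'
#   add_line_once ~/.profile 'export PATH=$PATH:$JAVA_HOME/bin'
```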

Testing your installation
Format the HDFS
#hadoop namenode -format

Then start the HDFS and MapReduce daemons with start-dfs.sh and start-mapred.sh

hadoop@jj-VirtualBox:~$ start-dfs.sh
starting namenode, logging to /home/hadoop/software/hadoop-1.0.1/libexec/../logs/hadoop-hadoop-namenode-jj-VirtualBox.out

localhost: starting datanode, logging to /home/hadoop/software/hadoop-1.0.1/libexec/../logs/hadoop-hadoop-datanode-jj-VirtualBox.out

localhost: starting secondarynamenode, logging to /home/hadoop/software/hadoop-1.0.1/libexec/../logs/hadoop-hadoop-secondarynamenode-jj-VirtualBox.out

hadoop@jj-VirtualBox:~$ start-mapred.sh

starting jobtracker, logging to /home/hadoop/software/hadoop-1.0.1/libexec/../logs/hadoop-hadoop-jobtracker-jj-VirtualBox.out

localhost: starting tasktracker, logging to /home/hadoop/software/hadoop-1.0.1/libexec/../logs/hadoop-hadoop-tasktracker-jj-VirtualBox.out
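
The log lines above only say that the daemons were started, not that they stayed up. The jps tool that ships with the JDK lists running Java processes, so its output can be checked for the five daemons. A sketch, with a made-up function name missing_daemons:

```shell
# Hypothetical helper: reads `jps` output on stdin ("<pid> <ClassName>" per line)
# and prints the name of every expected Hadoop daemon that is not listed.
missing_daemons() {
    jps_output=$(cat)
    for daemon in NameNode DataNode SecondaryNameNode JobTracker TaskTracker; do
        # Compare against the whole second field so NameNode does not match SecondaryNameNode.
        if ! printf '%s\n' "$jps_output" | awk -v d="$daemon" '$2 == d { ok = 1 } END { exit !ok }'; then
            echo "$daemon"
        fi
    done
}

# Usage:
#   jps | missing_daemons
```

No output means all five daemons are running. If a name is printed, look at the matching file in the logs folder.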

Open the browser and go to the following pages
localhost:50030 (JobTracker)
localhost:50070 (NameNode)
They show the status pages for Hadoop.
That's it. This completes the installation of Hadoop, and you are now ready to play with it.
