Techie Talks: Hadoop installation in Pseudo distributed mode tutorial

This document covers the steps to:

1) Configure SSH

2) Install JDK

3) Install Hadoop

First, update your package index:

#sudo apt-get update

Hadoop's start-up scripts use SSH to launch the daemons, even on a single machine, so passwordless SSH to localhost must work.

Let's install and configure SSH:

#sudo apt-get install openssh-server openssh-client

#ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa

#cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

#sudo chmod go-w $HOME $HOME/.ssh

#sudo chmod 600 $HOME/.ssh/authorized_keys

#sudo chown `whoami` $HOME/.ssh/authorized_keys
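The chmod/chown steps above matter because sshd, with its default StrictModes setting, ignores authorized_keys when your home directory or ~/.ssh is writable by group or others. The following minimal sketch in plain POSIX shell checks those permissions. It also checks that authorized_keys has the 600 mode set above. The function names are hypothetical helpers for illustration and are not part of SSH or Hadoop.

```shell
#!/bin/sh
# Sketch: check the SSH permissions set up by the chmod/chown steps above.

# mode_of PATH: print the 10-character mode string, e.g. "drwxr-xr-x".
mode_of() {
    ls -ld "$1" 2>/dev/null | cut -c1-10
}

# Succeeds if PATH is writable by neither group nor others.
not_group_or_world_writable() {
    m=$(mode_of "$1")
    g=$(printf '%s' "$m" | cut -c6)   # group write bit
    o=$(printf '%s' "$m" | cut -c9)   # other write bit
    [ "$g" != "w" ] && [ "$o" != "w" ]
}

# check_ssh_perms HOME_DIR: print any problems, return non-zero if found.
check_ssh_perms() {
    status=0
    for d in "$1" "$1/.ssh"; do
        if ! not_group_or_world_writable "$d"; then
            echo "writable by group or others: $d"
            status=1
        fi
    done
    # The tutorial sets authorized_keys to mode 600 (-rw-------).
    if [ "$(mode_of "$1/.ssh/authorized_keys")" != "-rw-------" ]; then
        echo "authorized_keys is missing or not mode 600"
        status=1
    fi
    return "$status"
}
```

Usage: `check_ssh_perms "$HOME" && echo "SSH permissions look fine"`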

Testing your SSH

#ssh localhost

Answer yes when asked whether to trust the host key. You should be logged in without being asked for a password.

#exit

This closes the SSH session.

Java 1.6 (JDK 6) is required to run Hadoop 1.0.x.

Let's download and install the JDK:

#sudo mkdir /usr/java

#cd /usr/java

#sudo wget http://download.oracle.com/otn-pub/java/jdk/6u31-b04/jdk-6u31-linux-i586.bin

Wait until the JDK download completes, then install Java:

#sudo chmod +x jdk-6u31-linux-i586.bin

#sudo ./jdk-6u31-linux-i586.bin

The installer unpacks the JDK into /usr/java/jdk1.6.0_31.
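To confirm that the JDK meets the Java 1.6 requirement, you can read the version line that `java -version` prints on stderr. The following is a minimal sketch. It assumes the version string appears in double quotes, as in `java version "1.6.0_31"`, and it also handles the newer "11.0.2" style numbering. `java_major` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: extract the major Java version from a `java -version` line.
# Java 8 and older print "1.<major>..."; Java 9+ print "<major>...".

# java_major VERSION_LINE: print the major version number.
java_major() {
    # Take the text between the first pair of double quotes.
    v=$(printf '%s\n' "$1" | sed -n 's/^[^"]*"\([^"]*\)".*/\1/p')
    case "$v" in
        1.*) printf '%s\n' "$v" | cut -d. -f2 ;;   # 1.6.0_31 -> 6
        *)   printf '%s\n' "$v" | cut -d. -f1 ;;   # 11.0.2   -> 11
    esac
}
```

Usage, assuming the JDK was unpacked to /usr/java/jdk1.6.0_31 as above:

    line=$(/usr/java/jdk1.6.0_31/bin/java -version 2>&1 | head -n 1)
    [ "$(java_major "$line")" -ge 6 ] && echo "Java version OK"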

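Now comes Hadoop :)

Let's download and configure Hadoop in pseudo-distributed mode, in which each Hadoop daemon runs in its own Java process on a single machine. You can read more about the various modes (standalone, pseudo-distributed and fully distributed) on the Hadoop website.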

Download the Hadoop 1.0.x tar.gz from the Hadoop releases page:

http://hadoop.apache.org/common/releases.html

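Extract it into a folder, say /home/hadoop/software/. For release 1.0.1 this creates /home/hadoop/software/hadoop-1.0.1, the path used in the rest of this tutorial. All the software for this tutorial was downloaded to that location.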

Go to the conf directory in the Hadoop folder, open core-site.xml and add the following property inside the empty configuration tags:

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost</value>
  </property>
</configuration>
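If you would rather script these edits than open an editor, you can use a here-document to write a site file with one property. The following is a minimal sketch. `write_site_xml` is a hypothetical helper. It overwrites the target file, so use it only on the empty template files.

```shell
#!/bin/sh
# Sketch: write a Hadoop site file containing a single property.
# WARNING: this overwrites FILE completely.

# write_site_xml FILE NAME VALUE
write_site_xml() {
    cat > "$1" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>$2</name>
    <value>$3</value>
  </property>
</configuration>
EOF
}
```

Usage, with the paths from this tutorial:

    write_site_xml /home/hadoop/software/hadoop-1.0.1/conf/core-site.xml fs.default.name hdfs://localhost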

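Similarly, edit conf/hdfs-site.xml. A pseudo-distributed cluster has only one DataNode, so set the replication factor to 1:

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>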

In conf/mapred-site.xml, set the JobTracker address:

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:8021</value>
  </property>
</configuration>
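To double-check the three files, you can read a property value back out of each one. The following is a rough sketch. It uses awk line matching rather than a real XML parser, so it assumes the layout shown above, with `<name>` and `<value>` inside the same `<property>` block. `get_prop` is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: print the value of a property from a Hadoop *-site.xml file.
# Line-based matching with awk, NOT a real XML parser.

# get_prop FILE NAME
get_prop() {
    awk -v want="$2" '
        /<name>/ {
            n = $0
            sub(/.*<name>[ \t]*/, "", n)
            sub(/[ \t]*<\/name>.*/, "", n)
            found = (n == want)
        }
        found && /<value>/ {
            v = $0
            sub(/.*<value>[ \t]*/, "", v)
            sub(/[ \t]*<\/value>.*/, "", v)
            print v
            exit
        }
        /<\/property>/ { found = 0 }
    ' "$1"
}
```

With the configuration above, `get_prop /home/hadoop/software/hadoop-1.0.1/conf/mapred-site.xml mapred.job.tracker` should print localhost:8021. An empty result means the property is missing.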

Environment variables

In conf/hadoop-env.sh, set JAVA_HOME to the location where you installed Java. This is shell syntax, so there must be no spaces around the "=" sign, e.g.

export JAVA_HOME=/usr/java/jdk1.6.0_31
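The same change can be made with sed. The following sketch assumes hadoop-env.sh contains a line starting with `export JAVA_HOME=`, which may be commented out with `#`, and replaces that line. `set_java_home` is a hypothetical helper. It keeps the original as hadoop-env.sh.bak.

```shell
#!/bin/sh
# Sketch: set JAVA_HOME in hadoop-env.sh without opening an editor.
# Assumes a line like "# export JAVA_HOME=..." or "export JAVA_HOME=...".

# set_java_home ENV_FILE JDK_DIR
set_java_home() {
    # Keep a backup, then rewrite the matching line from the backup.
    # "|" is the sed delimiter because the paths contain "/".
    cp "$1" "$1.bak" &&
    sed "s|^#* *export JAVA_HOME=.*|export JAVA_HOME=$2|" "$1.bak" > "$1"
}
```

Usage, with the paths from this tutorial:

    set_java_home /home/hadoop/software/hadoop-1.0.1/conf/hadoop-env.sh /usr/java/jdk1.6.0_31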

Next, configure the environment variables for the JDK and Hadoop. Open the ~/.profile file in the current user's home directory and add the following lines. Change the paths if you installed Hadoop and Java in other locations.

export JAVA_HOME="/usr/java/jdk1.6.0_31"

export PATH=$PATH:$JAVA_HOME/bin

export HADOOP_INSTALL="/home/hadoop/software/hadoop-1.0.1"

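export PATH=$PATH:$HADOOP_INSTALL/bin

The new variables take effect at your next login. To load them into the current shell, run:

#source ~/.profile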
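If you script the profile edit, you can avoid adding the same exports twice when the setup is re-run. The following minimal sketch appends a line only if it is not already present. `append_once` is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: append LINE to FILE only if FILE lacks that exact line.

# append_once FILE LINE
append_once() {
    touch "$1"
    # -x: whole-line match, -F: fixed string (no regex), -q: quiet.
    grep -qxF -- "$2" "$1" || printf '%s\n' "$2" >> "$1"
}
```

Usage with the values from this tutorial. The single quotes keep $PATH and $JAVA_HOME unexpanded, so they are evaluated each time ~/.profile is read:

    append_once ~/.profile 'export JAVA_HOME="/usr/java/jdk1.6.0_31"'
    append_once ~/.profile 'export PATH=$PATH:$JAVA_HOME/bin'
    append_once ~/.profile 'export HADOOP_INSTALL="/home/hadoop/software/hadoop-1.0.1"'
    append_once ~/.profile 'export PATH=$PATH:$HADOOP_INSTALL/bin'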

Testing your installation

Format HDFS. Do this only once, before the first start, because formatting wipes anything already stored in HDFS:

# hadoop namenode -format

Then start the HDFS daemons and the MapReduce daemons. The output should look similar to this:

hadoop@jj-VirtualBox:~$ start-dfs.sh

starting namenode, logging to /home/hadoop/software/hadoop-1.0.1/libexec/../logs/hadoop-hadoop-namenode-jj-VirtualBox.out

localhost: starting datanode, logging to /home/hadoop/software/hadoop-1.0.1/libexec/../logs/hadoop-hadoop-datanode-jj-VirtualBox.out

localhost: starting secondarynamenode, logging to /home/hadoop/software/hadoop-1.0.1/libexec/../logs/hadoop-hadoop-secondarynamenode-jj-VirtualBox.out

hadoop@jj-VirtualBox:~$ start-mapred.sh

starting jobtracker, logging to /home/hadoop/software/hadoop-1.0.1/libexec/../logs/hadoop-hadoop-jobtracker-jj-VirtualBox.out

localhost: starting tasktracker, logging to /home/hadoop/software/hadoop-1.0.1/libexec/../logs/hadoop-hadoop-tasktracker-jj-VirtualBox.out
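To check that all five daemons from the output above are actually running, you can use `jps`, which ships with the JDK and prints one "pid class-name" line per running JVM. The following is a minimal sketch. `missing_daemons` is a hypothetical helper that reads jps output on stdin.

```shell
#!/bin/sh
# Sketch: report which of the five Hadoop 1.x daemons are not running.

# missing_daemons: read jps output on stdin, print each expected daemon
# that is absent, and return non-zero if any is missing.
missing_daemons() {
    jps_out=$(cat)
    rc=0
    for d in NameNode DataNode SecondaryNameNode JobTracker TaskTracker; do
        # Compare field 2 exactly, so that "NameNode" is not
        # satisfied by a "SecondaryNameNode" line.
        if ! printf '%s\n' "$jps_out" |
            awk -v d="$d" '$2 == d { found = 1 } END { exit !found }'; then
            echo "$d"
            rc=1
        fi
    done
    return "$rc"
}
```

Usage: `jps | missing_daemons && echo "All Hadoop daemons are running"`. If a daemon is listed as missing, its .log file in the logs directory shown above usually explains why.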

Open a browser and point it to the following pages, which show the status of your Hadoop installation:

http://localhost:50030 (JobTracker: MapReduce status)

http://localhost:50070 (NameNode: HDFS status)

That's it! This completes the installation of Hadoop, and you are now ready to play with it.


Monday, 4 June 2012
