This document covers the steps to:
1) Configure SSH
2) Install JDK
3) Install Hadoop
First, update your package index:
#sudo apt-get update
Let's install and configure SSH:
#sudo apt-get install openssh-server openssh-client
#ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
#cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
#sudo chmod go-w $HOME $HOME/.ssh
#sudo chmod 600 $HOME/.ssh/authorized_keys
#sudo chown `whoami` $HOME/.ssh/authorized_keys
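As a quick sanity check after the commands above, the permissions on the key file can be verified from a script. This is only a minimal sketch: check_key_perms is a hypothetical helper name, and it assumes GNU stat (as shipped on Ubuntu).

```shell
# check_key_perms FILE: succeed if FILE has the mode set by the
# "chmod 600" step above. Hypothetical helper; assumes GNU stat.
check_key_perms() {
    keyfile="$1"
    mode=$(stat -c '%a' "$keyfile") || return 2
    if [ "$mode" = "600" ]; then
        echo "ok: $keyfile is mode 600"
    else
        echo "warning: $keyfile is mode $mode, expected 600"
        return 1
    fi
}

# Usage:
#   check_key_perms "$HOME/.ssh/authorized_keys"
```

If the check prints a warning, rerunning the chmod 600 command should fix it.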
Test your SSH setup
#ssh localhost
Type yes when asked to confirm the host key.
The SSH connection should open without asking for a password.
#exit
This closes the SSH session.
Java 1.6 is required to run Hadoop.
Let's download and install the JDK:
#sudo mkdir /usr/java
#cd /usr/java
#sudo wget http://download.oracle.com/otn-pub/java/jdk/6u31-b04/jdk-6u31-linux-i586.bin
Wait until the JDK download completes, then install Java:
#sudo chmod o+w jdk-6u31-linux-i586.bin
#sudo chmod +x jdk-6u31-linux-i586.bin
#sudo ./jdk-6u31-linux-i586.bin
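To confirm that the installer unpacked a usable JDK, something like the following could be used. It is a sketch under assumptions: has_java is a hypothetical helper, and /usr/java/jdk1.6.0_31 is the directory this guide uses for JAVA_HOME further down, so adjust it if your installer unpacked elsewhere.

```shell
# has_java DIR: succeed if DIR looks like a JDK home,
# i.e. it contains an executable bin/java. Hypothetical helper.
has_java() {
    [ -x "$1/bin/java" ]
}

# Usage (the path is an assumption):
#   has_java /usr/java/jdk1.6.0_31 && /usr/java/jdk1.6.0_31/bin/java -version
```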
Now comes Hadoop :)
Let's download and configure Hadoop in pseudo-distributed mode. You can read more about the available modes on the Hadoop website.
Download the Hadoop 1.0.x tar.gz from the releases page on the Hadoop website:
http://hadoop.apache.org/common/releases.html
Extract it into some folder (say /home/hadoop/software/20/).
In this guide, all the software is kept at that location.
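The extraction step can be sketched as below. extract_hadoop is a hypothetical helper, and the archive name and destination in the usage comment are assumptions; use whatever file you actually downloaded and whatever folder you chose.

```shell
# extract_hadoop ARCHIVE DEST: unpack a Hadoop tar.gz under DEST,
# creating DEST first if it does not exist. Hypothetical helper.
extract_hadoop() {
    archive="$1"
    dest="$2"
    mkdir -p "$dest" && tar -xzf "$archive" -C "$dest"
}

# Usage (both paths are assumptions):
#   extract_hadoop ~/Downloads/hadoop-1.0.1.tar.gz /home/hadoop/software
#   ls /home/hadoop/software/hadoop-1.0.1/conf
```

Whatever destination you pick, the extracted directory is the one HADOOP_INSTALL should point to later.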
Go to the conf directory in the Hadoop folder, open core-site.xml, and add the following property inside the empty configuration tags:
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost</value>
</property>
</configuration>
Do the same for the other two files.
conf/hdfs-site.xml:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
conf/mapred-site.xml:
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>localhost:8021</value>
</property>
</configuration>
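If you prefer not to edit the three files by hand, they could be generated from a script. The sketch below writes the same property names and values shown above; write_site and write_pseudo_conf are hypothetical helper names, and the script overwrites any existing files with those names, so back them up first if they contain other settings.

```shell
# write_site DIR FILE NAME VALUE: write a Hadoop site file that
# contains a single property. Hypothetical helper; OVERWRITES FILE.
write_site() {
    cat > "$1/$2" <<EOF
<?xml version="1.0"?>
<configuration>
<property>
<name>$3</name>
<value>$4</value>
</property>
</configuration>
EOF
}

# write_pseudo_conf CONF_DIR: write the three files used in this guide.
write_pseudo_conf() {
    write_site "$1" core-site.xml   fs.default.name    hdfs://localhost &&
    write_site "$1" hdfs-site.xml   dfs.replication    1 &&
    write_site "$1" mapred-site.xml mapred.job.tracker localhost:8021
}

# Usage (the path is an assumption):
#   write_pseudo_conf /home/hadoop/software/hadoop-1.0.1/conf
```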
Environment variables
In the conf/hadoop-env.sh file, set JAVA_HOME to the location where you installed Java, e.g.
export JAVA_HOME=/usr/java/jdk1.6.0_31
Configure the environment variables for the JDK and Hadoop as follows.
Open the ~/.profile file in the current user's home directory and add the following lines.
Change the paths if you installed Hadoop or Java somewhere else.
export JAVA_HOME="/usr/java/jdk1.6.0_31"
export PATH=$PATH:$JAVA_HOME/bin
export HADOOP_INSTALL="/home/hadoop/software/hadoop-1.0.1"
export PATH=$PATH:$HADOOP_INSTALL/bin
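The new variables only take effect in a fresh login shell or after reloading the profile with `. ~/.profile`. A small check like the one below can then confirm that both variables point at real directories. It is a sketch, and check_hadoop_env is a hypothetical helper name.

```shell
# check_hadoop_env: succeed only if JAVA_HOME and HADOOP_INSTALL are
# both set and point to existing directories. Hypothetical helper.
check_hadoop_env() {
    status=0
    for name in JAVA_HOME HADOOP_INSTALL; do
        # Look up the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "$name is not set"
            status=1
        elif [ ! -d "$value" ]; then
            echo "$name points to a missing directory: $value"
            status=1
        fi
    done
    return $status
}

# Usage:
#   . ~/.profile
#   check_hadoop_env && hadoop version
```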
Testing your installation
Format the HDFS
# hadoop namenode -format
Then start the HDFS and MapReduce daemons:
hadoop@jj-VirtualBox:~$ start-dfs.sh
starting namenode, logging to /home/hadoop/software/hadoop-1.0.1/libexec/../logs/hadoop-hadoop-namenode-jj-VirtualBox.out
localhost: starting datanode, logging to /home/hadoop/software/hadoop-1.0.1/libexec/../logs/hadoop-hadoop-datanode-jj-VirtualBox.out
localhost: starting secondarynamenode, logging to /home/hadoop/software/hadoop-1.0.1/libexec/../logs/hadoop-hadoop-secondarynamenode-jj-VirtualBox.out
hadoop@jj-VirtualBox:~$ start-mapred.sh
starting jobtracker, logging to /home/hadoop/software/hadoop-1.0.1/libexec/../logs/hadoop-hadoop-jobtracker-jj-VirtualBox.out
localhost: starting tasktracker, logging to /home/hadoop/software/hadoop-1.0.1/libexec/../logs/hadoop-hadoop-tasktracker-jj-VirtualBox.out
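Once both start scripts have run, the jps tool that ships with the JDK lists the running Java processes, and in pseudo-distributed mode all five daemons should appear. The sketch below reports any that are missing; missing_daemons is a hypothetical helper that reads jps output on standard input.

```shell
# missing_daemons: read "PID Name" lines (jps output) on stdin and
# print each expected Hadoop 1.x daemon that is not listed.
# Hypothetical helper; prints nothing when all five are running.
missing_daemons() {
    jps_output=$(cat)
    for daemon in NameNode DataNode SecondaryNameNode JobTracker TaskTracker; do
        # Compare against the whole second field so that "NameNode"
        # does not match "SecondaryNameNode".
        if ! printf '%s\n' "$jps_output" |
            awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
            echo "$daemon"
        fi
    done
}

# Usage:
#   jps | missing_daemons
```

If a daemon is reported missing, its log file under the logs directory shown in the startup messages is the first place to look.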
Open a browser and go to these pages:
localhost:50030 (JobTracker)
localhost:50070 (NameNode)
They show the Hadoop status pages.
That's it. This completes the installation of Hadoop, and you are now ready to play with it.