Sunday 9 December 2012

Some tips for a big data cluster

To disable iptables on a Linux box

service iptables status      # check the current state
service iptables save
service iptables stop
chkconfig iptables off       # keep it disabled across reboots
service network restart
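
To confirm it will stay disabled across reboots, check the runlevels (all should show "off"):

chkconfig --list iptables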

To leave HDFS safe mode
hadoop dfsadmin -safemode leave
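
To just check the current safe mode state without changing it:

hadoop dfsadmin -safemode get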

To stop an HBase daemon (here, the master)
hbase-daemon.sh stop master
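
The same script works for the other daemons, e.g. to stop a regionserver:

hbase-daemon.sh stop regionserver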

To kill a job
hadoop job -kill job_201209271339_0006
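
To find the job ID in the first place, list the running jobs:

hadoop job -list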

Install the Java JDK RPM

rpm -Uvh jdk1.6.0_33-linux-i586.rpm

Uninstall Java
rpm -qa | grep jdk      # find the installed package name
rpm -e jdk1.6.0_33-fcs

To view the task log URL, replace the task ID with the attempt ID in the log URL.

Common errors
UserPriviledgedAction: give chown rights to hduser.
"Session 0x0 for server null": network issue.
Clock sync error: set the same time on the master and all slaves to synchronise the clocks (a sample command follows this list).
"Unable to read additional data from client sessionid": this error comes when slaves are removed and data is not replicated properly. Add the slaves back to recover the data.
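
For the clock sync error, a quick way to line the clocks up (assuming the nodes can reach a public NTP server) is to run this on the master and every slave:

ntpdate -u pool.ntp.org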

Create an auxlib directory in Hive and copy these jar files into it (a sketch follows the list):
zookeeper jar
hive-contrib jar
hbase jar
hbase-hive handler jar
guava jar
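
A minimal sketch of the copy, assuming HIVE_HOME and HBASE_HOME are set; the exact jar file names and versions will differ per install:

mkdir -p $HIVE_HOME/auxlib
cp $HBASE_HOME/hbase-*.jar $HIVE_HOME/auxlib/
cp $HBASE_HOME/lib/zookeeper-*.jar $HIVE_HOME/auxlib/
cp $HBASE_HOME/lib/guava-*.jar $HIVE_HOME/auxlib/
cp $HIVE_HOME/lib/hive-contrib-*.jar $HIVE_HOME/auxlib/
cp $HIVE_HOME/lib/hive-hbase-handler-*.jar $HIVE_HOME/auxlib/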


For Talend Open Source, many jar files are missing; they can be taken from the Talend Data Integration tool.

For Talend to connect to Hive, hive-site.xml and hive-default.xml should be on the classpath. For Talend or Jaspersoft to work, start the Thrift server using the command

hive --service hiveserver
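
To keep the Thrift server running after you log out, launch it in the background:

nohup hive --service hiveserver > hiveserver.log 2>&1 &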

Add these lines in .bashrc to set the classpath (assuming HIVE_HOME and HADOOP_HOME point at your installs):

for i in $HIVE_HOME/lib/*.jar $HADOOP_HOME/hadoop-core-*.jar; do
  CLASSPATH=$CLASSPATH:$i        # append each Hive jar and the Hadoop core jar
done
CLASSPATH=$CLASSPATH:$HIVE_HOME/conf
export CLASSPATH

In Talend the MySQL Java connector was missing, and only version 5.0 of it should be added to the plugins. Go to Modules, add it, and refresh.

The HBase connection to Talend was not happening because localhost could not be resolved. Go to C:\Windows\System32\drivers\etc\hosts and add these lines:

127.0.0.1 localhost
IP  master
IP1 slave1
IP2 slave2

Similarly, add a hosts entry for your system on the master and the slaves.

The HBase connection is case sensitive and will throw a NullPointerException if case is not taken into consideration.

The namenode goes into safe mode even before the jobtracker, so set
dfs.safemode.threshold.pct=0 so that the namenode doesn't go into safe mode.
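
A minimal sketch of the entry in hdfs-site.xml (assuming Hadoop 1.x, where the property is named dfs.safemode.threshold.pct):

<property>
  <name>dfs.safemode.threshold.pct</name>
  <value>0</value>
</property>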

While loading data into Hive, make sure the table was created with a FIELDS TERMINATED BY clause matching the file's delimiter; otherwise every column loads as NULL.
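
A minimal sketch, with a hypothetical table and a tab-delimited input file:

CREATE TABLE events (id STRING, ts STRING, msg STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
LOAD DATA INPATH '/data/events.tsv' INTO TABLE events;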

Copying data from the event log via the HBase-Hive integration makes the reducer output size zero, as some row keys come out as NULL.

Delete the temp files to remove any stale data that might hinder starting the cluster.
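
For example, assuming hadoop.tmp.dir points at /app/hadoop/tmp (check core-site.xml first; note this also wipes any HDFS data stored under that path):

stop-all.sh
rm -rf /app/hadoop/tmp/*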

To SSH without using a password, do this as root:
$ chmod go-w $HOME $HOME/.ssh
$ chmod 600 $HOME/.ssh/authorized_keys


then switch to hduser:
$ chown `whoami` $HOME/.ssh/authorized_keys

Change the machine name to slave1, slave2, etc. with:
echo slave1 > /proc/sys/kernel/hostname
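
This does not survive a reboot; on RHEL/CentOS, also set the name in /etc/sysconfig/network:

HOSTNAME=slave1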


The most common problem that causes public keys to fail is permissions in the $HOME directory. Your $HOME directory cannot be writable by any user except the owner. Additionally, the .ssh directory and the authorized_keys file cannot be writable except by the owner. The ssh protocol will not report the problem but will silently ignore the authorized_keys file if any permissions are wrong.
To fix the destination public key handshake, you can do this (logged in as the remote user):
    chmod 755 $HOME $HOME/.ssh
    chmod 600 $HOME/.ssh/*

Alternatively, you can just remove the write capability with:
chmod go-w $HOME $HOME/.ssh
chmod go-w $HOME/.ssh/*
Also, the $HOME and $HOME/.ssh directories must be owned by the user and all the files in .ssh owned by the user. A common error is to create the .ssh directory and files as root and forget to assign the proper permissions and ownership. A better way is to login as the user, then run ssh-keygen -t rsa to create not only the ssh keys but the .ssh directory with correct permissions and ownership.
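
Putting it together, a minimal sketch for passwordless SSH from the master to a slave (run as hduser; slave1 is a placeholder hostname):

ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa    # also creates ~/.ssh with correct ownership
ssh-copy-id hduser@slave1                   # appends the public key on the slave
ssh hduser@slave1                           # should now log in without a password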
