Add the following line to your hbase-env.sh file:
export JAVA_HOME=/usr/lib/jvm/java-6-sun
In the same file, change this line:
export HBASE_OPTS="-XX:+HeapDumpOnOutOfMemoryError -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode"
to look like:
export HBASE_OPTS="-XX:+HeapDumpOnOutOfMemoryError -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -Djava.net.preferIPv4Stack=true"
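Before moving on, double-check that the JAVA_HOME path actually exists on each node; java-6-sun is just what my machines use, so substitute whatever JDK your distribution installed:

ls /usr/lib/jvm/   # see which JDKs are installed on this node
/usr/lib/jvm/java-6-sun/bin/java -version   # should report a 1.6 JVM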
Now, modify your “regionservers” file to list all of the machines you want to host regions. Think of an HBase region as a small chunk of the data in your database. The more regionservers you have, the more data you can reliably serve. In my cluster, the regionservers are the same nodes as all of my datanodes and all of my tasktrackers, so essentially the “regionservers” file should be identical to your “slaves” file from the hadoop tutorial.
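For example, with three slaves the “regionservers” file would just be one hostname per line (slave1 through slave3 here are placeholders for your own host names):

slave1
slave2
slave3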
Next, modify the hbase-site.xml file. The settings in this file override those in hbase-default.xml, so if you want to see a list of available settings to configure, study that file, but only make changes to your hbase-site.xml. Add the following settings to hbase-site.xml:
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://$master$/hbase</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>$slave1$,$slave2$,$slave3$</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/hadoop/zookeeper/data</value>
  </property>
</configuration>
Please remember to replace $master$ and $slaveX$ with your master and slave host names respectively. You may have read that HBase 0.20 now requires zookeeper, but fear not: the above configuration directives allow HBase to completely manage zookeeper on its own, so you never have to mess with it. Now, it is typically recommended to always run zookeeper on dedicated zookeeper-only servers. If you are running a small cluster, this is hardly efficient, because you want as many nodes “working” as possible. While I can’t give you a maximum cluster size before dedicated zk nodes become necessary, I can tell you that my 6 slave nodes run datanode, tasktracker, regionserver, and zookeeper without too much of a problem. I would imagine that if you have over 10 nodes in your cluster, then you shouldn’t have a problem dedicating a few of them to zookeeper. It is also recommended (maybe even required) that zookeeper run on an odd number of machines. I don’t completely understand how zookeeper works, but basically, as long as more than half of your “quorum” is still intact, your cluster won’t fail. In essence, if your zk quorum has 7 nodes, you can lose 3 without any adverse effects, and a 35-node quorum could theoretically lose 17 nodes and still operate.
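Put another way, a quorum of n zookeeper nodes tolerates floor((n - 1) / 2) failures. You can check the numbers above with a bit of shell arithmetic:

n=7;  echo $(( (n - 1) / 2 ))   # prints 3
n=35; echo $(( (n - 1) / 2 ))   # prints 17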
I think zookeeper is basically used to keep track of the locations of regions, so your quorum will notify any clients and fellow regionservers where to find the data they are looking for. If zk becomes overloaded, then your regionservers can time out and crash, and potentially lose data that hasn’t been flushed to disk yet, so make sure you have enough horsepower for your application. In my cluster, the hbase.zookeeper.quorum directive is simply a comma-separated list of all of my slave nodes, including my master. If you have an odd number of slaves (an even number counting your master), then just leave the master out of the list. If you have more than ten slaves, then consider dedicating 3 of them to zookeeper if you have problems with regionservers timing out; the logs will tell you if that is the case.

On each node, create the zookeeper data directory and its myid file:

mkdir -p /hadoop/zookeeper/data && echo 'X' > /hadoop/zookeeper/data/myid
It is imperative that you replace the ‘X’ with ‘0’ on the first node in your quorum, ‘1’ on the second, ‘2’ on the third, and so on. This file allows the node to identify itself in the zk quorum.
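If you don’t feel like logging in to every node by hand, a loop like the following sketch can push the myid files out. It assumes passwordless ssh, and slave1 through slave3 are placeholders for your quorum members:

# assumes passwordless ssh; slave1..slave3 are placeholder hostnames
i=0
for host in slave1 slave2 slave3; do
  ssh "$host" "mkdir -p /hadoop/zookeeper/data && echo $i > /hadoop/zookeeper/data/myid"
  i=$((i+1))
done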
Once all that per-node work is done, you can finally start your hbase instance. From the /hadoop/hbase directory on the master, run:
bin/start-hbase.sh
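To sanity-check that everything came up, run jps on a few nodes. With a layout like mine you should see an HMaster process on the master, and HRegionServer plus HQuorumPeer (the hbase-managed zookeeper) on the slaves:

jps
# expect, roughly:
#   HMaster        on the master
#   HRegionServer  on each slave
#   HQuorumPeer    on each zookeeper quorum member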
 