Add the following line to your hbase-env.sh file:
export JAVA_HOME=/usr/lib/jvm/java-6-sun
In the same file change this line:
export HBASE_OPTS="-XX:+HeapDumpOnOutOfMemoryError -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode"
to look like:
export HBASE_OPTS="-XX:+HeapDumpOnOutOfMemoryError -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -Djava.net.preferIPv4Stack=true"
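If you would rather script those two edits, something like the following will do it (just a sketch: it assumes hbase is unpacked under /hadoop/hbase, the same path used later in this post, and that you have GNU sed; the HBASE_OPTS line may or may not be commented out in your conf/hbase-env.sh):
cd /hadoop/hbase
# append the JAVA_HOME setting
echo 'export JAVA_HOME=/usr/lib/jvm/java-6-sun' >> conf/hbase-env.sh
# rewrite the HBASE_OPTS line (commented out or not) in place
sed -i 's|^#\? *export HBASE_OPTS=.*|export HBASE_OPTS="-XX:+HeapDumpOnOutOfMemoryError -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -Djava.net.preferIPv4Stack=true"|' conf/hbase-env.sh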
Now, modify your “regionservers” file to list all of the machines you want to host regions. Think of an Hbase region as a small chunk of the data in your database. The more regionservers you have, the more data you can reliably serve. In my cluster, the regionservers are the same nodes as all of my datanodes and all of my tasktrackers. So, essentially, the “regionservers” file should be identical to your “slaves” file from the hadoop tutorial.
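For example, if your three slaves are named slave1 through slave3 (placeholder names, matching the $slaveX$ placeholders used below), the regionservers file in hbase's conf directory is just one hostname per line:
slave1
slave2
slave3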
Next, modify the hbase-site.xml file. The settings in this file overwrite those in hbase-default.xml, so if you want to see a list of available settings to configure, then study that file, but only make changes to your hbase-site.xml. Add the following settings to hbase-site.xml:
<property>
  <name>hbase.rootdir</name>
  <value>hdfs://$master$/hbase</value>
</property>
<property>
  <name>hbase.cluster.distributed</name>
  <value>true</value>
</property>
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>$slave1$,$slave2$,$slave3$</value>
</property>
<property>
  <name>hbase.zookeeper.property.dataDir</name>
  <value>/hadoop/zookeeper/data</value>
</property>
Please remember to replace $master$ and $slaveX$ with your master and slave host names respectively. You may have read that Hbase 0.20 now requires zookeeper, but fear not: the above configuration directives allow hbase to completely manage zookeeper on its own, so you never have to mess with it.
Now, it is typically recommended to always run zookeeper on dedicated, zookeeper-only servers. If you are running a small cluster, then this is hardly efficient, because you want as many nodes “working” as possible. While I can’t give you recommendations for the maximum cluster size you can have before requiring dedicated zk nodes, I can tell you that my 6 slave nodes run datanode, tasktracker, regionserver, and zookeeper without too much of a problem. I would imagine that if you have over 10 nodes in your cluster, then you shouldn’t have a problem dedicating a few for zookeeper. They also recommend (maybe even require) that zookeeper runs on an odd number of machines. I don’t completely understand how zookeeper works, but basically as long as you still have more than half of your “quorum” intact, your cluster won’t fail. In essence, if your zk quorum has 7 nodes, you can lose 3 nodes without any adverse effects, and a 35 node quorum could theoretically lose 17 nodes and still operate. I think zookeeper is basically used to keep track of the locations of regions, so your quorum will notify any clients and fellow regionservers where to find the data they are looking for. If zk becomes overloaded, then your regionservers can time out and crash, and potentially lose data if they haven’t flushed to disk yet. So make sure you have enough horsepower for your application.
In my cluster, the hbase.zookeeper.quorum directive is simply a comma separated list of all of my slave nodes, including my master. If you have an odd number of slaves (an even number counting your master), then just leave the master out of the list. If you have more than ten slaves, then consider dedicating 3 of them to zookeeper if you have problems with regionservers timing out. The logs will tell you if that is the case.
Next, on every node in your zookeeper quorum, create the data directory and the myid file:
mkdir -p /hadoop/zookeeper/data && echo 'X' > /hadoop/zookeeper/data/myid
It is imperative that you replace the 'X' with '0' on the first node in your quorum, '1' on the second, '2' on the third, and so on. This file allows the node to identify itself in the zk quorum.
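If you don't feel like logging into every box to do that by hand, a small loop from the master works too (a sketch only: it assumes the placeholder hostnames slave1 through slave3 used above, passwordless ssh between the nodes, and that numbering starts at 0 on the first quorum member):
i=0
for host in slave1 slave2 slave3; do
  # write this node's position in the quorum into its myid file
  ssh $host "mkdir -p /hadoop/zookeeper/data && echo $i > /hadoop/zookeeper/data/myid"
  i=$((i+1))
done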
Once all of that per-node work is done, you can finally start your hbase instance. From the /hadoop/hbase directory on the master, run:
bin/start-hbase.sh
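Give it a minute, then check that everything actually came up. If I remember the process names right, jps should show HMaster on the master (plus HQuorumPeer on the zookeeper nodes) and HRegionServer on the slaves, and the status command in the hbase shell reports how many regionservers are live:
jps                # look for HMaster / HRegionServer / HQuorumPeer
bin/hbase shell    # opens the interactive shell
status             # run inside the shell; reports live regionservers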