Tuesday 15 January 2013

Thrift installation

In this tutorial, I explain how to use Python and Thrift to access HBase. Here is a summary of the steps you will need to follow:
1) Download thrift
2) Install thrift dependencies
3) Compile and install thrift
4) Generate HBase thrift python module
5) Add HBase thrift python module to pythonpath
6) Start HBase thrift server
7) Use the client!
The steps are explained in detail below. I am assuming that you are using Ubuntu as your development environment, since that’s what I use. I am also assuming that HBase is installed and that HBASE_HOME is defined in your environment.
1) Download thrift
Download the thrift source tarball (this tutorial uses thrift-0.3.0.tar.gz).
Extract the tar.gz file using tar -xvzf thrift-0.3.0.tar.gz. Let’s say you extracted it to /home/horcrux/Software/thrift-0.3.0/
2) Install thrift dependencies
Thrift needs several packages in order to compile: the Boost C++ libraries, flex, mkmf and the usual build essentials. You can install all of them by executing the following commands. ruby1.8-dev is there to get mkmf installed.
sudo apt-get install build-essential
sudo apt-get install libboost1.40-dev
sudo apt-get install flex
sudo apt-get install ruby1.8-dev

3) Compile and install thrift
Execute the following commands to compile and install thrift:
cd /home/horcrux/Software/thrift-0.3.0/
./configure
make
sudo make install

Now let’s install the thrift Python library. The following commands install the thrift module into your system Python, so that it can be imported from any script.
cd /home/horcrux/Software/thrift-0.3.0/lib/py
sudo python setup.py install
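
To confirm that both installs worked, a quick check along these lines may help. This is only a sketch using the Python standard library, and it assumes a Python 3.3+ interpreter (for shutil.which); the function name check_thrift_install is made up for this example.

```python
import importlib.util
import shutil


def check_thrift_install(command="thrift", module="thrift"):
    """Return (compiler_path, module_found) for the given command and module."""
    # shutil.which returns None when the command is not on PATH
    compiler_path = shutil.which(command)
    # find_spec returns None when the top-level module cannot be located
    module_found = importlib.util.find_spec(module) is not None
    return compiler_path, module_found


if __name__ == "__main__":
    path, found = check_thrift_install()
    print("thrift compiler:", path or "not found on PATH")
    print("thrift python module:", "importable" if found else "not importable")
```

If the compiler is reported as missing, re-run sudo make install; if the module is missing, re-run the setup.py step.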

4) Generate HBase thrift python module
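Once this is done, you should have thrift in your path, and you should be able to execute the thrift command from anywhere. Now let’s generate the HBase thrift module from the Hbase.thrift definition file. Run the command from your thrift folder (/home/horcrux/Software/thrift-0.3.0), because the output is written to the current directory.
thrift --gen py $HBASE_HOME/src/java/org/apache/hadoop/hbase/thrift/Hbase.thrift
This command creates a gen-py folder in the directory you run it from, which here is your thrift folder (/home/horcrux/Software/thrift-0.3.0).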
5) Add HBase thrift python module to pythonpath
We need to add the gen-py folder to the Python path. There are several ways to do this:
a) Add it directly at the top of your Python file, before importing hbase:
import sys
sys.path.append('/home/horcrux/Software/thrift-0.3.0/gen-py')

or
b) If you are using an IDE like PyDev, add it as a PYTHONPATH source folder.
or
c) Add it to the PYTHONPATH environment variable in your .bashrc:
export PYTHONPATH=$PYTHONPATH:/home/horcrux/Software/thrift-0.3.0/gen-py
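
Whichever option you choose, it is worth checking that Python can actually find the generated module. The sketch below is a slightly more careful version of option (a): it adds a folder to sys.path only once and then reports whether a module can be located. The helper names add_to_python_path and module_is_importable are made up for this example, and it assumes Python 3.

```python
import importlib
import importlib.util
import os
import sys


def add_to_python_path(folder):
    """Append folder to sys.path once; return True if it was added."""
    folder = os.path.abspath(folder)
    if not os.path.isdir(folder):
        raise ValueError("not a directory: %s" % folder)
    if folder in sys.path:
        return False
    sys.path.append(folder)
    # Make sure the import system notices the newly added folder
    importlib.invalidate_caches()
    return True


def module_is_importable(name):
    """Return True if a top-level module with this name can be located."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    gen_py = "/home/horcrux/Software/thrift-0.3.0/gen-py"
    if os.path.isdir(gen_py):
        add_to_python_path(gen_py)
    print("hbase importable:", module_is_importable("hbase"))
```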
6) Start HBase thrift server
You can simply start the thrift server by executing the following command:
$HBASE_HOME/bin/hbase thrift start
This will start HBase thrift server on port 9090 (default port).
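
Before moving on to the client, you may want to check that something is really listening on that port. Here is a minimal sketch that uses only the standard socket module (Python 3 assumed); the function name thrift_server_is_up is made up for this example. Note that it only tests whether a TCP connection is accepted, not that the process behind the port is the HBase thrift server.

```python
import socket


def thrift_server_is_up(host="localhost", port=9090, timeout=2.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        # create_connection raises OSError if the port is closed or unreachable
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    if thrift_server_is_up():
        print("Port 9090 is accepting connections")
    else:
        print("Nothing is listening on port 9090; is the thrift server running?")
```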
7) Use the client!
Here is some sample code that prints all the table names on your HBase server:
from thrift.transport.TSocket import TSocket
from thrift.transport.TTransport import TBufferedTransport
from thrift.protocol import TBinaryProtocol
from hbase import Hbase

# Connect to the HBase thrift server started in step 6
transport = TBufferedTransport(TSocket('localhost', 9090))
transport.open()
protocol = TBinaryProtocol.TBinaryProtocol(transport)
client = Hbase.Client(protocol)
print(client.getTableNames())
# Close the connection once we are done
transport.close()
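
In a longer script, you will probably want the connection to be closed even when a call fails. One way to do that, sketched here under the assumption that the transport object has open() and close() methods (as the one in the sample above does), is a small context manager. The name opened is made up for this example and is not part of thrift.

```python
from contextlib import contextmanager


@contextmanager
def opened(transport):
    """Open the transport, yield it, and always close it afterwards."""
    transport.open()
    try:
        yield transport
    finally:
        # Runs on normal exit and when the body raises an exception
        transport.close()


# Usage with the objects from the sample above (needs thrift and gen-py):
#     with opened(TBufferedTransport(TSocket('localhost', 9090))) as transport:
#         client = Hbase.Client(TBinaryProtocol.TBinaryProtocol(transport))
#         print(client.getTableNames())
```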

That’s it.
