Monday 28 May 2012

Cassandra vs. HBase Architecture

Both Cassandra and HBase are influenced by Google’s BigTable (although Cassandra was also directly influenced by Amazon’s Dynamo). BigTable is a column-oriented database that stores data as a multidimensional sorted map with the <row key, column family, column key> data structure. A column family is the basic unit of storage and groups columns that are related to each other.
In BigTable, data is written in an append-only fashion. In other words, when data is modified, the update is appended to a file rather than updating the stored file in place. The figure below shows BigTable’s data read/insert path.
BigTable’s Internal Structure
When a write operation arrives, the data is first placed in a memory structure called the memtable. When the memtable becomes full, its entire contents are written out to a file called an SSTable (Sorted String Table). Data would be lost if the server crashed before the memtable data was moved to an SSTable, so to provide durability every write is first appended to a commit log before being applied to the memtable. When a read operation arrives, the key is first looked up in the memtable. If it is not in the memtable, the SSTables are searched, and multiple SSTables may have to be examined.
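As a rough illustration of this write path, here is a minimal sketch in Java. The class and method names (MiniStore, put, flush) are made up for illustration and are not BigTable’s, Cassandra’s, or HBase’s actual code; it simply shows the commit-log append, the memtable insert, and the flush to an immutable sorted file.

    import java.io.BufferedWriter;
    import java.io.IOException;
    import java.nio.file.*;
    import java.util.*;
    import java.util.concurrent.ConcurrentSkipListMap;

    // Simplified sketch of the commit-log / memtable / SSTable write path (illustrative only).
    public class MiniStore {
        private final Path commitLog = Paths.get("commit.log");
        private final NavigableMap<String, String> memtable = new ConcurrentSkipListMap<>();
        private final List<Path> sstables = new ArrayList<>();
        private static final int FLUSH_THRESHOLD = 4;   // tiny threshold, just for the example

        public void put(String key, String value) throws IOException {
            // 1. Append to the commit log first, so the write survives a crash.
            Files.write(commitLog, List.of(key + "\t" + value),
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
            // 2. Insert into the in-memory, sorted memtable.
            memtable.put(key, value);
            // 3. When the memtable is "full", flush it to an immutable, sorted SSTable.
            if (memtable.size() >= FLUSH_THRESHOLD) {
                flush();
            }
        }

        private void flush() throws IOException {
            Path sstable = Paths.get("sstable-" + sstables.size() + ".dat");
            try (BufferedWriter out = Files.newBufferedWriter(sstable)) {
                for (Map.Entry<String, String> e : memtable.entrySet()) {
                    out.write(e.getKey() + "\t" + e.getValue());
                    out.newLine();
                }
            }
            sstables.add(sstable);
            memtable.clear();                // memtable contents are now safely on disk
            // Simplification: a real store would roll/truncate commit-log segments instead.
            Files.deleteIfExists(commitLog);
        }
    }

Note that the only disk writes in this sketch are sequential appends (the log) and batched flushes (the SSTable), which is the point of the architecture.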
This architecture gives write operations an advantage: a write is only recorded in memory and is moved to disk after a certain amount has accumulated, so random disk I/O on the write path is largely avoided. Reads, however, are relatively slower whenever the data has to be fetched from the SSTables rather than the memtable. Both Cassandra and HBase use Bloom filters to quickly judge whether a key exists in an SSTable, and they maintain indexes as well. Still, if there are many SSTables, a read can generate a lot of I/O. Therefore, compaction is performed to improve read performance. Compaction merges and sorts two or more SSTables into a single SSTable, which decreases the number of SSTables. Read and write performance improve as more compactions are done.
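The read path can be sketched the same way. This is again a simplified, hypothetical example (the naive BloomFilter and SSTable classes below are illustrative only, not either product’s real implementation): the memtable is checked first, then the SSTables from newest to oldest, and a per-SSTable Bloom filter lets most files be skipped without any disk I/O.

    import java.util.*;

    // Illustrative read path: memtable first, then SSTables newest-to-oldest,
    // skipping SSTables whose Bloom filter says the key cannot be present.
    public class MiniReadPath {

        // A deliberately naive Bloom filter: a few hash probes over one bit set.
        static class BloomFilter {
            private final BitSet bits = new BitSet(1024);
            void add(String key) {
                for (int seed = 0; seed < 3; seed++) {
                    bits.set(Math.floorMod(Objects.hash(key, seed), 1024));
                }
            }
            boolean mightContain(String key) {
                for (int seed = 0; seed < 3; seed++) {
                    if (!bits.get(Math.floorMod(Objects.hash(key, seed), 1024))) {
                        return false;     // definitely absent -> skip this SSTable
                    }
                }
                return true;              // possibly present -> must read the file
            }
        }

        static class SSTable {
            final SortedMap<String, String> rows = new TreeMap<>();
            final BloomFilter filter = new BloomFilter();
            void put(String k, String v) { rows.put(k, v); filter.add(k); }
        }

        private final NavigableMap<String, String> memtable = new TreeMap<>();
        private final Deque<SSTable> sstables = new ArrayDeque<>();   // newest first

        void addSSTable(SSTable t) { sstables.addFirst(t); }          // e.g. after a flush

        String get(String key) {
            String v = memtable.get(key);           // 1. memtable
            if (v != null) return v;
            for (SSTable t : sstables) {            // 2. SSTables, newest to oldest
                if (t.filter.mightContain(key)) {   //    Bloom filter avoids useless I/O
                    v = t.rows.get(key);
                    if (v != null) return v;
                }
            }
            return null;                            // not found anywhere
        }
    }

Compaction would simply merge several such SSTables into one sorted file, shrinking the list the loop above has to walk through.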
For these reasons, READ and READ-and-UPDATE operations are much slower than INSERT operations in these NoSQL solutions.


Cassandra

Additionally, Cassandra uses consistent hashing. While HBase and Cassandra have their similarities, there are differences as well. Cassandra prefers AP (Availability & Partition tolerance) while HBase prefers CP (Consistency & Partition tolerance). The CAP theorem is a theorem in distributed computing.
The theorem states that no system can provide all three of Consistency, Availability and Partition tolerance at the same time. For example, data must be replicated to multiple nodes to increase availability, and consistency between the replicas must be maintained. However, in order to keep operating even during a network partition, it becomes difficult to guarantee consistency between the replicas. Therefore, only part of the CAP properties can be provided.
As mentioned before, Amazon’s Dynamo had a direct influence on Cassandra. Dynamo uses consistent hashing to distribute data across a key-value store and provides high scalability. Consistent hashing hashes each key to a value (slot) and arranges the slots in a ring. The nodes that make up the cluster each handle a certain range of the ring. Therefore, whenever a node leaves or joins the cluster, the neighboring node on the ring can take over or hand off the affected range without rehashing the rest of the data.
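A minimal consistent-hashing sketch, with hypothetical names rather than Dynamo’s or Cassandra’s actual code, looks like this: node positions are hashed onto a ring, a key is owned by the first node found clockwise from the key’s hash, and adding or removing a node only remaps the neighboring range.

    import java.util.*;

    // Minimal consistent-hashing ring (illustrative): a key maps to the first node
    // clockwise from its hash, so node changes only remap the neighbouring range.
    public class HashRing {
        private final TreeMap<Integer, String> ring = new TreeMap<>();

        void addNode(String node)    { ring.put(hash(node), node); }
        void removeNode(String node) { ring.remove(hash(node)); }

        String nodeFor(String key) {
            int h = hash(key);
            Map.Entry<Integer, String> e = ring.ceilingEntry(h);      // walk clockwise
            return (e != null ? e : ring.firstEntry()).getValue();    // wrap around the ring
        }

        private int hash(String s) {
            return Math.floorMod(s.hashCode(), 1 << 16);              // toy hash, 0..65535
        }

        public static void main(String[] args) {
            HashRing ring = new HashRing();
            ring.addNode("node-A");
            ring.addNode("node-B");
            ring.addNode("node-C");
            System.out.println("user:42 -> " + ring.nodeFor("user:42"));
            ring.removeNode("node-B");                                // only B's range moves
            System.out.println("user:42 -> " + ring.nodeFor("user:42"));
        }
    }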
Cassandra also raises availability with the concept of a consistency level. This concept is related to replication. The number of replicas, and how many of them must confirm a write before it is considered complete, can be adjusted with system parameters. For example, if three replicas are maintained and a write operation arrives, the operation is considered successful only once all three replicas have been written.
However, Cassandra also allows only N (fewer than 3) acknowledgements to be checked before returning immediately. Therefore, a write can succeed even when some replica nodes have failed, which raises availability. A record of each failed write is kept on a different node so the operation can be retried later (this is called “hinted handoff”). Since successful replication of every write is not guaranteed, consistency is checked at read time. Generally, when there are multiple replicas, they are reconciled into one result when reading. Cassandra, however, assumes that not all replicas may match: it reads the data from all three replicas, checks whether they are identical, and repairs any stale replica with the latest data (this is called “read repair”).
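To make the consistency-level idea concrete, here is a hedged sketch of coordinator logic (made-up names, not the Cassandra driver API): with three replicas, the write is sent to all of them, but the client is acknowledged as soon as the required number of replicas respond; replicas that fail would later receive the data via hinted handoff.

    import java.util.*;
    import java.util.concurrent.*;

    // Illustrative coordinator: send a write to all replicas, acknowledge the client
    // once `required` replicas confirm, and leave failed replicas for hinted handoff.
    public class QuorumWriteSketch {

        interface Replica { boolean write(String key, String value); }   // may fail (false)

        static boolean coordinateWrite(List<Replica> replicas, int required,
                                       String key, String value) throws Exception {
            ExecutorService pool = Executors.newFixedThreadPool(replicas.size());
            CompletionService<Boolean> done = new ExecutorCompletionService<>(pool);
            for (Replica r : replicas) {
                done.submit(() -> r.write(key, value));    // fire the write at every replica
            }
            int acks = 0;
            for (int i = 0; i < replicas.size() && acks < required; i++) {
                if (Boolean.TRUE.equals(done.take().get())) {
                    acks++;                                 // count successful replica acks
                }
            }
            pool.shutdown();                                // remaining writes keep running;
            return acks >= required;                        // failures would become hints
        }

        public static void main(String[] args) throws Exception {
            List<Replica> replicas = List.of(
                    (k, v) -> true,      // healthy replica
                    (k, v) -> true,      // healthy replica
                    (k, v) -> false);    // failed replica (would get a hinted handoff)
            // Consistency level ONE: a single ack is enough even with one replica down.
            System.out.println(coordinateWrite(replicas, 1, "row1", "value"));
        }
    }

With required = 3 this behaves like the strict case described above; with required = 1 or 2 the write succeeds despite a failed replica, trading consistency for availability.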
This is why Cassandra is slower in reading than writing.

HBase

Now let’s look at HBase. HBase has the same structure as BigTable. While BigTable partitions data in units called tablets, HBase distributes and replicates data in units called regions. Keys are arranged by range, and the location of the region a key is stored in is kept as metadata (this is called the meta region). Therefore, when a key arrives, the region holding it is looked up first, and the client then caches this information. HBase directs writes for a key range to a single region rather than distributing them, and splits the region when it becomes too large. Adjustments, such as creating regions in advance, must be made to mitigate this.
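The region lookup can be pictured as a floor lookup on a sorted map of region start keys. This is a hedged sketch with made-up names, not HBase’s actual META table implementation: given a row key, the client finds the region whose start key is the greatest key less than or equal to the row key, and caches the answer.

    import java.util.*;

    // Illustrative region lookup: regions cover contiguous key ranges, so finding the
    // region for a row key is a floor lookup on the sorted start keys (then cached).
    public class RegionLookupSketch {
        // start key of each region -> the server currently hosting it
        private final TreeMap<String, String> metaRegion = new TreeMap<>();
        private final Map<String, String> clientCache = new HashMap<>();

        void assignRegion(String startKey, String server) {
            metaRegion.put(startKey, server);
        }

        String serverFor(String rowKey) {
            return clientCache.computeIfAbsent(rowKey, k -> {
                Map.Entry<String, String> region = metaRegion.floorEntry(k);  // range containing the key
                return region != null ? region.getValue() : null;
            });
        }

        public static void main(String[] args) {
            RegionLookupSketch meta = new RegionLookupSketch();
            meta.assignRegion("", "regionserver-1");       // region covering keys before "m"
            meta.assignRegion("m", "regionserver-2");      // region covering keys from "m" on
            System.out.println(meta.serverFor("apple"));   // -> regionserver-1
            System.out.println(meta.serverFor("orange"));  // -> regionserver-2
        }
    }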
HDFS is in charge of data storage and replication. When a file is written through Hadoop’s distributed file system, HDFS writes multiple replicas (synchronous replication) and reads only one of them when reading. This guarantees data integrity and keeps read overhead low. However, since HBase and HDFS operate separately, the node holding the memtable and the nodes holding the SSTable data may differ, which can cause additional network I/O.
What happens if there is a failure in a node? If a region server goes down, other nodes take over the regions it was serving. In other words, the commit log that pertains to each region is replayed and the memtable is rebuilt. If the failed node also held HDFS data, the blocks stored on that node are re-replicated to other nodes.
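As a hedged sketch of that recovery step (made-up names, not HBase’s actual code), replaying a commit log amounts to re-applying every logged entry, in order, to a fresh in-memory map.

    import java.io.IOException;
    import java.nio.file.*;
    import java.util.*;

    // Illustrative commit-log replay: rebuild a region's memtable after a failover
    // by re-applying every logged write in order.
    public class CommitLogReplay {
        static NavigableMap<String, String> replay(Path commitLog) throws IOException {
            NavigableMap<String, String> memtable = new TreeMap<>();
            for (String line : Files.readAllLines(commitLog)) {
                String[] parts = line.split("\t", 2);      // "key<TAB>value" per entry
                if (parts.length == 2) {
                    memtable.put(parts[0], parts[1]);      // later entries overwrite earlier ones
                }
            }
            return memtable;                               // region can serve traffic again
        }
    }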
Thus, we have looked at the similarities and differences of Cassandra and HBase.
Then why did the two products show a difference in write performance during the comparison test?
  1. First of all, there is a difference in commit-log handling. Cassandra writes the commit log once to the local file system. HBase, however, writes the commit log to HDFS, and HDFS performs the replication. Data locality is not taken into consideration, so the file write request will likely be sent to a different physical node over the network.
  2. Second, Cassandra receives write requests on all three replica nodes, whereas HBase initially writes only to a single region and therefore receives requests on only one node.
Then why is there a difference in reading performance?
There are various differences, but for a single READ request Cassandra reads the data three times while HBase reads it only once. This affects throughput.

Comments:

  1. " However, Cassandra keeps in mind that not all replications match, and reads the data from all three replications, checks whether it is identical, and restores the latest data if it is not suitable (this is called “read repair”). This is why Cassandra is slower in reading than writing."

    Read repair is an asynchronous process and does not affect read performance.
    Reading is also tunable with a consistency level, just like writing. If your consistency level is 1 while reading, Cassandra does not wait for the other reads to respond; it waits only for the first one and returns it to the client, which is why it is eventually consistent.
