Monday 4 June 2012

How to Add and Retrieve Data from a Cassandra Database

This article describes how to create a new keyspace on a Cassandra database server, how to add data to that keyspace and how to run some simple queries against that data.


Create a New Keyspace on Cassandra
First, sign on to your Cassandra server using the “cassandra-cli” client.  Use the “show keyspaces” command to ensure you have a live connection to the server and to make sure the keyspace you are about to add doesn’t already exist.
cassandra> show keyspaces
Keyspace1
system
These two keyspaces are automatically installed when you installed Cassandra and are completely independent of one another – like separate databases on a relational database system would be.  A diagram of two separate keyspaces in our Cassandra database would look like this:

We want to add a new keyspace called “ToyStore”.  Once we’re done, we’d expect our diagram to look like this:

To create an new, empty keyspace called “ToyStore” Type
 CREATE KEYSPACE ToyStore with placement_strategy = 'org.apache.cassandra.locator.SimpleStrategy'and strategy_options = [{replication_factor:1}];

Use ToyStore ;

 This will create a “column family”  (if you’re a relational database user think “table” for now) called “Toys”.
The “Comparator” attribute in the “ColumnFamily” tag you just added controls how row keys are indexed and sorted.  The “UTF8Type” value key indicates that you’re indexing by UTF8 characters.  Other possible values include”AsciiType”, “LongType” (64-bit long integers) and “BytesType” (straight bit-to-bit comparison – the default value).

CREATE COLUMN FAMILY Toys
WITH comparator = UTF8Type
AND key_validation_class=UTF8Type
AND column_metadata = [
{column_name: name, validation_class: UTF8Type}
{column_name: price, validation_class: LongType}
];


Add Data To An Existing Keyspace on Cassandra
Now that we have a new “ToyStore” keyspace it’s time to add some data.  If you were watching closely you’ll notice that we did more than add a keystore in the previous step: we added our first “column family” too.  (Think “table” if you’re coming from a relational database background.)

To get started adding data, restart your Cassandra client and use the following syntax to add six name/value pairs to the “Toys” column family of your new “ToyStore” keyspace.
cassandra> set ToyStore.Toys['Transformer']['Price'] = ’29.99′
Value inserted.
cassandra> set ToyStore.Toys['GumDrop']['Price'] = ’0.25′
Value inserted.
cassandra> set ToyStore.Toys['MatchboxCar']['Price'] = ’1.49′
Value inserted.
cassandra> set ToyStore.Toys['Transformer']['Section'] = ‘Action Figures’
Value inserted.
cassandra> set ToyStore.Toys['GumDrop']['Section'] = ‘Candy’
Value inserted.
cassandra> set ToyStore.Toys['MatchboxCar']['Section'] = ‘Vehicles’
Value inserted.
If you run a “help” command from the Cassandra client you will see the following syntax for the kind of “set” command we just used:
?
1
set <ksp>.<cf>['<key>']['<col>'] = '<value>'
Let’s break this command syntax down using one of the commands we just typed.
?
1
set ToyStore.Toys['Transformer']['Price'] = '29.99'
According to our command syntax, the command we typed meant this:
  • ksp = KeySpace = “ToyStore”
  • cf = Column Family = “Toys”
  • key = Row Key (an indexed key which links multiple columns) = “Transformer”
  • col = Single Column Name (the name in a single name/value pair) = “Price”
  • val = Single Column Value (the value in a single name/value pair) = “29.99″

These six commands created a total of three rows in the “Toys” column family: “Transformer”, “GumDrop” and “MatchboxCar”.  Within each row you created two columns: “Section” and “Price”.   Sketched out in a diagram the data you inserted would look something like this:

Within the “Toys” column family, you could also represent this data in JSON like this:
?
1
2
3
4
5
6
7
8
9
10
11
12
13
14
{
  "Transformer" : {
    "Price" : "29.99",
    "Section" : "Action Figures"
  }
  "GumDrop" : {
    "Price" : "0.25",
    "Section" : "Candy"
  }
  "MatchboxCar" : {
    "Price" : "1.49",
    "Section" : "Vehicles"
  }
}
Starting to make sense? Now, let’s try to pull this data back.

Retrieve Data From An Existing Keyspace on Cassandra
Let’s start by counting the number of name/value pairs (i.e., “columns”) stored under one of the row keys we just inserted.
cassandra> count ToyStore.Toys['GumDrop']
2 columns

If you followed directions, the answer will be “2 columns”, whether you use “GumDrop”, “Transformer” or “MatchboxCar” as your column key.
Now try spelling out the row key in all lowercase.

cassandra> count ToyStore.Toys['gumdrop']
0 columns
Yes, Cassandra row keys are case-sensitive. Consider yourself warned, especially if you’re coming from a database environment where cases are insensitive.

Now trying spelling out the row key that doesn’t exist.
cassandra> count ToyStore.Toys['RedMatterBall']
0 columns
Notice that you didn’t get a “no column exists” error on your count statement; instead you were simply told that zero name/value pairs exist for your non-existent row key.
Now that you know to be careful with the exact name and case of your row keys, let’s pull back the data in a particular row instead of just counting how many columns it contains. To do this, use the “get” command as shown below.
cassandra> get ToyStore.Toys['GumDrop']
=> (column=Section, value=Candy, timestamp=1278132493790000)
=> (column=Price, value=0.25, timestamp=1278132306875000)
Returned 2 results.
The two “column” and “value” entries look familiar but there’s a third item in each of our columns: “timestamp”. That value represents the time when you made each column entry. Timestamp may not mean much to us yet (we will safely ignore it for another article or two), but timestamp will mean a great deal to us when we start merging column inserts/updates from two or more Cassandra database nodes.
By the way, here’s how you could represent the timestamp on each column in your diagram:

But back to our data retrieval task. Before we move on, try at least one row key that doesn’t exist.
cassandra> get ToyStore.Toys['RedMatterBall']
Returned 0 results.
Again, note that Cassandra reports that there are “0 results” for this row key, not that this row key doesn’t exist.
The last thing we’re going to do in this article is drill down into an existing row and only pick out one column (i.e., one name/value pair).
cassandra> get ToyStore.Toys['GumDrop']['Price']
=> (column=Price, value=0.25, timestamp=1278132306875000)
Now try this with a valid row key and an invalid column.
cassandra> get ToyStore.Toys['GumDrop']['Taste']
Exception null
This time we got an error rather than a “count of zero” message!
Relational database folks, are you starting to see the pattern? (Hint: Using non-existent row keys is like executing a “SELECT COUNT(*) FROM DB” with a WHERE clause that matches nothing, but using non-existent column names is like executing a query with invalid fields.)

1 comment:

  1. related with Cassandra Database, you can download this article here http://repository.gunadarma.ac.id/bitstream/123456789/2989/1/PERBAN~1.PDF

    ReplyDelete