Monday 4 June 2012

Introduction to Cassandra Columns, Super Columns and Rows

This article provides new users the basics they need to understand Cassandra’s “column / super column / row” data model.
Though the focus is not on mechanics, this article assumes you are familiar with adding columns to and requesting data from existing keyspaces on Cassandra.
Remember that a Cassandra column is basically a “name=value” pair (e.g., “color=red”). You can use multiple columns to represent data such as:

"Price" : "29.99",
"Section" : "Action Figures"

Related columns can be grouped into rows, each identified by a row key. In JSON, a set of such rows looks like this:
{
  "Transformer" : {
    "Price" : "29.99",
    "Section" : "Action Figures"
  },
  "GumDrop" : {
    "Price" : "0.25",
    "Section" : "Candy"
  },
  "MatchboxCar" : {
    "Price" : "1.49",
    "Section" : "Vehicles"
  }
}
The keys used to group related columns into rows in this example were “Transformer”, “GumDrop” and “MatchboxCar”.
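
As a rough mental model (a Python sketch, not Cassandra's own API), a row can be treated as a row key mapped to a list of columns, and each column as a name/value pair. The `Column` tuple and the `get_column` and `as_pairs` helpers are hypothetical names used only for illustration.

```python
from collections import namedtuple

# Hypothetical model: a column is just a name/value pair (a simplification).
Column = namedtuple("Column", ["name", "value"])

# Each row key groups the related columns that describe one toy.
rows = {
    "Transformer": [Column("Price", "29.99"), Column("Section", "Action Figures")],
    "GumDrop": [Column("Price", "0.25"), Column("Section", "Candy")],
    "MatchboxCar": [Column("Price", "1.49"), Column("Section", "Vehicles")],
}


def get_column(rows, row_key, column_name):
    """Return the value stored under column_name in row row_key, or None."""
    for column in rows.get(row_key, []):
        if column.name == column_name:
            return column.value
    return None


def as_pairs(rows, row_key):
    """Render a row's columns in the article's "name=value" notation."""
    return [f"{c.name}={c.value}" for c in rows.get(row_key, [])]


print(get_column(rows, "GumDrop", "Price"))  # 0.25
print(as_pairs(rows, "MatchboxCar"))         # ['Price=1.49', 'Section=Vehicles']
```

The row key is the only handle needed to reach a group of columns; the column name then picks a single value inside that row.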


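Rows are stored in column families, and column families live in keyspaces. With the rows above placed in a “Toys” column family inside a “ToyStore” keyspace, this keyspace->column family->row->column data structure looks like this in JSON: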
{
  "ToyStore" : {
    "Toys" : {
      "GumDrop" : {
        "Price" : "0.25",
        "Section" : "Candy"
      },
      "Transformer" : {
        "Price" : "29.99",
        "Section" : "Action Figures"
      },
      "MatchboxCar" : {
        "Price" : "1.49",
        "Section" : "Vehicles"
      }
    }
  },
  "Keyspace1" : null,
  "system" : null
}
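
The same hierarchy can be sketched in Python as nested dictionaries, where reading a value means following a four-level path: keyspace, column family, row key, column name. This is an illustrative model only; the `cluster` dictionary and the `get_value` helper are hypothetical, and `None` stands in for keyspaces whose contents the JSON does not show.

```python
# Hypothetical nested-dict model: keyspace -> column family -> row -> column.
cluster = {
    "ToyStore": {
        "Toys": {
            "GumDrop": {"Price": "0.25", "Section": "Candy"},
            "Transformer": {"Price": "29.99", "Section": "Action Figures"},
            "MatchboxCar": {"Price": "1.49", "Section": "Vehicles"},
        }
    },
    # Contents not shown in the example, so modeled as None.
    "Keyspace1": None,
    "system": None,
}


def get_value(cluster, keyspace, column_family, row_key, column_name):
    """Follow the four-level path; return None if any level is missing."""
    node = cluster.get(keyspace)
    for key in (column_family, row_key, column_name):
        if not isinstance(node, dict):
            return None
        node = node.get(key)
    return node


print(get_value(cluster, "ToyStore", "Toys", "Transformer", "Price"))  # 29.99
print(get_value(cluster, "Keyspace1", "Toys", "GumDrop", "Price"))    # None
```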
If you wanted to store other, unrelated collections of information (e.g., “BugCollection” or “PaintColors”), you could simply add a new keyspace for each collection. However, if you needed to keep track of similar collections of data (e.g., separate Ohio and New York toy stores instead of a single toy store), you would need a different kind of Cassandra element: the “super column”.
To see super columns in action, inspect this keyspace->column family->row->super column->column data structure as it appears in JSON:
{
  "ToyCorporation" : {
    "ToyStores" : {
      "Ohio Store" : {
        "Transformer" : {
          "Price" : "29.99",
          "Section" : "Action Figures"
        },
        "GumDrop" : {
          "Price" : "0.25",
          "Section" : "Candy"
        },
        "MatchboxCar" : {
          "Price" : "1.49",
          "Section" : "Vehicles"
        }
      },
      "New York Store" : {
        "JawBreaker" : {
          "Price" : "4.25",
          "Section" : "Candy"
        },
        "MatchboxCar" : {
          "Price" : "8.79",
          "Section" : "Vehicles"
        }
      }
    }
  }
}
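
With a super column family, reading a single value takes one more step: keyspace, column family, row key, super column key, then column name. The Python sketch below models that five-level path with nested dictionaries; the `toy_corporation` dictionary and the `get_subcolumn` helper are hypothetical illustrations, not a Cassandra client API.

```python
# Hypothetical nested-dict model:
# keyspace -> column family -> row -> super column -> column.
toy_corporation = {
    "ToyCorporation": {
        "ToyStores": {
            "Ohio Store": {
                "Transformer": {"Price": "29.99", "Section": "Action Figures"},
                "GumDrop": {"Price": "0.25", "Section": "Candy"},
                "MatchboxCar": {"Price": "1.49", "Section": "Vehicles"},
            },
            "New York Store": {
                "JawBreaker": {"Price": "4.25", "Section": "Candy"},
                "MatchboxCar": {"Price": "8.79", "Section": "Vehicles"},
            },
        }
    }
}


def get_subcolumn(data, keyspace, column_family, row_key, super_column, column):
    """Follow the five-level path; return None if any level is missing."""
    node = data
    for key in (keyspace, column_family, row_key, super_column, column):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


# The same super column key can appear in different rows with different columns.
for store in ("Ohio Store", "New York Store"):
    price = get_subcolumn(toy_corporation, "ToyCorporation", "ToyStores",
                          store, "MatchboxCar", "Price")
    print(store, price)
```

Running it prints `Ohio Store 1.49` and then `New York Store 8.79`: the store is chosen by row key, and the toy by super column key.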
This data could also be visualized like this:

Because the stores were the last thing added to the example, you might expect “Ohio Store” and “New York Store” to be super columns that span multiple rows. However, the opposite is true: “Ohio Store” and “New York Store” are now the row keys, and entries like “Transformer”, “GumDrop” and “MatchboxCar” have become super column keys.
Like column keys, super column keys are indexed and sorted according to a specific comparator type (e.g., “UTF8Type”, “AsciiType”, “LongType”, “BytesType”, etc.). However, like row keys, super columns have no values of their own; they are simply used to collect other columns.
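
To see why the comparator type matters, the Python sketch below approximates two of them: it orders UTF8Type names by their UTF-8 bytes and LongType names by their numeric value. The `COMPARATORS` table and `sort_names` helper are hypothetical, and real LongType names are stored as 8-byte integers rather than the numeric strings used here.

```python
# Rough approximations of two comparator types (assumptions, not Cassandra code):
#   UTF8Type -> order by the UTF-8 encoded bytes of the name
#   LongType -> order by the numeric value of the name
COMPARATORS = {
    "UTF8Type": lambda name: str(name).encode("utf-8"),
    "LongType": lambda name: int(name),
}


def sort_names(names, comparator):
    """Return names in the order the chosen comparator implies."""
    return sorted(names, key=COMPARATORS[comparator])


names = ["10", "9", "100", "2"]
print(sort_names(names, "UTF8Type"))  # ['10', '100', '2', '9']
print(sort_names(names, "LongType"))  # ['2', '9', '10', '100']
```

The same four names come back in different orders, which is why the comparator is chosen up front to match the kind of keys a column family will hold.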
Notice that the super column keys in the two rows do not match: {“Transformer”, “GumDrop”, “MatchboxCar”} differs from {“JawBreaker”, “MatchboxCar”}. This is not an error; super column keys in different rows do not have to match, and often will not.
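
Treating each row's super column keys as a set makes the mismatch easy to inspect. This small Python sketch uses a hypothetical `super_column_keys` dictionary built from the ToyCorporation example; it only illustrates the idea and does not query Cassandra.

```python
# Super column keys per row, taken from the ToyCorporation example.
super_column_keys = {
    "Ohio Store": {"Transformer", "GumDrop", "MatchboxCar"},
    "New York Store": {"JawBreaker", "MatchboxCar"},
}

ohio = super_column_keys["Ohio Store"]
new_york = super_column_keys["New York Store"]

# Keys present in both rows.
print(sorted(ohio & new_york))  # ['MatchboxCar']
# Keys present only in the Ohio row.
print(sorted(ohio - new_york))  # ['GumDrop', 'Transformer']
# Keys present only in the New York row.
print(sorted(new_york - ohio))  # ['JawBreaker']
```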
