Monday, 4 June 2012

What’s the Difference Between a SuperColumn and a SubColumn in Cassandra?

First, remember that in Cassandra terminology, “subcolumn” = “supercolumn” = “sub column” = “super column”.
With that in mind, a “super column family” is really just a “column family…that contains super columns under its rows”.  (As opposed to a regular “column family” that merely contains rows without supercolumns.)


The confusion comes about because “super column family” entries look like this:

<ColumnFamily Name="Super1"
              ColumnType="Super"
              CompareWith="BytesType"
              CompareSubcolumnsWith="BytesType" />
…and plain old “column family” entries look like this:

<ColumnFamily Name="Regular1"
              CompareWith="BytesType" />

…both use a tag named “ColumnFamily” in Cassandra’s “storage-conf.xml” definition file.
Personally, I prefer using the term “Column Family” to cover both column families with rows that contain supercolumns as well as column families with rows that don’t contain supercolumns.  But if someone uses the term “super column family” they always mean “a column family that contains rows that contain supercolumns.”
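
As a rough illustration of the point above, here is a minimal Python sketch that reads the two entries and tells them apart by the ColumnType attribute alone. The wrapping "ColumnFamilies" element and the helper name classify_column_families are assumptions made for this example; they are not taken from a real “storage-conf.xml”.

```python
import xml.etree.ElementTree as ET

# The two entries from above, wrapped in a made-up root element so the
# fragment parses as a single XML document (the wrapper is an assumption).
CONF = """
<ColumnFamilies>
  <ColumnFamily Name="Super1"
                ColumnType="Super"
                CompareWith="BytesType"
                CompareSubcolumnsWith="BytesType" />
  <ColumnFamily Name="Regular1"
                CompareWith="BytesType" />
</ColumnFamilies>
"""


def classify_column_families(xml_text):
    """Map each ColumnFamily name to 'super' or 'regular' (illustrative helper)."""
    root = ET.fromstring(xml_text)
    kinds = {}
    for cf in root.iter("ColumnFamily"):
        # Both kinds use the same tag; only ColumnType="Super" marks a super column family.
        is_super = cf.get("ColumnType") == "Super"
        kinds[cf.get("Name")] = "super" if is_super else "regular"
    return kinds


print(classify_column_families(CONF))
```

Both entries come back from the same “ColumnFamily” tag; the only thing separating them is the attribute.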

This article covers the difference between a supercolumn and a subcolumn in Cassandra.
Let me cut to the chase: there is no difference.  They are two terms for exactly the same thing.
If you are familiar with a typical keyspace->column family->row->super column->column structure, such as the one pictured below, then you could safely replace all instances of the phrase “super column” with “subcolumn” without changing the meaning.
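
That hierarchy can be sketched as nested Python dictionaries. This is only a mental model built from plain dicts, not a Cassandra client API, and every name in it (the row key, the column names, the values, the get_value helper) is invented for illustration.

```python
# Hypothetical data laid out as:
# keyspace -> column family -> row key -> super column -> column -> value
keyspace = {
    "Super1": {                      # column family (name borrowed from the config example)
        "row1": {                    # row key (made up)
            "address": {             # super column (made up)
                "city": "Austin",    # column name -> value (made up)
                "zip": "78701",
            },
            "phone": {
                "home": "555-0100",
            },
        },
    },
}


def get_value(ks, column_family, row_key, super_column, column):
    """Walk the five levels of the hierarchy and return the stored value."""
    return ks[column_family][row_key][super_column][column]


print(get_value(keyspace, "Super1", "row1", "address", "city"))
```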

The confusion around “super column” vs. “sub column” is fueled largely by the Cassandra configuration file.  In your “storage-conf.xml” file you will see XML “ColumnFamily” configuration elements like this:

<ColumnFamily Name="Super1"
              ColumnType="Super"
              CompareWith="BytesType"
              CompareSubcolumnsWith="BytesType" />
If this were a plain old “ColumnFamily” entry, you would only see this:

<ColumnFamily Name="Regular1"
              CompareWith="BytesType" />
…but this is a “Super Column Family”, so there are two extra attributes:
  • ColumnType="Super" to tell Cassandra that this column family will contain super columns.
  • CompareSubcolumnsWith="BytesType" to tell Cassandra that our sub columns will be sorted through byte-by-byte comparison.
Confused?  If so, go back and read the last two bullets again while telling yourself:
“super column = sub column = supercolumn = subcolumn…”
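
To get a feel for what the byte-wise sorting in the second bullet implies, here is a small Python sketch. It assumes that "BytesType" ordering behaves like a lexicographic comparison of unsigned byte values, which is how Python compares bytes objects; the column names are made up for the example.

```python
# Made-up column names, stored as raw bytes.
names = [b"zip", b"City", b"city", b"a\xff", b"ab"]

# Python compares bytes objects lexicographically by unsigned byte value,
# used here as a stand-in for the assumed BytesType ordering.
ordered = sorted(names)

for name in ordered:
    print(name)
```

Note that uppercase “City” sorts ahead of every lowercase name, because the byte for “C” (0x43) is smaller than the byte for “a” (0x61). A byte-wise comparator knows nothing about letters or case.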
