Monday 4 June 2012

Migrate a Relational Database Structure into a JSON

JSON stands for “JavaScript Object Notation” and is an efficient way to transfer complex information about specific entities between two separate programs.
As the “JavaScript” name implies, JSON is often used to transfer information between JavaScript-interpreting web browsers and JSON-aware web applications.  In fact, native understanding of JSON is now built into most web browsers’ JavaScript interpreters.

The Original Relational Database Structure
We are going to start with a very simple 1:N relational database structure. Our first two tables are “forests” and “famoustrees”.  Here is our data in tabular format:

forests:


famoustrees:



“famoustrees” is linked to “forests” using the “forestID” foreign key.  Notice that there are no famous trees in the “Lonely Grove” forest, one famous tree in the “100 Acre Woods” and two famous trees in the “Black Forest”.
If we were to represent the data in our database – call it our “biologicalfeatures” database – in JSON, it would look like this:
?
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
{
  "biologicalfeatures":
    {
    "forests" :
      {
      "forest003" :
        {
          "name" : "Black Forest",
          "trees" : "two million",
          "bushes" : "three million"
        },
      "forest045" :
        {
          "name" : "100 Acre Woods",
          "trees" : "four thousand",
          "bushes" : "five thousand"
        },
      "forest127" :
        {
          "name" : "Lonely Grove",
          "trees" : "none",
          "bushes" : "one hundred"
        }
      },
    "famoustrees" :
      {
      "tree12345" :
        {
          "forestID" : "forest003",
          "name" : "Der Tree",
          "species" : "Red Oak"
        },
      "tree12399" :
        {
          "forestID" : "forest045",
          "name" : "Happy Hunny Tree",
          "species" : "Willow"
        },
      "tree32345" :
        {
          "forestID" : "forest003",
          "name" : "Das Ubertree",
          "species" : "Blue Spruce"
        }
      }
    }
}
Denormalizing the Tables
To collapse the famoustrees table into our forests table, we need to move each famoustree entry underneath its forest entry.  We can also also remove the foreign “forestID” key from each famoustree entry – we don’t need that anymore.

However, we should retain the type of each famoustree entry we moved into the forest entry.  We can do this by adding an extra “type” value to each entry.

Finally, we could break out the original non-ID information in each forest entry into a typed section too.  We’ll tag each of these sections with a new ID of “generalinfo”.  (This is a Cassandra-friendly convention – we’ll get into this more below.)

Represented in JSON, our data now looks like this:
?
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
{
  "biologicalfeatures":
    {
    "forests" :
      {
      "forest003" :
        {
        "generalinfo" :
          {
          "name" : "Black Forest",
          "trees" : "two million",
          "bushes" : "three million"
          },
        "tree12345" :
          {
            "type" : "famoustree",
            "name" : "Der Tree",
            "species" : "Red Oak"
          },
        "tree32345" :
          {
            "type" : "famoustree",
            "name" : "Das Ubertree",
            "species" : "Blue Spruce"
          }
        },
      "forest045" :
        {
        "generalinfo" :
          {
          "name" : "100 Acre Woods",
          "trees" : "four thousand",
          "bushes" : "five thousand"
          },
        "tree12399" :
          {
            "type" : "famoustree",
            "name" : "Happy Hunny Tree",
            "species" : "Willow"
          }
        },
      "forest127" :
        {
        "generalinfo" :
          {
          "name" : "Lonely Grove",
          "trees" : "none",
          "bushes" : "one hundred"
          }
        }
      }
    }
}
Ready for Cassandra?
There are really only two types of JSON data structures that can be imported directly into Cassandra.  One is the
keystore->columnfamily->rowkey->column
data structure shown below:
?
1
2
3
4
5
6
7
8
9
10
11
12
{
  "keystore":
    {
    "columnfamily" :
      {
      "rowkey" :
        {
          "column name" : "column value"
        }
      }
    }
}
Add another layer and you get the other supported data structure
keystore->columnfamily (a.k.a. “supercolumnfamily”)->rowkey->supercolumn (a.k.a. “subcolumn”)->column
shown below:
?
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
{
  "keystore":
    {
    "columnfamily" :
      {
      "rowkey" :
        {
        "supercolumn" :
          {
          "column name" : "column value"
          }
        }
      }
    }
}
That’s it: if you can get your data to fit into one of those two JSON structures, your data is ready to be input into Cassandra.
You probably suspect that I wouldn’t have taken you this far if our forests data wasn’t ready for Cassandra, but please take a moment to scroll up and see if you can figure out whether our denormalized forests data uses supercolumns or not.
Let’s break it down:
biologicalfeatures -> forests
…matches the keystore->columnfamily structure used by both supported JSON structures.
As for the rest:
forest003 -> generalinfo -> (name=”Black Forest”)
…matches the rowkey->supercolumn->column structure used by the “supercolumn” supported JSON structure.
So, yes, we had to use supercolumns to denormalize the forests and famoustrees tables properly.

1 comment:

  1. related with Cassandra Database, you can download this article here http://repository.gunadarma.ac.id/bitstream/123456789/2989/1/PERBAN~1.PDF

    ReplyDelete