Monday 7 May 2012

Introduction to Infobright


Infobright at its core is a highly compressed column-oriented data store,
which means that instead of the data being stored row by row, it is stored
column by column. Column orientation has many advantages. One of the most
important is more efficient compression: because each column stores a
single data type (whereas a row typically mixes several), compression can
be optimized for that particular data type, which significantly reduces
disk I/O.
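A toy Python sketch of the idea follows. It assumes a made-up three-column table and uses `zlib` plus a simple delta encoding for the integer `id` column as stand-ins for Infobright's own per-type compressors, which are not shown here. It contrasts one interleaved row stream with one stream per column.

```python
import zlib
from itertools import accumulate

# Hypothetical sample table of (id, country, amount) rows.
rows = [(i, "US" if i % 3 == 0 else "CA", i % 100) for i in range(10_000)]

# Row-oriented: values of different types interleave in a single stream.
row_stream = "\n".join(f"{i},{c},{a}" for i, c, a in rows).encode()

# Column-oriented: one list per column, each holding a single data type.
ids, countries, amounts = (list(col) for col in zip(*rows))


def delta_encode(values):
    """Integer-specific encoding: store each value as the gap from its predecessor."""
    return [cur - prev for prev, cur in zip([0] + values, values)]


def delta_decode(deltas):
    """Invert delta_encode with a running sum."""
    return list(accumulate(deltas))


def ints_to_bytes(values):
    return ",".join(map(str, values)).encode()


columns = {
    "id": ints_to_bytes(delta_encode(ids)),     # sorted ids become mostly 1s
    "country": ",".join(countries).encode(),    # low-cardinality strings
    "amount": ints_to_bytes(amounts),           # small repeating integers
}

print("whole table, row layout:", len(zlib.compress(row_stream)), "bytes")
for name, data in columns.items():
    # A query that needs only one column reads only that column's bytes.
    print(f"column {name!r}:", len(zlib.compress(data)), "bytes")
```

The delta step only makes sense because the `id` stream is known to contain nothing but integers; a row-oriented stream mixing ids, country codes and amounts offers no such type-specific opportunity.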

Infobright organizes the data into 3 layers:

• Data Packs
The data within each column is stored in groups of 65,536 values
called Data Packs. Data Packs improve compression because each one is a
smaller subset of the column data (and so has less variability), and the
compression algorithm can be chosen to suit the data type.

• Data Pack Nodes (DPNs)
Data Pack Nodes contain a set of statistics about the data stored and
compressed in each Data Pack. There is always a one-to-one relationship
between Data Packs and DPNs. Because DPNs always exist, Infobright has
some information about all the data in the database, unlike traditional
databases, where indexes are created for only a subset of columns.

• Knowledge Nodes
These are a further set of metadata related to Data Packs or to the
relationships between columns. They can be introspective, describing the
ranges in which values occur, or extrospective, describing how the data
relates to other data in the database. Most KNs are created at load time,
but others are created in response to queries in order to optimize
performance. Because this is a dynamic process, a given Knowledge Node
may or may not exist at a particular point in time.
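The first two layers can be sketched in Python. The pack size of 65,536 comes from the text; the statistics kept in each DPN (count, null count, min, max, sum) are an assumption chosen for illustration, the class and function names are hypothetical, and compression of the packs themselves is left out.

```python
from dataclasses import dataclass
from typing import List, Optional

PACK_SIZE = 65_536  # values per Data Pack, per the description above


@dataclass
class DataPackNode:
    """Statistics for one Data Pack; this field set is an illustrative assumption."""
    count: int
    null_count: int
    min_value: Optional[int]
    max_value: Optional[int]
    total: int


def split_into_packs(column: List[Optional[int]]) -> List[List[Optional[int]]]:
    """Cut one column into consecutive Data Packs of at most PACK_SIZE values."""
    return [column[i:i + PACK_SIZE] for i in range(0, len(column), PACK_SIZE)]


def build_dpn(pack: List[Optional[int]]) -> DataPackNode:
    """Summarize a pack without keeping any of its individual values."""
    present = [v for v in pack if v is not None]
    return DataPackNode(
        count=len(pack),
        null_count=len(pack) - len(present),
        min_value=min(present) if present else None,
        max_value=max(present) if present else None,
        total=sum(present),
    )


# Hypothetical integer column of 200,000 values with a NULL every 1,000 rows.
column = [None if i % 1000 == 0 else i for i in range(200_000)]
packs = split_into_packs(column)
dpns = [build_dpn(p) for p in packs]  # exactly one DPN per Data Pack

for n, dpn in enumerate(dpns):
    print(n, dpn)
```

Because every pack gets a DPN regardless of which columns a query might use, metadata exists for the whole column; with `count` and `total` stored per pack, a column-wide SUM could in principle be answered from the DPNs alone.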

The DPNs and KNs together form the Knowledge Grid. Unlike traditional
database indexes, they are not created manually and require no ongoing
care and feeding; instead, the system creates and manages them
automatically. In essence, they provide a high-level view of the entire
content of the database.
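A minimal Python sketch of that idea, assuming a histogram-style Knowledge Node that records which of a hypothetical 16 value ranges are occupied in each pack (the class names and bucket count are illustrative, not Infobright's). The metadata is built as a side effect of `load`, with no separate index-creation step.

```python
from dataclasses import dataclass, field
from typing import List

PACK_SIZE = 65_536
BUCKETS = 16  # hypothetical histogram resolution per pack


@dataclass
class PackKnowledge:
    """DPN-style min/max plus a histogram-style Knowledge Node for one pack."""
    lo: int
    hi: int
    occupied: List[bool] = field(default_factory=list)

    def bucket(self, value: int) -> int:
        """Map a value in [lo, hi] to one of BUCKETS equal-width ranges."""
        if self.hi == self.lo:
            return 0
        width = (self.hi - self.lo) / BUCKETS
        return min(int((value - self.lo) / width), BUCKETS - 1)

    def may_contain(self, value: int) -> bool:
        if value < self.lo or value > self.hi:
            return False                      # ruled out by min/max alone
        return self.occupied[self.bucket(value)]  # ruled out by the histogram?


class KnowledgeGrid:
    """Builds per-pack metadata automatically whenever data is loaded."""

    def __init__(self) -> None:
        self.packs: List[PackKnowledge] = []

    def load(self, column: List[int]) -> None:
        for start in range(0, len(column), PACK_SIZE):
            pack = column[start:start + PACK_SIZE]
            k = PackKnowledge(lo=min(pack), hi=max(pack), occupied=[False] * BUCKETS)
            for v in pack:
                k.occupied[k.bucket(v)] = True
            self.packs.append(k)

    def candidate_packs(self, value: int) -> List[int]:
        """Indexes of packs that might contain `value`; all others can be skipped."""
        return [i for i, k in enumerate(self.packs) if k.may_contain(value)]


# Values cluster at both ends of the range, so min/max alone cannot rule out 500.
grid = KnowledgeGrid()
grid.load([0, 1000] * 40_000)
print(grid.candidate_packs(500))  # prints []
```

Here the min/max statistics alone would flag both packs as possibly containing 500; the histogram node is what eliminates them, which illustrates how KNs add detail on top of DPNs.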

The Optimizer is the highest level of intelligence in the architecture. It uses
the Knowledge Grid to determine the minimum set of Data Packs that
need to be decompressed to satisfy a given query in the fastest
possible time.
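Infobright frames this step in terms of rough sets. The Python sketch below is a simplified version of that idea for a single range predicate, assuming DPNs that store only count, min and max and a column with no NULLs: each pack is classified as irrelevant, relevant or suspect, and only suspect packs are "decompressed" (here, simply scanned). The names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Tuple


class PackClass(Enum):
    IRRELEVANT = "irrelevant"  # no row in the pack can match
    RELEVANT = "relevant"      # every row in the pack matches
    SUSPECT = "suspect"        # must decompress to find out


@dataclass
class DPN:
    count: int
    min: int
    max: int


def classify(dpn: DPN, lo: int, hi: int) -> PackClass:
    """Classify a pack for the predicate lo <= x <= hi using only its DPN."""
    if dpn.max < lo or dpn.min > hi:
        return PackClass.IRRELEVANT
    if lo <= dpn.min and dpn.max <= hi:
        return PackClass.RELEVANT
    return PackClass.SUSPECT


def count_between(packs: List[List[int]], dpns: List[DPN],
                  lo: int, hi: int) -> Tuple[int, int]:
    """COUNT(*) WHERE x BETWEEN lo AND hi; returns (count, packs opened)."""
    total, opened = 0, 0
    for pack, dpn in zip(packs, dpns):
        kind = classify(dpn, lo, hi)
        if kind is PackClass.RELEVANT:
            total += dpn.count            # answered from metadata alone
        elif kind is PackClass.SUSPECT:
            opened += 1                   # stand-in for decompressing the pack
            total += sum(lo <= v <= hi for v in pack)
    return total, opened


packs = [list(range(s, s + 10)) for s in (0, 10, 20)]  # tiny packs for readability
dpns = [DPN(len(p), min(p), max(p)) for p in packs]
print(count_between(packs, dpns, 5, 19))  # prints (15, 1)
```

Of the three packs, one is skipped outright, one is counted from its DPN, and only one has to be opened, which is the "minimum set of Data Packs" idea in miniature.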




