Column Oriented Databases

Continuing our coverage of the disadvantages of relational databases: the row-based design of an RDBMS is not read-optimized enough for high-performance or OLAP-style applications, where aggregates are computed over large numbers of similar data items.

This is where a column-oriented DBMS can help.

The biggest bottleneck for a database query in a reporting scenario is usually its disk read time. A column-oriented DBMS attacks this problem by drastically reducing the amount of data that has to be read from disk in most scenarios. Allow me to explain...

Traditionally, relational database design has been based on rows. We developers are so used to this that we can visualize it without effort. Employee records in a typical row-based database look like this:

101 Aravind 27
102 Mike 25

A row-oriented database would store this table on disk as something like 101;Aravind;27&&102;Mike;25. A column-oriented implementation of the same table would persist it as 101;102&&Aravind;Mike&&27;25.
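To make the two layouts concrete, here is a minimal Python sketch that produces both strings from the same table. The function names are my own, and the ";" and "&&" separators are just the toy notation used above, not the on-disk format of any real database.

```python
# Toy table from the example: (id, name, age)
rows = [(101, "Aravind", 27), (102, "Mike", 25)]

def serialize_row_oriented(rows):
    # Values of one record stay together; records are joined by "&&".
    return "&&".join(";".join(str(v) for v in row) for row in rows)

def serialize_column_oriented(rows):
    # zip(*rows) transposes the table, so values of one column stay together.
    columns = zip(*rows)
    return "&&".join(";".join(str(v) for v in col) for col in columns)

print(serialize_row_oriented(rows))     # 101;Aravind;27&&102;Mike;25
print(serialize_column_oriented(rows))  # 101;102&&Aravind;Mike&&27;25
```

The only difference between the two functions is the transpose step, which is really all that "column oriented" means at the storage level.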

When the query is to find the average age in the table, far fewer disk reads are needed with a column-oriented implementation, because all the ages are stored next to each other instead of being spread across every record.
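A rough way to see the difference is to count how many characters each layout has to scan to compute the average age. This is only a sketch: the character count stands in for disk reads, and I am assuming the column store already knows where the age column starts (here the string is split just to locate that segment). Function names are hypothetical.

```python
row_store = "101;Aravind;27&&102;Mike;25"
column_store = "101;102&&Aravind;Mike&&27;25"

def average_age_row_store(data):
    # Every record has to be scanned in full to reach its age field.
    ages = [int(record.split(";")[2]) for record in data.split("&&")]
    return sum(ages) / len(ages), len(data)

def average_age_column_store(data):
    # Only the age segment is needed. A real engine would seek to it
    # directly; splitting the string here is a stand-in for that lookup.
    age_segment = data.split("&&")[2]
    ages = [int(a) for a in age_segment.split(";")]
    return sum(ages) / len(ages), len(age_segment)

print(average_age_row_store(row_store))        # (26.0, 27)
print(average_age_column_store(column_store))  # (26.0, 5)
```

Both layouts give the same answer, but the row layout scans all 27 characters while the column layout only needs the 5 characters that hold the ages. With two rows the gap is trivial; with millions of wide rows it is the whole point.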

While a row-based RDBMS favors queries that fetch all the data of a given row, a column-oriented implementation favors queries that compute aggregates over a specific column. One example is a count of all users younger than 30.
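The trade-off can be sketched with an in-memory column store, modeled here as a plain dictionary of lists. This structure and the helper names are assumptions for illustration, not how any particular product works.

```python
# Each column is stored as its own list; position i across lists is row i.
columns = {
    "id": [101, 102],
    "name": ["Aravind", "Mike"],
    "age": [27, 25],
}

def count_younger_than(columns, limit):
    # Aggregate over one column: only the "age" list is touched.
    return sum(1 for age in columns["age"] if age < limit)

def fetch_row(columns, position):
    # Rebuilding a full record has to visit every column once.
    return {name: values[position] for name, values in columns.items()}

print(count_younger_than(columns, 30))  # 2
print(fetch_row(columns, 1))            # {'id': 102, 'name': 'Mike', 'age': 25}
```

The count never looks at ids or names, while fetching a single employee has to visit all three columns. That is the mirror image of what a row store does well.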

The Wikipedia page on column-oriented databases has a good list of implementations.
