In-Memory Databases - A good alternative to traditional RDBMS?

Regular readers will have noticed that RDBMS has taken its fair share of bashing on this blog here, here & here. Those posts drew their fair share of reader interest, criticism & push-back about the concepts discussed. I would agree that all I have provided so far is criticism of RDBMS, and not a fair solution to overcome the various problems. This post is about one such solution - the In-Memory Database.

In an earlier blog post we dealt with scenarios where RAM takes the place of the disk as the primary data access layer, with an In-Memory Data Grid providing persistence as a service. If we apply the same principle to a regular/traditional RDBMS, which does frequent disk access, we get In-Memory Databases!

So why 'In-memory Databases'?
  1. Considerable performance gains, as disk IO is eliminated
  2. Easier/built-in high availability. You can see from this post that traditional disk-based DBs are breakable!
  3. Scalability - we can scale by just adding more nodes. With a traditional RDBMS there is a cap beyond which adding more nodes will not improve performance, as the writes, while getting replicated across systems, eat up all the resources themselves, leaving no room for queries!
Why NOT 'In-memory Databases' right away?
  1. It is still hard to hold the entire database in memory if the database is huge - in the order of TBs.
  2. Memory costs more than disk; adding disk space is cheaper.
  3. Additional application development costs, if existing code doesn't take advantage of the in-memory paradigm.
  4. A learning curve for existing developers to get used to the new paradigm.
  5. Relatively new technology, hence the associated perceived risks and/or reluctance to change.
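
If you want a feel for the paradigm before evaluating any product, you can spin one up locally. Here is a minimal sketch in Python using SQLite's :memory: mode - not a distributed in-memory database, just an illustration of the whole database living in RAM:

    import sqlite3

    # ":memory:" tells SQLite to keep the entire database in RAM;
    # no disk file, no disk IO on reads or writes.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO users (name) VALUES (?)",
                     [("alice",), ("bob",)])

    # Regular SQL works exactly as it would against a disk-backed database.
    for row in conn.execute("SELECT id, name FROM users"):
        print(row)

    # Close the connection (or kill the process) and the data is gone,
    # which is why real in-memory databases add replication for durability.
    conn.close()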
There are some commercial products in this space which you might want to play around with. A good starting point would be to go over the testing results brought out by this blog post.

Happy Programming!



Relational Databases - A Bane!

Picked up an interesting note that cursed RDBMS as early as October 15, 1991.

According to the author of the post, Henry G. Baker, Ph.D.:
  • relational databases set the commercial data processing industry back at least ten years and wasted many of the billions of dollars that were spent on data processing. Computing history will consider the past 20 years a kind of Dark Ages of commercial data processing.
  • relational databases performed a task that didn't need doing; e.g., these databases were orders of magnitude slower than the "flat files" they replaced, and they could not begin to handle the requirements of real-time transaction systems
  • relational databases made trivial problems obviously trivial, but did nothing to solve the really hard data processing problems
  • Database research has produced a number of good results, but the relational database is not one of them

I guess we on this blog have cursed the RDBMS enough here & here. But the real question is: why hasn't this stream of thought been picked up by mainstream developers since 1991?


More proof about why to work on Performance

O’Reilly's Alistair Croll says that improving the performance of websites not only increases pages per visit, time spent on the site, conversion rate & order value, but also reduces operating costs & outbound traffic!

In a detailed blog post he elucidates this in a visual & easy-to-understand way. This only adds to the data we already had validating the need to improve performance in the applications we create.



RAM is the new Disk, Disk is the new Tape

We all know the trend. The list of demands on web applications is already long and keeps growing with time. With the proliferation of "real-time" web applications and systems that require massive scalability, how can hardware and software keep up?

TechTalk will continue to answer that question in this blog post & the many posts to come.

Memory is several orders of magnitude faster than disk for random access to data. With the networks in data centers getting faster, it is not only cheaper to access memory than disk; it can even be cheaper to access another computer's memory over the network than to access your own disk.

With disk speeds growing very slowly and memory chip capacities growing exponentially, in-memory software architectures offer the prospect of orders-of-magnitude improvements in the performance of all kinds of data-intensive applications.

For random access, disks are irritatingly slow; but if you pretend that a disk is a tape drive, it can soak up sequential data at an astounding rate; it’s a natural for logging and journaling a primarily-in-RAM application.
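
To make the "disk is tape" idea concrete, here is a minimal sketch of a primarily-in-RAM store that journals every write sequentially to disk; the file name and record format are my own assumptions:

    import json
    import os

    LOG_FILE = "store.journal"  # hypothetical file name

    store = {}  # every read is served from RAM

    def put(key, value):
        # Append-only: the disk only ever sees sequential writes,
        # the one access pattern it is actually good at.
        with open(LOG_FILE, "a") as log:
            log.write(json.dumps({"k": key, "v": value}) + "\n")
            log.flush()
            os.fsync(log.fileno())  # make the write durable before acknowledging
        store[key] = value

    def get(key):
        return store.get(key)  # pure in-memory read, no disk IO

    def recover():
        # On restart, replay the journal sequentially to rebuild the RAM state.
        if os.path.exists(LOG_FILE):
            with open(LOG_FILE) as log:
                for line in log:
                    record = json.loads(line)
                    store[record["k"]] = record["v"]

    recover()
    put("user:1", {"name": "alice"})
    print(get("user:1"))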

If you are not trying memcached yet, I strongly believe you should give it a shot.
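
A minimal taste of it, assuming a memcached server running on 127.0.0.1:11211 and the python-memcached client; the load_user_from_database helper is a hypothetical stand-in for a slow RDBMS call:

    import memcache

    def load_user_from_database(user_id):
        # Hypothetical stand-in for a slow, disk-bound RDBMS query.
        return {"id": user_id, "name": "alice"}

    # Connect to a memcached instance assumed to be on the default local port.
    mc = memcache.Client(["127.0.0.1:11211"])

    # Cache-aside: check RAM first, fall back to the slow source on a miss.
    user = mc.get("user:42")
    if user is None:
        user = load_user_from_database(42)
        mc.set("user:42", user, time=300)  # keep it in memory for 5 minutes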

On a parallel note, Nati Shalom in a March 2008 post gives a detailed discussion of how this convergence of RAM & disk plays into the usage & deployment of MySQL. To summarize the problem in his own words: "The fundamental problems with both database replication and database partitioning are the reliance on the performance of the file system/disk and the complexity involved in setting up database clusters."

His solution was to go with an In-Memory Data Grid (IMDG), backed by technologies like the Hibernate 2nd level cache or GigaSpaces Spring DAO, to provide Persistence as a Service for your applications. Shalom explained IMDGs saying "they provide object-based database capabilities in memory, and support core database functionality, such as advanced indexing and querying, transactional semantics and locking. IMDGs also abstract data topology from application code. With this approach, the database is not completely eliminated, but put in the *right* place."

The primary benefits listed for an IMDG over direct RDBMS interaction were:
  • Relies on memory, which is significantly faster and more concurrent than file systems
  • Data can be accessed by reference
  • Data manipulation is performed directly on the in-memory objects
  • Reduced contention for data elements
  • Parallel aggregated queries
  • In-process local cache
  • Avoids Object-Relational Mapping (ORM)
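
A toy sketch of what "object-based database capabilities in memory" can look like - objects held and queried directly in the process, with an index that is just another dictionary; the Order class and helpers are my own illustrative inventions, not any vendor's API:

    from dataclasses import dataclass

    @dataclass
    class Order:
        id: int
        customer: str
        total: float

    # Objects live in process memory; an "index" is just another dictionary.
    orders_by_id = {}
    orders_by_customer = {}

    def insert(order):
        orders_by_id[order.id] = order
        orders_by_customer.setdefault(order.customer, []).append(order)

    insert(Order(1, "alice", 120.0))
    insert(Order(2, "bob", 80.0))
    insert(Order(3, "alice", 45.5))

    # "Querying" is direct object access: no SQL, no object-relational
    # mapping, and updates mutate the in-memory objects themselves.
    alice_total = sum(o.total for o in orders_by_customer["alice"])
    print(alice_total)  # 165.5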
But as argued here, whether or not you need to change the way you think about your applications and hardware ultimately depends on what you are trying to do with them. At the least, I believe the times are changing and there is a dire need to rethink the way we approach programming for Performance & Scalability.


Who would need Scalability?

Let's face it: scalability is sexy. But it is not for everyone; it is difficult and cannot be justified without a proper return on the investment.

Unless you know what you need to scale to, you can't even begin to talk about scalability. How many users do you want your system to handle? A thousand? Hundred thousand? Ten million? Here's a hint: the system you design to handle a quarter million users is going to be different from the system you design to handle ten million users.
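
A back-of-envelope sketch of what "knowing what you need to scale to" looks like in practice; every number below is a made-up assumption, so plug in your own:

    # Rough capacity planning: turn a user target into a peak request rate.
    daily_active_users = 250_000        # the target you are designing for
    requests_per_user_per_day = 50      # assumed usage pattern
    peak_to_average_ratio = 3           # traffic is bursty, not uniform

    seconds_per_day = 24 * 60 * 60
    average_rps = daily_active_users * requests_per_user_per_day / seconds_per_day
    peak_rps = average_rps * peak_to_average_ratio

    print(f"average: {average_rps:.0f} req/s, peak: {peak_rps:.0f} req/s")
    # average: 145 req/s, peak: 434 req/s. At ten million users the same
    # arithmetic gives a peak of roughly 17,000 req/s: a very different system.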

If You Haven't Discussed Capacity Planning, You Can't Discuss Scalability.

You don't need to worry about scalability on your application because nobody is going to use it. Really. Believe me. You're going to get, at most, 1,000 people on your app, and maybe 1% of them will be 7-day active. Scalability is not your problem, getting people to give a shit is.

Shut up about scalability if no one is using your app anyway!

Post Inspired by:
I'm Going To Scale My Foot Up Your Ass
