RAM is the new Disk, Disk is the new Tape

We all know the trend. The demands on web applications are already heavy and keep growing with time. With the proliferation of "real-time" web applications and systems that require massive scalability, how can hardware and software keep up?

TechTalk will continue to answer that question in this blog post & the many posts to come.

Memory is several orders of magnitude faster than disk for random access to data. With the networks in data centers getting faster, it’s not only cheaper to access memory than disk, it’s cheaper to access another computer’s memory through the network than to go to your own local disk.

With disk speeds growing very slowly and memory chip capacities growing exponentially, in-memory software architectures offer the prospect of orders-of-magnitude improvements in the performance of all kinds of data-intensive applications.

For random access, disks are irritatingly slow; but if you pretend that a disk is a tape drive, it can soak up sequential data at an astounding rate; it’s a natural for logging and journaling a primarily-in-RAM application.
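To make the "disk as tape" idea concrete, here is a minimal sketch in Python of a primarily-in-RAM store that journals every write to an append-only file. The class name JournaledStore, the JSON-lines record format, and the file name are my own assumptions for illustration; they do not come from any particular product. All reads are served from memory, and the disk only ever sees sequential appends plus one sequential replay at startup.

```python
import json
import os
import tempfile


class JournaledStore:
    """Hypothetical key-value store: data lives in a dict, writes go to a log first."""

    def __init__(self, path):
        self.path = path
        self.data = {}
        self._replay()
        # Append mode: the disk is only ever written sequentially.
        self.log = open(path, "a", encoding="utf-8")

    def _replay(self):
        """Rebuild the in-memory state by reading the journal front to back."""
        if not os.path.exists(self.path):
            return
        with open(self.path, encoding="utf-8") as f:
            for line in f:
                record = json.loads(line)
                self.data[record["k"]] = record["v"]

    def put(self, key, value):
        # Journal first, then update RAM, so an acknowledged write survives a crash.
        self.log.write(json.dumps({"k": key, "v": value}) + "\n")
        self.log.flush()
        os.fsync(self.log.fileno())
        self.data[key] = value

    def get(self, key, default=None):
        # Reads never touch the disk.
        return self.data.get(key, default)

    def close(self):
        self.log.close()


def demo():
    """Write two versions of a key, 'restart', and read the recovered value."""
    path = os.path.join(tempfile.mkdtemp(), "journal.log")
    store = JournaledStore(path)
    store.put("user:1", "alice")
    store.put("user:1", "bob")
    store.close()

    recovered = JournaledStore(path)  # simulates a restart
    value = recovered.get("user:1")
    recovered.close()
    return value
```

This sketch leaves out things a real system would need, such as log compaction, batching of fsync calls, and handling a half-written last line after a crash. The point is only the access pattern: random access stays in RAM, and the disk is treated like a tape.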

If you haven't tried memcached yet, I strongly believe you should give it a shot.
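For anyone who has not used a cache like memcached before, the usual pattern is "cache-aside": check the cache first, and only fall back to the slow store on a miss. Below is a rough sketch of that pattern in Python. TinyCache is a hypothetical in-process stand-in that I made up so the example runs on its own; it is not the API of any real memcached client, and slow_lookup simply pretends to be a database query.

```python
import time


class TinyCache:
    """Hypothetical in-process stand-in for a cache server (not a real client API)."""

    def __init__(self):
        self._items = {}

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and time.monotonic() >= expires_at:
            # Entry has outlived its time-to-live; treat it as a miss.
            del self._items[key]
            return None
        return value

    def set(self, key, value, ttl=None):
        expires_at = None if ttl is None else time.monotonic() + ttl
        self._items[key] = (value, expires_at)


def make_cached_lookup(cache, slow_lookup, ttl=60):
    """Wrap slow_lookup so repeated requests are answered from the cache."""

    def lookup(key):
        value = cache.get(key)
        if value is None:
            # Cache miss: go to the slow store once, then remember the answer.
            value = slow_lookup(key)
            cache.set(key, value, ttl)
        return value

    return lookup


calls = []  # records every trip to the "database"


def slow_lookup(key):
    """Pretend database query (an assumption for the example)."""
    calls.append(key)
    return key.upper()
```

With this in place, calling the wrapped lookup twice for the same key hits slow_lookup only once. Note that a value of None is never cached in this sketch, since None doubles as the "miss" signal.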

On a parallel note, Nati Shalom, in a March 2008 post, gives a detailed discussion of how this convergence of RAM & disk plays into the usage & deployment of MySQL. To summarize the problem in his own words: "The fundamental problems with both database replication and database partitioning are the reliance on the performance of the file system/disk and the complexity involved in setting up database clusters."

His solution was to go with an In-Memory Data Grid (IMDG), backed by technologies like Hibernate 2nd level cache or GigaSpaces Spring DAO, to provide Persistence as a Service for your applications. Shalom explained IMDGs, saying: "they provide object-based database capabilities in memory, and support core database functionality, such as advanced indexing and querying, transactional semantics and locking. IMDGs also abstract data topology from application code. With this approach, the database is not completely eliminated, but put it in the *right* place."

The primary benefits of an IMDG over direct RDBMS interaction listed were:
  • Relies on memory, which is significantly faster and more concurrent than file systems
  • Data can be accessed by reference
  • Data manipulation is performed directly on the in-memory objects
  • Reduced contention for data elements
  • Parallel aggregated queries
  • In-process local cache
  • Avoid Object-Relational Mapping (ORM)

But as argued here, whether or not you need to change the way you think about your applications and hardware ultimately depends on what you are trying to do with them. At the very least, I believe the times are changing and there is a dire need to rethink the way we approach programming for performance & scalability.
