Wednesday, July 4, 2012

In Memory Data Grid Technologies



What is an In Memory Data Grid?
It is not an in-memory relational database, a NOSQL database or a relational database

The data model is distributed across many servers in a single location or across multiple locations.  This distribution is known as a data fabric.  This distributed model is known as a ‘shared nothing’ architecture.
  • All servers can be active in each site.
  • All data is stored in the RAM of the servers.
  • Servers can be added or removed non-disruptively, to increase the amount of RAM available.
  • The data model is non-relational and is object-based. 
  • Distributed applications written on the Java application platforms are supported.
  • The data fabric is resilient, allowing non-disruptive automated detection and recovery of a single server or multiple servers.
There are also hardware appliances that exhibit all these characteristics.  I use the term in-memory data grid appliance to describe this group of products and these were excluded from my research.
There are six products in the market that I would consider for a proof of concept, or as a starting point for a product selection and evaluation:
  • VMware Gemfire                                                
  • Oracle Coherence                                            
  • Gigaspaces XAP Elastic Caching Edition           
  • Hazelcast                                                          
  • IBM eXtreme Scale
  • Terracotta Enterprise Suite
  • Jboss (Redhat) Infinispan
Why would I want an In Memory Data Grid?
Let’s compare this with our old friend the traditional relational database:
  • Performance – using RAM is faster than using disk.  No need to try and predict what data will be used next.  It’s already in memory to use.
  • Data Structure – using a key/value store allows greater flexibility for the application developer.  The data model and application code are inextricably linked.  More so than a relational structure.
  • Operations – Scalability and resiliency are easy to provide and maintain.  Software / hardware upgrades can be performed non-disruptively.
How does an In Memory Data Grid map to real business benefits?
  • Competitive Advantage – businesses will make better decisions faster.
  • Safety – businesses can improve the quality of their decision-making.
  • Productivity – improved business process efficiency reduces waster and likely to improve profitability.
  • Improved Customer Experience – provides the basis for a faster, reliable web service which is a strong differentiator in the online business sector.
How do use an In Memory Data Grid?
  1. Simply install your servers in a single site or across multiple sites.  Each group of servers within a site is referred to as a cluster.
  2. Install the IMDG software on all the servers and choose the appropriate topology for the product.  For multi-site operations I always recommend a partitioned and replicated cache.
  3. Setup your APIs, or GUI interfaces to allow replicated between the various servers.
  4. Develop your data model and the business logic around the model.
With a partitioned and replicated cache, you simply partition the cache on the servers that best suits the business needs to trying to fulfil, and the replicated part ensures there are sufficient copies across all the servers.  This means that if a server dies, there is no effect on the business service.  Providing you have provisioned enough capacity of course.

The key here is to design a topology that mitigates all business risk, so that if a server or a site is inoperable, the service keeps running seamlessly in the background.

There are also some tough decisions you may need to make regarding data consistency vs performance.  You can trade the performance to improve data consistency and vice versa.

2 comments: