What is an In-Memory Data Grid?
It is not an in-memory relational database, a NoSQL database or a relational database.
The data model is distributed across many servers in a single location or across multiple locations. This distributed data is known as a data fabric, and the architecture is known as ‘shared nothing’: each server holds its own portion of the data in its own memory.
- All servers can be active in each site.
- All data is stored in the RAM of the servers.
- Servers can be added or removed non-disruptively, to increase the amount of RAM available.
- The data model is non-relational and is object-based.
- Distributed applications written for the Java platform are supported.
- The data fabric is resilient, allowing automatic, non-disruptive detection of and recovery from the failure of one or more servers.
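To make the partitioned, shared-nothing model a little more concrete, here is a toy single-process sketch in Java. Everything in it is an assumption for illustration: the class names, the fixed count of 16 partitions and the round-robin assignment of partitions to servers are hypothetical and do not reflect any listed product's API or partitioning scheme. It shows keys always hashing to the same partition, each server holding only its own partitions in memory, and a server being added while existing entries migrate with their partitions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical toy data grid: not the API of any product named in this article. */
public class ToyDataGrid {
    // Fixed number of partitions (assumed); a key always hashes to the same partition.
    static final int PARTITIONS = 16;

    // Each "server" holds only its own partitions' entries in its own memory (shared nothing).
    static class Server {
        final String name;
        final Map<Integer, Map<String, Object>> partitions = new HashMap<>();
        Server(String name) { this.name = name; }
    }

    final List<Server> servers = new ArrayList<>();
    final Server[] owner = new Server[PARTITIONS];

    static int partitionOf(String key) {
        return Math.floorMod(key.hashCode(), PARTITIONS);
    }

    void addServer(String name) {
        servers.add(new Server(name));
        rebalance();
    }

    // Reassign partitions round-robin and move each partition's data to its new owner.
    void rebalance() {
        for (int p = 0; p < PARTITIONS; p++) {
            Server target = servers.get(p % servers.size());
            Server current = owner[p];
            if (current != target) {
                Map<String, Object> data =
                        current == null ? new HashMap<>() : current.partitions.remove(p);
                target.partitions.put(p, data);
                owner[p] = target;
            }
        }
    }

    void put(String key, Object value) {
        int p = partitionOf(key);
        owner[p].partitions.get(p).put(key, value);
    }

    Object get(String key) {
        int p = partitionOf(key);
        return owner[p].partitions.get(p).get(key);
    }

    public static void main(String[] args) {
        ToyDataGrid grid = new ToyDataGrid();
        grid.addServer("server-1");
        grid.addServer("server-2");
        grid.put("customer:42", "Alice");
        grid.addServer("server-3"); // grow the grid; entries move with their partitions
        System.out.println(grid.get("customer:42")); // Alice
        for (Server s : grid.servers) {
            System.out.println(s.name + " owns " + s.partitions.size() + " partitions");
        }
    }
}
```

With three servers, the 16 partitions end up split 6/5/5, and the entry written before `server-3` joined is still readable afterwards.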
There are seven products on the market that I would consider for a proof of concept, or as a starting point for product selection and evaluation:
- VMware GemFire
- Oracle Coherence
- GigaSpaces XAP Elastic Caching Edition
- Hazelcast
- IBM eXtreme Scale
- Terracotta Enterprise Suite
- JBoss (Red Hat) Infinispan
Let’s compare this with our old friend, the traditional relational database:
- Performance – using RAM is faster than using disk. There is no need to predict which data will be needed next, because it is already in memory.
- Data Structure – using a key/value store allows greater flexibility for the application developer. The data model and application code are inextricably linked, more so than with a relational structure.
- Operations – Scalability and resiliency are easy to provide and maintain. Software / hardware upgrades can be performed non-disruptively.
- Competitive Advantage – businesses will make better decisions faster.
- Safety – businesses can improve the quality of their decision-making.
- Productivity – improved business process efficiency reduces waste and is likely to improve profitability.
- Improved Customer Experience – provides the basis for a faster, more reliable web service, which is a strong differentiator in the online business sector.
Getting started with an IMDG is straightforward:
- Install your servers in a single site or across multiple sites. Each group of servers within a site is referred to as a cluster.
- Install the IMDG software on all the servers and choose the appropriate topology for the product. For multi-site operations I always recommend a partitioned and replicated cache.
- Set up your APIs or GUI interfaces to allow replication between the various servers.
- Develop your data model and the business logic around the model.
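As a rough illustration of developing an object-based data model with the business logic built around it, the Java sketch below stores whole domain objects by key and writes the logic directly against them. A local `ConcurrentHashMap` stands in for a distributed cache, and the `Order` record, its fields and the `ship` rule are hypothetical. Records require Java 16 or later.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical object-based data model; a local map stands in for a distributed cache. */
public class OrderModel {
    // The value is a whole domain object rather than rows spread across normalised tables.
    record Order(String id, String customerId, double amount, String status) {
        Order withStatus(String newStatus) {
            return new Order(id, customerId, amount, newStatus);
        }
    }

    // Key/value store: order ID -> Order object.
    final Map<String, Order> orders = new ConcurrentHashMap<>();

    void place(Order order) {
        orders.put(order.id(), order);
    }

    // Business logic written directly against the object model:
    // a PLACED order becomes SHIPPED; returns true if the order is now SHIPPED.
    boolean ship(String orderId) {
        Order updated = orders.computeIfPresent(orderId,
                (id, o) -> o.status().equals("PLACED") ? o.withStatus("SHIPPED") : o);
        return updated != null && updated.status().equals("SHIPPED");
    }

    public static void main(String[] args) {
        OrderModel model = new OrderModel();
        model.place(new Order("order:1", "customer:42", 99.50, "PLACED"));
        System.out.println(model.ship("order:1"));                // true
        System.out.println(model.orders.get("order:1").status()); // SHIPPED
        System.out.println(model.ship("order:2"));                // false (no such order)
    }
}
```

Changing the shape of `Order` means changing the code that uses it, which is the tight coupling between data model and application code described above.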
The key here is to design a topology that mitigates business risk, so that if a server or even a whole site becomes inoperable, the service keeps running seamlessly in the background.
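A partitioned and replicated cache is what lets the service survive the loss of a server. The Java sketch below is a simplified, hypothetical model of that idea, not any product's implementation: each key has a primary copy on one server and a backup copy on the next server, writes go to both, and reads fall back to the backup when the primary is down. The class, the "next server" backup placement and the `fail` method are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical partitioned-and-replicated cache: every key has a primary and one backup copy. */
public class ReplicatedPartitionedCache {
    // Each server's RAM is modelled as its own map (shared nothing).
    final List<Map<String, String>> servers = new ArrayList<>();
    final boolean[] alive;

    ReplicatedPartitionedCache(int serverCount) {
        if (serverCount < 2) throw new IllegalArgumentException("need at least two servers for a backup");
        for (int i = 0; i < serverCount; i++) servers.add(new HashMap<>());
        alive = new boolean[serverCount];
        Arrays.fill(alive, true);
    }

    int primaryOf(String key) {
        return Math.floorMod(key.hashCode(), servers.size());
    }

    // The backup lives on the next server, so losing any single server never loses both copies.
    int backupOf(String key) {
        return (primaryOf(key) + 1) % servers.size();
    }

    // Writes go to both the primary and the backup (skipping servers that are down).
    void put(String key, String value) {
        for (int s : new int[] {primaryOf(key), backupOf(key)}) {
            if (alive[s]) servers.get(s).put(key, value);
        }
    }

    // Reads use the primary, falling back to the backup if the primary has failed.
    String get(String key) {
        int p = primaryOf(key);
        if (alive[p]) return servers.get(p).get(key);
        int b = backupOf(key);
        return alive[b] ? servers.get(b).get(key) : null; // both copies lost
    }

    void fail(int server) {
        alive[server] = false;
    }

    public static void main(String[] args) {
        ReplicatedPartitionedCache cache = new ReplicatedPartitionedCache(3);
        cache.put("customer:42", "Alice");
        cache.fail(cache.primaryOf("customer:42")); // simulate losing the primary server
        System.out.println(cache.get("customer:42")); // Alice, served from the backup
    }
}
```

A real grid would go further and create a fresh backup elsewhere after the failover, so that a second failure is also survivable; this sketch stops at the failover read.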
There are also some tough decisions you may need to make regarding data consistency versus performance: you can trade performance for stronger data consistency, and vice versa.
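One common form of this trade-off is whether a write waits for its backup copy. The Java sketch below contrasts two hypothetical modes: in `SYNC` mode the caller waits until the backup is updated, so both copies always agree; in `ASYNC` mode the write returns at once and the backup catches up later. The class, the `Mode` enum and the explicit `drainReplication()` step (which stands in for background network replication) are assumptions for illustration, not any product's API.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

/** Hypothetical illustration of the consistency versus performance trade-off. */
public class ReplicationModes {
    enum Mode { SYNC, ASYNC }

    final Map<String, String> primary = new HashMap<>();
    final Map<String, String> backup = new HashMap<>();
    // Pending backup updates in ASYNC mode; in a real grid these would travel over the network.
    final Queue<Runnable> replicationQueue = new ArrayDeque<>();
    final Mode mode;

    ReplicationModes(Mode mode) { this.mode = mode; }

    void put(String key, String value) {
        primary.put(key, value);
        if (mode == Mode.SYNC) {
            // Caller waits for the backup: slower writes, but both copies always agree.
            backup.put(key, value);
        } else {
            // Caller returns immediately: faster writes, but the backup lags behind.
            replicationQueue.add(() -> backup.put(key, value));
        }
    }

    // Simulates background replication catching up.
    void drainReplication() {
        Runnable task;
        while ((task = replicationQueue.poll()) != null) task.run();
    }

    public static void main(String[] args) {
        ReplicationModes sync = new ReplicationModes(Mode.SYNC);
        sync.put("balance:42", "100");
        System.out.println("sync backup:  " + sync.backup.get("balance:42"));  // 100

        ReplicationModes async = new ReplicationModes(Mode.ASYNC);
        async.put("balance:42", "100");
        System.out.println("async backup: " + async.backup.get("balance:42")); // null (not yet replicated)
        async.drainReplication();
        System.out.println("async backup: " + async.backup.get("balance:42")); // 100
    }
}
```

If the primary failed before replication caught up, a reader served from the backup would see stale or missing data; that window is the consistency given up in exchange for faster writes.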