Tuesday, August 14, 2012

Ways to perform your IT production support work better

 This post shares a few tips and work principles I have applied across multiple onshore and offshore production support teams.

 It offers ways to improve your production support skills, which may help you enjoy your IT support job more and ultimately become a production support guru.

Everybody involved in production support knows this job can be difficult: 24/7 pager support, multiple incidents and bug fixes to deal with on a regular basis, and pressure from the client and the management team to resolve production problems as fast as possible and prevent recurrences. On top of your day-to-day work, you also have to take care of multiple application deployments and install support driven by multiple IT delivery teams.

Production support: Main actors

Actor 1: IT Client

- Primary focus on customers and business growth
- Owner of the IT platform
- Maintains a workforce of business, delivery, QA and support teams
  
Actor 2: IT Support team


- Responsible for the IT client PROD environment.
- IT delivery support.
- Responsible for production support issues and SLA.

Actor 3: IT Delivery team

- Responsible for project delivery and implementation as per the requirements.
- Post implementation support.


1#  Proper balance and professionalism between these 3 actors is key for any successful IT production environment.


My first recommendation is that, regardless of how good you are from a technical perspective, you will not succeed as a great production support leader if you fail to maintain proper balance and professionalism with the other two actors, the IT client and the IT delivery team.

You have to realize that you are providing a service to your client, who is the owner and master of the IT production environment. You are expected to ensure the availability of the critical production systems and to address both known and future problems. Stay away from damaging attitudes, such as a false impression that you are the actual owner, or frustration with your client over a lack of understanding of a problem. Your job is to get all the facts right and provide good recommendations to your client so they can make the right decisions. Over time, solid trust will be established between you and the other two actors.

Building a strong relationship with the IT delivery team is also very important. The delivery team, which includes IT architects, project managers and technical resources, is seen as the team of experts responsible for building and enhancing the production environments via their established project delivery model.

As a production support individual, you have to build your credibility and stay away from negative and unprofessional attitudes. Building credibility means hard work, proper gathering of facts, technical and root cause analysis, and showing interest in learning a new solution. Ultimately, you will be able to work with and provide consultation to both teams.

2#  Each production support issue is a learning opportunity

One of the great things about production support is the multiple learning opportunities you are exposed to. You may have realized that after each production outage you achieved at least one of the following goals:

  • You gained new technical knowledge from a new problem type
  • You increased your knowledge and experience on a known situation
  • You increased your visibility and trust with your operation client
  • You were able to share your existing knowledge with other team members allowing them to succeed and resolve the problem
Recurring problems, incidents or preventive work still offer you opportunities to gather more technical facts, pinpoint the root cause or come up with recommendations to develop a permanent resolution.

The bottom line is that the more incidents you are involved with, the better. It is OK if you are not yet comfortable taking an active role in the incident recovery, but please ensure that you are present so you can at least gain experience and knowledge from your more experienced team members.

3#  Fear factor around production platform changes such as project deployment, infrastructure or network level changes 

 One common problem I have noticed across support teams is a fear factor around production platform changes such as project deployments and infrastructure or network level changes. Below are a few reasons for this common fear:

  • For many support team members, application “change” is synonymous with production “instability”
  • Lack of understanding of the project itself or of the scope of changes automatically translates into fear
  • Low comfort level with executing the requested application or middleware changes
This fear factor is often a symptom of gaps in the current release management process between the 3 main actors, or of production platform problems, such as:
  • Lack of proper knowledge transfer between the IT delivery and support teams
  • Already unstable production environment prior to new project deployment
  • Lack of deep technical knowledge of Java or middleware

4#  Improve your coding skills

My next recommendation is to improve your coding skills. One of the most important responsibilities of a production support team, on top of regular bug fixes, is to act as a “gatekeeper”, i.e. the last line of defense before the implementation of a project. This risk assessment exercise involves not only a review of the project, test results and performance test reports, but also code walkthroughs. Unfortunately, this review is often not performed properly, if done at all. The goal of the exercise is to identify areas for improvement and to detect code defects that could harm the production environment, such as thread safety problems or missing IO/socket timeouts. Your ability to perform such a code assessment depends on your coding skills and your overall knowledge of patterns & anti-patterns.
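
As a rough illustration of the two defect types named above, here is a minimal sketch of what the safe version of each might look like. It is written in Python for brevity; the same ideas carry over to Java. The names `SafeCounter` and `recv_or_none` are hypothetical and chosen only for this example.

```python
import socket
import threading


class SafeCounter:
    """Counter whose read-modify-write is protected by a lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._value = 0

    def increment(self):
        # "+= 1" is a read, an add and a write; without the lock,
        # concurrent callers could overwrite each other's updates.
        with self._lock:
            self._value += 1

    @property
    def value(self):
        with self._lock:
            return self._value


def recv_or_none(sock, timeout_seconds):
    """Read up to 1024 bytes, or return None if the peer stays silent."""
    # Without an explicit timeout, recv() can block the calling thread
    # for as long as the remote system hangs.
    sock.settimeout(timeout_seconds)
    try:
        return sock.recv(1024)
    except socket.timeout:
        return None
```

During a code walkthrough, the things to look for are the opposite of this sketch: shared state modified without any synchronization, and blocking network calls with no timeout set.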

5#  Don’t pretend that you know everything

Another common problem I have noticed among many production support individuals is a skill 'plateau'. This is especially problematic when working on static IT production environments with few changes and hardening improvements. In this context, you quickly get used to your day-to-day work, the technology used and the known problems. You then become very comfortable with your tasks, with a false impression of seniority. Then one day, your IT organization is faced with a re-org or you have to work for a new client. At this point you are shocked and struggle to overcome the new challenges, typically because:
  • You failed to invest time into yourself and outside your work IT bubble
  • You failed to acknowledge your lack of deeper technology knowledge, i.e. a false impression of knowing everything
  • You failed to keep your eyes open and explore the rest of the IT world and the Java community

6#  Work towards production support SLA

Another common problem I have noticed is that many production support individuals do not work towards the production support SLA. Meeting the SLA is very important in production support: it improves the team's metrics reports and the client's confidence.
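
To make the SLA metric concrete, here is a minimal sketch of how a compliance percentage could be computed from resolution times. It is written in Python for brevity. The priorities, targets and sample incidents are hypothetical and purely illustrative; real targets come from the contract with the client.

```python
from datetime import timedelta

# Hypothetical SLA targets per priority (assumption, not from any real contract).
SLA_TARGETS = {
    "P1": timedelta(hours=4),
    "P2": timedelta(hours=24),
}


def sla_compliance(incidents):
    """Return the percentage of incidents resolved within their SLA target.

    `incidents` is a list of (priority, time_to_resolve) pairs.
    """
    if not incidents:
        return 100.0
    met = sum(1 for priority, took in incidents if took <= SLA_TARGETS[priority])
    return 100.0 * met / len(incidents)


# Illustrative sample only: one of the four incidents misses its target.
sample = [
    ("P1", timedelta(hours=3)),
    ("P1", timedelta(hours=6)),   # misses the 4-hour P1 target
    ("P2", timedelta(hours=20)),
    ("P2", timedelta(hours=10)),
]
print(sla_compliance(sample))  # prints 75.0
```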

7#  Provide workaround and update to client as soon as possible

Acknowledge the issue and, if possible, provide a workaround to the application user as soon as possible. This builds confidence in you. Once you have completed your analysis, communicate the result to the user. When replying to a production issue, try to write the email in layman's language.






Wednesday, July 4, 2012

In Memory Data Grid Technologies



What is an In Memory Data Grid?
It is not an in-memory relational database, a NoSQL database or a relational database.

The data model is distributed across many servers in a single location or across multiple locations.  This distribution is known as a data fabric.  This distributed model is known as a ‘shared nothing’ architecture.
  • All servers can be active in each site.
  • All data is stored in the RAM of the servers.
  • Servers can be added or removed non-disruptively, to increase the amount of RAM available.
  • The data model is non-relational and is object-based. 
  • Distributed applications written on Java application platforms are supported.
  • The data fabric is resilient, allowing non-disruptive automated detection and recovery of a single server or multiple servers.
There are also hardware appliances that exhibit all these characteristics.  I use the term in-memory data grid appliance to describe this group of products and these were excluded from my research.
There are seven products in the market that I would consider for a proof of concept, or as a starting point for a product selection and evaluation:
  • VMware GemFire
  • Oracle Coherence
  • GigaSpaces XAP Elastic Caching Edition
  • Hazelcast
  • IBM eXtreme Scale
  • Terracotta Enterprise Suite
  • JBoss (Red Hat) Infinispan
Why would I want an In Memory Data Grid?
Let’s compare this with our old friend the traditional relational database:
  • Performance – using RAM is faster than using disk.  No need to try and predict what data will be used next.  It’s already in memory to use.
  • Data Structure – using a key/value store allows greater flexibility for the application developer.  The data model and application code are inextricably linked, more so than with a relational structure.
  • Operations – Scalability and resiliency are easy to provide and maintain.  Software / hardware upgrades can be performed non-disruptively.
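
To illustrate the key/value point above, here is a minimal sketch in Python in which a plain dictionary stands in for a grid's map-like cache. The `Order` class, the key format and the helper names are hypothetical; real products expose their own APIs.

```python
from dataclasses import dataclass, field


@dataclass
class Order:
    """Application object stored whole, with no table mapping."""
    order_id: str
    customer: str
    items: list = field(default_factory=list)


# Stand-in for the grid: a real IMDG would spread these entries across servers.
grid = {}


def put_order(order):
    # The application decides the key; the value is the object itself.
    grid[f"order:{order.order_id}"] = order


def get_order(order_id):
    # A single lookup by key returns the complete object, with no joins.
    return grid.get(f"order:{order_id}")
```

The data model here is simply the application's own class, which is what makes the model and the code so tightly linked.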
How does an In Memory Data Grid map to real business benefits?
  • Competitive Advantage – businesses will make better decisions faster.
  • Safety – businesses can improve the quality of their decision-making.
  • Productivity – improved business process efficiency reduces waste and is likely to improve profitability.
  • Improved Customer Experience – provides the basis for a faster, more reliable web service, which is a strong differentiator in the online business sector.
How do I use an In Memory Data Grid?
  1. Simply install your servers in a single site or across multiple sites.  Each group of servers within a site is referred to as a cluster.
  2. Install the IMDG software on all the servers and choose the appropriate topology for the product.  For multi-site operations I always recommend a partitioned and replicated cache.
  3. Set up your APIs or GUI interfaces to allow replication between the various servers.
  4. Develop your data model and the business logic around the model.
With a partitioned and replicated cache, you simply partition the cache across the servers in the way that best suits the business needs you are trying to fulfil, and the replicated part ensures there are sufficient copies across the servers.  This means that if a server dies, there is no effect on the business service, provided you have provisioned enough capacity of course.
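
The idea of a partitioned and replicated cache can be sketched as follows. This is a toy, single-process model in Python, not how any particular product works: the class name, the hash-based placement and the `fail_server` helper are all assumptions made for illustration, and a real grid would also re-create lost copies after a failure.

```python
import zlib


class PartitionedReplicatedCache:
    """Toy model: each key lives on one primary server plus N backups."""

    def __init__(self, servers, backups=1):
        if backups >= len(servers):
            raise ValueError("need more servers than backup copies")
        self.backups = backups
        # One dictionary per server, standing in for that server's RAM.
        self.stores = {name: {} for name in servers}

    def _owners(self, key):
        # Partitioning: a stable hash picks the primary; the next servers
        # in sorted order hold the backup copies.
        names = sorted(self.stores)
        start = zlib.crc32(key.encode("utf-8")) % len(names)
        count = min(self.backups + 1, len(names))
        return [names[(start + i) % len(names)] for i in range(count)]

    def put(self, key, value):
        # Replication: write the entry to the primary and every backup.
        for name in self._owners(key):
            self.stores[name][key] = value

    def get(self, key):
        # Any surviving copy can serve the read.
        for store in self.stores.values():
            if key in store:
                return store[key]
        return None

    def fail_server(self, name):
        # Simulate a server dying and losing everything in its RAM.
        del self.stores[name]
```

With three servers and one backup per key, removing any single server with `fail_server` still leaves one copy of every entry, so `get` keeps answering.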

The key here is to design a topology that mitigates all business risk, so that if a server or a site is inoperable, the service keeps running seamlessly in the background.

There are also some tough decisions you may need to make regarding data consistency vs performance.  You can trade performance for improved data consistency and vice versa.
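
One common form of this trade-off is synchronous versus asynchronous replication to a backup copy. The sketch below, in Python, is a deliberately simplified model under that assumption; the `ReplicatedMap` class and its methods are hypothetical and do not correspond to any product's API.

```python
class ReplicatedMap:
    """Toy model with one primary copy and one backup copy."""

    def __init__(self, synchronous):
        self.synchronous = synchronous
        self.primary = {}
        self.backup = {}
        self.pending = []  # writes not yet applied to the backup

    def put(self, key, value):
        self.primary[key] = value
        if self.synchronous:
            # Consistent but slower: the caller waits for the backup write.
            self.backup[key] = value
        else:
            # Faster but briefly inconsistent: the backup is updated later.
            self.pending.append((key, value))

    def flush(self):
        # Apply the queued writes, as a background replicator would.
        for key, value in self.pending:
            self.backup[key] = value
        self.pending.clear()

    def read_from_backup(self, key):
        return self.backup.get(key)
```

In asynchronous mode a read from the backup can return stale data until `flush` runs, and anything still queued is lost if the primary fails first. That risk is the price paid for the faster write.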