Inability to measure SLAs around application performance
The 2008 Aberdeen Report on Application Performance Management listed the top challenges in that field. The inability to measure SLAs around application performance is among the top five.
Traditional SLA enforcement
You will find different solutions on the market that offer monitoring of your service infrastructure. The following is a list of sample measures that are captured on the individual components:
- Application Servers
- CPU, Memory, I/O, Network
- Disk space
- Application specific counters
- Web Servers
- Response Times
- Active Sessions
- Transaction volume and execution times
- Table spaces
- Network Components
- Bytes/packets sent/received
In addition to monitoring these performance and system indicators, certain solutions also offer to actively monitor the system by simulating end-user transactions from different POPs (Points of Presence) to measure end-user response times.
All these measures can be used to enforce certain types of SLAs, such as response times or general system availability and responsiveness.
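As a rough illustration of how such measures feed an SLA check, here is a minimal Python sketch. The `Sample` shape, the choice of the 95th percentile, and the threshold values are assumptions made for this example; they do not come from any particular monitoring product.

```python
from dataclasses import dataclass
import math


@dataclass
class Sample:
    """One observed request (hypothetical shape for this sketch)."""
    response_ms: float  # measured response time in milliseconds
    ok: bool            # True if the request succeeded


def percentile(values, pct):
    """Nearest-rank percentile; pct is in the range (0, 100]."""
    ordered = sorted(values)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[rank - 1]


def check_sla(samples, max_p95_ms, min_availability):
    """Return the list of violated SLA terms (empty means compliant)."""
    violations = []
    availability = sum(s.ok for s in samples) / len(samples)
    if availability < min_availability:
        violations.append("availability")
    successful = [s.response_ms for s in samples if s.ok]
    if successful and percentile(successful, 95) > max_p95_ms:
        violations.append("response_time")
    return violations


samples = [Sample(120, True), Sample(180, True), Sample(950, True), Sample(0, False)]
print(check_sla(samples, max_p95_ms=500, min_availability=0.99))
```

With these four samples both terms are reported as violated: only three of four requests succeeded, and the slowest successful request exceeds the assumed 500 ms limit.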
This not only lets us get notified when our system is not behaving according to our contracts. Having all this additional information on hand also gives us the chance to do some high-level problem analysis. Slow response times might be caused by maxed-out CPUs on the application servers, deadlocks on the database, or simply a problem with a network component.
Today's business requires a new type of SLA enforcement
Defining SLAs on the response times of certain web pages or on general availability may work for traditional applications. With the rise of SaaS (Software as a Service) and applications running in virtual environments, the traditional approach is no longer applicable.
SaaS companies need to enforce different SLAs for different tenants of their software, depending on the contract and the service level that provider and tenant agreed on. From a traditional monitoring perspective, a response time measured on the web server cannot easily be attributed to the end user who made the request. Current monitoring solutions may be able to break the web request URL into individual parts and pick out the tenant or user ID. This would, at least theoretically, allow a response time to be assigned to a customer-specific SLA.
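To make the URL-based approach concrete, the following Python sketch picks a tenant ID out of a request URL. The two URL conventions it understands (a `/t/<tenant>/` path prefix and a `tenant` query parameter) and the host name are purely hypothetical; a real application may encode the tenant differently or not at all.

```python
from urllib.parse import urlparse, parse_qs


def tenant_from_url(url):
    """Return the tenant ID encoded in the URL, or None if there is none.

    Assumed (hypothetical) URL conventions:
      - path prefix:     /t/<tenant>/...
      - query parameter: ?tenant=<tenant>
    """
    parsed = urlparse(url)
    segments = [s for s in parsed.path.split("/") if s]
    if len(segments) >= 2 and segments[0] == "t":
        return segments[1]
    values = parse_qs(parsed.query).get("tenant")
    return values[0] if values else None


print(tenant_from_url("https://app.example.com/t/acme/orders"))
print(tenant_from_url("https://app.example.com/orders?tenant=globex&page=2"))
print(tenant_from_url("https://app.example.com/orders"))
```

The last call returns `None`, which is exactly the case where URL parsing alone cannot attribute a request to a tenant.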
In modern applications, however, the tenant or end-user context is typically not passed as part of the URL or HTTP body. Once the user has signed on, the user context is kept transparently in the session context. In order to assign each individual transaction to an individual end user, and therefore to a tenant, it is necessary to capture additional context information. This additional context information is only accessible within the application and therefore cannot be gathered with the traditional approach of monitoring performance counters.
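The idea of resolving the tenant from session context rather than from the URL can be sketched as follows. The session store, the session IDs, and the per-tenant limits are invented for this example; in a real application the session framework would hold this information.

```python
from collections import defaultdict

# Hypothetical server-side session store: session ID -> user context,
# filled in at sign-on.
SESSIONS = {
    "s-1": {"user": "alice", "tenant": "acme"},
    "s-2": {"user": "bob", "tenant": "globex"},
}

# Hypothetical per-tenant response-time limits in milliseconds.
SLA_MS = {"acme": 200, "globex": 1000}


def tag_transactions(transactions):
    """Group (session_id, response_ms) pairs by the tenant found in the session."""
    by_tenant = defaultdict(list)
    for session_id, response_ms in transactions:
        context = SESSIONS.get(session_id)
        tenant = context["tenant"] if context else "unknown"
        by_tenant[tenant].append(response_ms)
    return dict(by_tenant)


def violations(by_tenant):
    """Count, per tenant, the transactions slower than that tenant's limit."""
    return {
        tenant: sum(1 for ms in times if ms > SLA_MS[tenant])
        for tenant, times in by_tenant.items()
        if tenant in SLA_MS
    }


observed = [("s-1", 150), ("s-2", 450), ("s-1", 450), ("s-9", 90)]
grouped = tag_transactions(observed)
print(violations(grouped))
```

Note that the same 450 ms response time counts as a violation for one tenant but not for the other, which is the distinction a server-wide response-time counter cannot make.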
The question therefore is: How can we capture additional context information that allows us to do 21st century SLA enforcement?
21st century SLAs: How to make them work for today's applications?
In my previous blog entry, "User based service level enforcement for Web Applications", I addressed this question by showing how dynaTrace can analyze individual web requests (or any other type of request or transaction) by taking a closer look at the context information that is gathered along the PurePath (execution path). This provides the ability to enforce SLAs on a much more granular level than traditional monitoring solutions.
As an alternative approach, it would be possible to enrich the services or web applications with application- and user-specific measures. Exposing custom JMX MBeans or performance counters via code is a doable task, even though it requires manual code changes. Having performance values available on a granular level, as explained in my previous blog entry, allows you to enforce SLAs better than you can now.
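The manual instrumentation described here might look roughly like the sketch below. It is written in Python and uses a plain in-process dictionary as a stand-in for JMX or performance counters, so it only illustrates the shape of the code change, not an actual JMX integration. The names `current_tenant`, `measures`, `measured`, and `place_order` are hypothetical.

```python
import time
from collections import defaultdict
from contextvars import ContextVar
from functools import wraps

# Tenant of the request currently being processed; assumed to be set by
# the sign-on or request-handling layer.
current_tenant = ContextVar("current_tenant", default="unknown")

# tenant -> operation name -> list of durations in seconds
measures = defaultdict(lambda: defaultdict(list))


def measured(operation):
    """Decorator that records each call's duration under the current tenant."""
    def decorate(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                elapsed = time.perf_counter() - start
                measures[current_tenant.get()][operation].append(elapsed)
        return wrapper
    return decorate


@measured("place_order")
def place_order(item):
    return f"ordered {item}"


current_tenant.set("acme")
place_order("widget")
current_tenant.set("globex")
place_order("gadget")
place_order("gizmo")

for tenant, operations in measures.items():
    print(tenant, {name: len(times) for name, times in operations.items()})
```

Every business method that matters for an SLA needs such a decorator, which is the manual code change the approach requires.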
In a world that is moving toward Service Oriented Architecture (SOA) and Software as a Service (SaaS), it becomes more important to measure service performance for individual service consumers. Not meeting SLAs, or not reacting soon enough to performance degradation, can mean penalty payments, dissatisfied users, and a damaged reputation.