About the Author

Andreas Grabner has been helping companies improve their application performance for more than 15 years. He is a regular contributor to the Web Performance and DevOps communities and a prolific speaker at user groups and conferences around the world. Reach him at @grabnerandi.

The Hidden Class Loading Performance Impact of the Spring Framework

The Spring Framework is great because it removes a lot of the legwork developers would otherwise need to do to get a new application up and running. Instead of spending time re-inventing the wheel, it is generally easy and convenient to use frameworks for common tasks such as caching, database access or data binding with UI elements. Blindly “trusting” a framework without looking “under the hood” is, however, not a good idea. It’s like blindly trusting a used car dealer without checking the engine, brakes or tires before taking the car on a long road trip.

This story focuses on a large bank that built its core e-banking application on WebSphere and the popular open source Spring Framework. At the beginning of every month the bank faces an unusually high number of online customers checking whether their paycheck has already arrived. For ten consecutive months the bank saw most logins between 10 AM and noon fail due to timeouts. That’s frustrating for the banking customers and bad for the bank’s image. The bank decided to use an APM (Application Performance Management) solution to understand the user experience and why customers were frustrated with their logins.

In this blog we look at the symptoms, the analysis process, and how the team fixed the problem in their deployed Spring Framework. Since the fix they have had no more outages and have in fact increased the number of online customers by 79% thanks to improved end-user experience.

Observation #1: Running out of Worker Threads

To monitor the health of its JVMs the bank uses several dashboards that track memory, network, CPU, I/O and thread metrics. The following dashboard shows the operations team which JVMs are currently running low on available worker threads. The maximum number of threads is 250. These “traffic lights” turn red when a JVM uses more than 150 – an early warning signal for the team, especially in a situation where most of the JVMs show up in RED:

Seeing so many JVMs with a very high usage of worker threads is an early warning signal for application owners
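
As a rough illustration of the kind of check behind such a traffic light, here is a minimal sketch using only the standard ThreadMXBean. The bank’s dashboards read WebSphere’s worker-pool metrics instead; only the 150/250 thresholds come from the setup described above, everything else is an assumption for illustration:

    import java.lang.management.ManagementFactory;
    import java.lang.management.ThreadMXBean;

    // Minimal sketch of a worker-thread "traffic light". ThreadMXBean counts
    // all live JVM threads, which serves here as a stand-in for the
    // container-specific worker-pool metric the real dashboards use.
    public class WorkerThreadTrafficLight {
        private static final int WARN_THRESHOLD = 150; // early-warning ("red") level
        private static final int MAX_THREADS = 250;    // worker-pool maximum

        public static void main(String[] args) {
            ThreadMXBean threads = ManagementFactory.getThreadMXBean();
            int live = threads.getThreadCount();
            String light = live > WARN_THRESHOLD ? "RED" : "GREEN";
            System.out.printf("%d of %d threads in use -> %s%n", live, MAX_THREADS, light);
        }
    }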

The following chart shows that thread activity picks up around 10 AM and reaches a critical stage at about 10:10 AM, when most of the JVMs are already maxing out their available worker threads:

Almost all JVMs start showing the same high active worker thread count shortly after 10 AM

Observation #2: Threads Waiting on Class Loader

What are these threads doing? Looking at these threads and analyzing their state quickly showed that most of them (242 out of 250) were waiting on the WebSphere CompoundClassLoader, as all of them were trying to load additional classes. Due to the high number of threads trying to access that shared resource – the class loader – most threads got stuck waiting:

With increasing load more threads keep waiting for the class loader. This eventually results in all threads waiting for this shared resource.
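
Why does one class loader back everything up? A minimal sketch may help (all class names below are hypothetical): class loaders that are not parallel-capable – the default before Java 7, and typical of container loaders of that era – serialize loadClass() calls on a single lock, and a lookup that can never succeed still pays the full classpath search on every call:

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    // Sketch: 250 threads (mirroring the worker pool above) all funnel through
    // one class loader with a lookup that is guaranteed to fail. Failed lookups
    // are not cached, so every call repeats the full search - and on a
    // non-parallel-capable loader the calls also queue up on the loader's lock.
    public class ClassLoaderContentionDemo {
        public static void main(String[] args) throws Exception {
            ExecutorService pool = Executors.newFixedThreadPool(250);
            ClassLoader loader = ClassLoaderContentionDemo.class.getClassLoader();
            long start = System.nanoTime();
            for (int i = 0; i < 250; i++) {
                pool.submit(() -> {
                    try {
                        // Hypothetical missing class - the loader walks its whole
                        // classpath, fails, and throws ClassNotFoundException.
                        Class.forName("com.example.DoesNotExistBeanInfo", false, loader);
                    } catch (ClassNotFoundException expected) {
                        // Expected on every single call.
                    }
                });
            }
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.MINUTES);
            System.out.printf("250 failed lookups took %d ms%n",
                    (System.nanoTime() - start) / 1_000_000);
        }
    }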

Observation #3: Most Time Spent in Class Loading Even Under Normal Load

Looking at the response time contribution of the actual web requests executed by these end users shows the same picture: 80-90% of the response time is contributed by Java class loading. Interestingly, this doesn’t just happen during peak load but also during “normal” load:

80-90% of the transaction response time is spent in Java class loading (purple portion of the response time contribution). That’s true for peak but also “normal” load.

Observation #4: Class Loader Tries to Load Non-Existent Classes

So – is all this class loading necessary? Looking at the actual transactions being executed shows that for EVERY web request the app server tries to load a class that does not exist – leading to a ton of ClassNotFoundExceptions. Because this class can never be loaded successfully, yet the app server keeps trying to load it for every request, we have the root cause of the problem. This is true for fast and slow transactions alike, which highlights the importance of seeing this level of detail for every transaction in your system: the fast ones also hold on to that scarce resource – the class loader – and therefore impact other transactions.

The following screenshot shows a PurePath for one of these requests on their system. The ability of the PurePath technology to capture every single transaction – including who wanted to load these classes and why that failed – was critical to identifying the root cause:

For every request the application server tries to load a class whose name ends in TransferValidatorBGBeanInfo – a class that does not exist.
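
For background, this failing lookup is the standard JavaBeans mechanism at work: before analyzing a bean class via reflection, java.beans.Introspector first searches for an explicit companion class named <BeanClass>BeanInfo, and a missing companion surfaces internally as a ClassNotFoundException. A minimal sketch, using String as the bean class purely for illustration:

    import java.beans.BeanInfo;
    import java.beans.Introspector;

    // Sketch of the lookup that fails here: the Introspector tries to load a
    // hand-written companion class named <BeanClass>BeanInfo
    // (e.g. TransferValidatorBGBeanInfo in the screenshot above).
    public class BeanInfoLookupDemo {
        public static void main(String[] args) throws Exception {
            // Behind this call the Introspector looks for a "StringBeanInfo"
            // companion class, does not find it, swallows the resulting
            // ClassNotFoundException, and falls back to reflection.
            BeanInfo info = Introspector.getBeanInfo(String.class);
            System.out.println(info.getBeanDescriptor().getBeanClass());
        }
    }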

Root Cause: Newer Version of the Spring Framework Checks for the Existence of BeanInfo Classes

After some investigation it was discovered that a recent update to a newer version of the popular Spring Framework introduced new behavior: the framework now always tries to load the BeanInfo class for each bean. As this check is done for every bean in every request, it causes all of these ClassNotFoundExceptions. Further investigation showed that this behavior had already been reported to Spring back in January 2012 as SPR-9014.
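
For teams staying on the newer Spring version, two mitigations are worth sketching besides a rollback (the route described below). Both are assumptions based on the SPR-9014 discussion and a Spring 3.2-era setup, not what the bank deployed:

    import java.beans.BeanInfo;
    import java.beans.Introspector;

    // Two ways to suppress the failing BeanInfo lookups (a sketch, not the
    // bank's actual fix - they rolled back instead).
    public class BeanInfoMitigations {
        public static void main(String[] args) throws Exception {
            // Option 1: Spring 3.2+ reads this system property (per the
            // SPR-9014 discussion) and skips the *BeanInfo search entirely.
            // Usually passed on the JVM command line: -Dspring.beaninfo.ignore=true
            System.setProperty("spring.beaninfo.ignore", "true");

            // Option 2: at the plain JavaBeans level, a single call can skip
            // the companion-class search with the IGNORE_ALL_BEANINFO flag.
            BeanInfo info = Introspector.getBeanInfo(String.class,
                    Introspector.IGNORE_ALL_BEANINFO);
            System.out.println(info.getBeanDescriptor().getBeanClass());
        }
    }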

Result: No More Outages and a 79% Boost in Logins

Rolling back to the previous version of Spring did more than get rid of the performance impact of heavy class loading. Since the change the bank has had:

  • No outages since that deployment
  • An improved reputation, which resulted in 79% more logins to their system
  • Lower operational costs thanks to saved CPU cycles

Fixing all of these problems easily justified the decision to use Compuware APM. The team not only solved these problems but also other critical ones, such as issues in their data access layer causing duplicate key exceptions, and misconfigured beans resulting in a high number of internal exceptions that impacted overall transaction performance in a similar way, as explained in The Performance Impact of Exceptions.

Thanks to Ibrahim Mohammed – Enablement Service Engineer – for discovering this behavior and helping both our customer and our community become aware of these issues and how to prevent them.

Comments

  1. Could you update SPR-9014 with the version numbers of Spring where this happens and where it doesn’t? There is a comment on the bug about a fix going into 3.2, but perhaps it didn’t fully solve things.

  2. Harry Berger says:

    The Spring dev lead Jürgen Höller has already commented on the issue and points out how to properly enable caching with the new version.
