About the Author

Andreas Grabner has been helping companies improve their application performance for 15+ years. He is a regular contributor to the Web Performance and DevOps communities and a prolific speaker at user groups and conferences around the world. Reach him at @grabnerandi.

Top Performance Mistakes when moving from Test to Production: Excessive Logging

When things work great in testing, it doesn't necessarily mean that everything works great in production. Many things can be different: much higher load than was tested, different hardware, external services that couldn't be tested against, and real users executing transactions that were never tested.

This blog post marks the start of a series covering the Top Performance Mistakes when moving from Test to Production. All examples are taken from real-life applications we have worked with. I will start the series with a mistake that many application owners may not think about: Excessive Logging.

Logging is important – but handle with care

We all know that logging is important, as it provides valuable information when it comes to troubleshooting. Logging frameworks make it easy to generate log entries and, with the help of tools, it's becoming easier to consume and analyze log files when it is time to troubleshoot a problem. But what if there is too much log information that nobody ever looks at, that just fills up your file system and – worst of all – seriously impacts your application performance, thereby introducing a new problem to your application? The following two examples are taken from an online web application running on Tomcat. We can easily see how the current logging settings become a real problem once this application gets deployed to production.

Example #1: Excessive use of Exception objects to log Stack Trace

In this example the Tomcat Connection Pool was used with the setting “logAbandoned=true”. When the connection pool runs into a problem it starts logging a lot of information, including full stack traces showing where the application tried to get a connection from the pool. In this application a typical transaction executed several hundred database queries, most of them on separate connections, which results in about as many getConnection calls as executeQuery calls.
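
To make this concrete, here is a minimal sketch of such a pool configuration using Commons DBCP's BasicDataSource (the pool implementation behind older Tomcat versions and home of the AbandonedTrace class shown below); the driver, URL, credentials and timeout values are placeholder assumptions:

    import org.apache.commons.dbcp.BasicDataSource;

    public class PoolConfig {
        public static BasicDataSource createPool() {
            BasicDataSource ds = new BasicDataSource();
            ds.setDriverClassName("com.mysql.jdbc.Driver");  // placeholder driver
            ds.setUrl("jdbc:mysql://localhost:3306/shop");   // placeholder URL
            ds.setUsername("app");                           // placeholder credentials
            ds.setPassword("secret");
            ds.setRemoveAbandoned(true);
            ds.setRemoveAbandonedTimeout(60);                // seconds; placeholder value
            // logAbandoned=true makes the pool capture a full stack trace on
            // every getConnection() call so it can later report where an
            // abandoned connection was borrowed - useful in test, costly under load.
            ds.setLogAbandoned(true);
            return ds;
        }
    }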

High Overhead when creating hundreds of exception objects to get full stack trace information

Looking just at the Exception objects gives an even clearer picture of how much data gets collected without providing much useful information.

Several hundred exception objects are created just for a single web request

Looking at the details of these individual exception objects makes it easy to see which class actually creates them in order to obtain the stack trace information for logging:

AbandonedTrace.setStackTrace is the method that creates these Exception objects
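
The expensive part is Throwable's fillInStackTrace(), which walks the entire call stack every time an exception object is instantiated. This rough, hypothetical micro-benchmark illustrates the pattern (numbers vary by JVM and stack depth):

    public class StackTraceCost {
        public static void main(String[] args) {
            long start = System.nanoTime();
            // Roughly one web request's worth of getConnection() calls,
            // each of which creates an Exception just to capture the stack.
            for (int i = 0; i < 500; i++) {
                Exception trace = new Exception(); // fillInStackTrace() runs here
                trace.getStackTrace();             // materializes StackTraceElement[]
            }
            long micros = (System.nanoTime() - start) / 1000;
            System.out.println("500 captured stack traces took " + micros + " microseconds");
        }
    }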

Advice for Production

In the test environment this might not be a big issue when only a fraction of the load expected on the live production system is simulated. It should, however, serve as a warning to whoever is responsible for moving this application to production. More information on this particular problem can be found here: https://groups.google.com/forum/?fromgroups#!topic/mmbase-discuss/5x24EjBZMGA

Example #2: Too granular and excessive use of logging

Many applications log in great detail in order to make problem analysis easier in development and testing. When deploying an application to production, teams often forget to turn off DEBUG, FINE, FINEST or INFO logs, which then flood the file system with log information nobody needs and also cause additional runtime overhead. It also often happens that log levels are used incorrectly, which leads to many WARNING or ERROR entries that don't describe actual problems. That makes it very hard to find the “real” problems when analyzing log files. The following example shows a log message written multiple times in the same transaction using the wrong log level.

Excessive Logging and incorrect usage of severity level results in too much time spent analyzing log outputs
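
As a contrast, here is a small, hypothetical sketch of level-appropriate logging with java.util.logging; the class, method and messages are illustrative, not taken from the application above. Detailed output is guarded so it costs nothing in production, and SEVERE is reserved for real failures:

    import java.util.logging.Level;
    import java.util.logging.Logger;

    public class OrderService {
        private static final Logger LOG = Logger.getLogger(OrderService.class.getName());

        void processOrder(String orderId) {
            // Guard detailed output so the message is never even built
            // when FINE is disabled in production.
            if (LOG.isLoggable(Level.FINE)) {
                LOG.fine("Processing order " + orderId);
            }
            try {
                // ... business logic ...
            } catch (RuntimeException e) {
                // SEVERE is reserved for failures an operator must act on.
                LOG.log(Level.SEVERE, "Order " + orderId + " failed", e);
                throw e;
            }
        }
    }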

Advice for Production

  • Make sure you configure your log levels correctly when deploying to production (see the configuration sketch after this list).
  • Test these settings in your test environment and verify that no DEBUG, FINE, … log messages are still being written to the file system.
  • Show the generated log files to the people who will have to use them in case of a production problem, and get their agreement that the entries make sense and don't contain duplicates or “spam” messages.
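
As a starting point, here is a minimal logging.properties sketch for java.util.logging, assuming file-based logging; the package name, file pattern and limits are placeholder assumptions:

    # Root logger: only WARNING and above in production
    .level = WARNING

    # Temporarily raise a single package while troubleshooting, e.g.:
    # com.example.shop.level = FINE

    handlers = java.util.logging.FileHandler
    java.util.logging.FileHandler.level = WARNING
    java.util.logging.FileHandler.pattern = %h/app-%g.log
    java.util.logging.FileHandler.limit = 10000000
    java.util.logging.FileHandler.count = 5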

Additional Reading

If you are interested in load testing, also check out the post To Load Test or Not to Load Test: That is not the Question.

Comments

  1. IMHO you can't log enough in production. To keep it from slowing things down, it should be asynchronous.
    In a layered application there should be at least one log entry for the incoming request, the invocation of the business layer, and all invocations of the data access layer. Ideally, this should give enough information to pinpoint a bug without having to replay the request in a test system.

    • I agree with you that logging is very important!
      The point that I made here – and that's what we see with customers who use our performance management solution – is that applications often have too much logging built in. Logging that in fact produces lots of overhead – which is not what you want.
      A modern performance management tool will automatically trace every single request that comes into your system – as you can see from the screenshots above. That eliminates the need to put additional logging into your application code.
      But – don't get me wrong – custom log output is very important. What most folks don't realize is that too much logging a) impacts application performance and b) makes it hard to find the right information when troubleshooting.
      Our dynaTrace solution for instance captures your custom log messages in the context of every single request/transaction. With that we eliminate the need for the “default” logging mechanisms you described, so that you can focus on logging only the information that is really important. Very often we see that people can get rid of most of their custom logging because things like database queries, exceptions, web service calls, … are automatically captured by a performance management tool anyway.

      Hope this makes sense.

  2. I completely agree with this article.

    We have a legacy app that logs so much info that finding anything useful is a tedious process. Looking for clues is like searching for a needle in a haystack. Not to mention the problems with running out of space on the server.

    While some additional logging may be helpful when an application is brand new, mature apps should log only in error or possible exception situations.

    We have an app that was developed about a year ago that, in its first release, had logging like Frisian suggests, as it was a brand new app. Over subsequent releases the amount of logging was reduced, such that now there is only a single line logged for a successful transaction.

    Additional logging happens whenever there is any sort of exception, error, warning, or possibly ambiguous situation, to help with debugging.

    So far it has worked well for us.

    sgb

  3. So true! I was working with a client a few weeks ago who was about to move to production, and during performance tests we kept finding bottlenecks in their logging. The time the threads spend doing I/O to disk consumes resources that other business logic needs during high transaction volumes.

  4. Log files are great, but sometimes they become so big that it is hard to find things and they take up a lot of space. I have seen log files cause servers to run out of disk space. Reducing the size of log files and ensuring they get purged appropriately is important.
