About the Author: Andreas Grabner

Andreas Grabner has been helping companies improve their application performance for 15+ years. He is a regular contributor within the Web Performance and DevOps communities and a prolific speaker at user groups and conferences around the world. Reach him at @grabnerandi.

When DB Log files don’t help: Data Access Analysis Tips to boost Performance

If transaction times jump from less than one second to more than 60 seconds after a software upgrade, it is evident that there is a problem that needs investigation. This is exactly what one of our customers experienced with its time-tracking application. Our customer identified this performance regression while executing extensive performance tests prior to its production upgrade. The company's "traditional" approach to this type of problem was to analyze application and database log files as well as infrastructure metrics such as CPU, Memory and Network Throughput. Our customer couldn't find a single slow database query, nor were there any indications of an error in the log files. The infrastructure also seemed fine, as neither the Application, the Database Server nor the Network showed any bottlenecks. Analyzing the End-to-End Transaction (the click on search), however, revealed a problem which turned out to be a "data-driven" performance issue.

Problem: Loading and Processing Large Result Sets

The following screenshot shows parts of the PurePath highlighting where most of the time is spent within the transaction:

60s is consumed by loadDataStore which iterates through a large ResultSet

The SQL Statement to retrieve time tracking data executed in about 840ms. This doesn't really indicate a hotspot for this transaction. loadDataStore, however, iterates through the ResultSet of this query. Looking at the PurePath details of that method execution shows that the continuous fetching of the next result sets is the real hotspot of this transaction: most of the time is spent in I/O to retrieve the next batch of table rows and to process that data.
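
To make the pattern concrete, below is a minimal sketch of what a loadDataStore-style method typically looks like, assuming plain JDBC; the class, table and column names are invented for illustration and this is not the vendor's actual code. The executeQuery call itself returns quickly, but iterating a very large ResultSet forces the JDBC driver to repeatedly fetch the next batch of rows from the database, so the time accumulates in the loop rather than in the statement execution that a SQL log would report.

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.util.ArrayList;
    import java.util.List;

    // Hypothetical reconstruction for illustration only; table, columns and
    // class name are invented and this is not the vendor's actual code.
    public class TimeTrackingDao {

        // Loads every row the given account is allowed to see into memory.
        public List<String[]> loadDataStore(Connection con, long accountId) throws Exception {
            String sql = "SELECT entry_id, project, hours FROM time_entries WHERE visible_to = ?";
            List<String[]> rows = new ArrayList<String[]>();
            try (PreparedStatement stmt = con.prepareStatement(sql)) {
                stmt.setLong(1, accountId);
                try (ResultSet rs = stmt.executeQuery()) {   // fast: roughly the 840ms seen in the PurePath
                    while (rs.next()) {                      // slow in aggregate: whenever the current batch
                        rows.add(new String[] {              // of rows is exhausted, the driver fetches the
                            rs.getString("entry_id"),        // next batch in another database round-trip
                            rs.getString("project"),
                            rs.getString("hours")
                        });
                    }
                }
            }
            return rows;
        }
    }

With the privileged test account the WHERE clause matched every user's time tracking entries, so the loop (and its fetch round-trips) ran far more often than it would for a normal user, which is why the hotspot showed up in fetching and processing rather than in the query itself.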

Root Cause: Too Much Data for Test User

After consulting with the 3rd-party vendor of this application, our customer came to the conclusion that the user account used in its load testing script had elevated privileges. When accessing a particular web page in the application, this user was able to see time tracking data from all other users in the system. The implementation of this web page was such that it processed all data accessible to the user instead of just the data needed for display, e.g. the first page of results.

Solution: “Real” User Account for Testing

Instead of using the specially privileged user account, the test scripts were modified to use a user account with "normal" privileges, which also reflected the real-life scenario. This reduced the transaction response time significantly, down from 61 seconds to 73 milliseconds, as shown in the following PurePath:

Standard Users process the same request in 73ms instead of 61s

Conclusion

Unsurprisingly, the amount of data returned by a SQL Statement can have a significant impact on transaction performance. There are, however, several lessons learned:

  1. SQL Log Files didn’t show this as a problem as they only reported the SQL Execution Time but not the individual Fetch and Processing times
  2. Testing needs to be done with real user accounts and not test, dummy or special privileged accounts
  3. Optimize access to large result sets by only processing what is really needed for display (see the sketch after this list)
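
To illustrate point 3, here is a hedged variant of the hypothetical loadDataStore sketch above that only asks the database for the page of results that will actually be rendered. The LIMIT/OFFSET syntax and the pageSize/pageIndex parameters are assumptions made for the example, not the vendor's implementation.

    // Hypothetical variant (added to the TimeTrackingDao sketch above) that only
    // loads the page actually being displayed. LIMIT/OFFSET is MySQL/PostgreSQL
    // syntax; other databases use ROWNUM, FETCH FIRST or setMaxRows() instead.
    public List<String[]> loadDataStorePage(Connection con, long accountId,
                                            int pageSize, int pageIndex) throws Exception {
        String sql = "SELECT entry_id, project, hours FROM time_entries "
                   + "WHERE visible_to = ? ORDER BY entry_id LIMIT ? OFFSET ?";
        List<String[]> rows = new ArrayList<String[]>();
        try (PreparedStatement stmt = con.prepareStatement(sql)) {
            stmt.setLong(1, accountId);
            stmt.setInt(2, pageSize);              // no more rows than one page displays
            stmt.setInt(3, pageIndex * pageSize);  // skip the pages that are not shown
            try (ResultSet rs = stmt.executeQuery()) {
                while (rs.next()) {                // at most pageSize iterations, no matter how
                    rows.add(new String[] {        // much data the account is allowed to see
                        rs.getString("entry_id"),
                        rs.getString("project"),
                        rs.getString("hours")
                    });
                }
            }
        }
        return rows;
    }

This way the cost of the transaction stays proportional to what is displayed rather than to everything the account is permitted to see, which is what made the privileged test account so misleading in the first place.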

If you are interested in more typical problem patterns, check out the Top 8 Application Performance Landmines.

Comments

  1. Great article. This also points out the need for a real-world data load, and possibly a projected future data load, to accompany your real users. I have seen cases where performance is great with the initial data load, but over time the data grows, response times creep up and ruin your SLA, or poorly written code even tanks completely.

  2. Vitaly Grinberg says:

    Great article. Which sensor rules should be added to monitor result set processing with org.apache.commons.dbcp?

    • The good news is that Auto Sensors will automatically pick up all of the heavy result set processing methods. So I would look at your PurePaths and see what the Auto Sensors can tell you. If you still believe you need more insight, you can create custom sensor rules right from the Auto Sensor Nodes. Also, feel free to get more feedback and ideas on that by posting your question on our community forum: http://community.compuwareapm.com. Andi

  3. Friedrich says:

    Very interesting use case about using the wrong and the right measurement tools, using the right data for testing, and the importance of end-to-end metrics!
    As an (Oracle) database performance guy, I wonder about the part: “SQL Log Files didn’t show this as a problem as they only reported the SQL Execution Time but not the individual Fetch and Processing times”. A “SQL Log” with timing information (?) that does NOT include the fetch part of a query is, from a performance point of view, rather useless, IMHO.

  4. Also, interestingly enough, the very next day after reading this article we had a request for some deeper analysis of slow response times. It turned out the issue was similar. Long story short, it was an issue with vendor code. A normal request completed in under a second; a problem request took over 20s. This could happen to the same user, in the same session.

    The issue? Database calls and poorly written logic on the application side. One page view sparked 10,000 queries, of which only 21(!) were unique. Each query took about 1-2ms to run. So this was a case of death by a thousand cuts.

    I showed the comparison of the Database and API Breakdown, along with the transaction flow dashlets to help present the case for a code change.
