About the Author

Michael Kopp is a Technical Product Manager at Compuware. Reach him at @mikopp.

The reason I don’t monitor connection pool usage

I have been working with performance-sensitive applications for a long time now. As can be expected, most of them have to use the database at one point or another, so you inevitably end up with a connection pool. Now, to make sure that your application is not suffering from waiting on connections, you monitor the pool usage. But is that really helping? To be honest, not really…

How an application uses the connection pool

Most applications these days use connection pools implicitly. They get a connection, execute some statements and close it. The close call does not destroy the connection but puts it back into the pool. The goal is to minimize the so-called busy time. Under the hood, most application servers refrain from putting a connection back into the pool until the transaction has been committed. For this reason it is good practice to get the database connection as late as possible during a transaction. Again, the goal is to minimize usage time, so that many application threads can share a limited number of connections.
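As a minimal sketch of that access pattern (the DAO class and query are made up for illustration; the DataSource would be the pool-backed one your application server provides):

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import javax.sql.DataSource;

public class OrderDao {
    private final DataSource dataSource; // pool-backed, e.g. provided by the app server

    public OrderDao(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    public int countOrders(long customerId) throws SQLException {
        // Acquire the connection as late as possible to keep busy time short ...
        try (Connection con = dataSource.getConnection();
             PreparedStatement ps = con.prepareStatement(
                     "SELECT COUNT(*) FROM orders WHERE customer_id = ?")) {
            ps.setLong(1, customerId);
            try (ResultSet rs = ps.executeQuery()) {
                rs.next();
                return rs.getInt(1);
            }
        } // ... close() returns the connection to the pool, it is not destroyed
    }
}
```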

All connection pools have a usage measure to determine whether enough connections are available, or in other words whether the lack of connections has a negative effect. However, as a connection is occupied only for very short amounts of time, often fractions of a second, we would need to check the usage equally often to have a statistically significant chance of seeing the pool maxed out under normal conditions.

Connection Pool usage if polled every 10 seconds

In reality this is not done, as checking the pool too often (say, several times a second) would add a lot of monitoring overhead. Most solutions check every couple of seconds, and as a result we only see pool usage reaching 100% if the pool is constantly maxed out. If we were to track the usage on a continuous basis, the result would look different:

Pool usage as seen if min/max and average are tracked continuously instead of polled
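As a minimal sketch of what such continuous tracking looks like: instead of sampling on a timer, the counters are updated on every checkout and check-in (the hook points here are assumed for illustration, not part of any real pool API):

```java
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Illustrative sketch only: tracks pool usage continuously on every
 * checkout/check-in instead of sampling it on a timer.
 */
public class ContinuousUsageTracker {
    private final AtomicInteger inUse = new AtomicInteger();
    private final AtomicInteger highWaterMark = new AtomicInteger();

    /** Call when the pool hands out a connection. */
    public void onCheckout() {
        int current = inUse.incrementAndGet();
        // Record the high-water mark; a 10-second poller would most likely miss this spike
        highWaterMark.accumulateAndGet(current, Math::max);
    }

    /** Call when a connection is returned to the pool. */
    public void onCheckin() {
        inUse.decrementAndGet();
    }

    public int currentUsage() { return inUse.get(); }
    public int peakUsage()    { return highWaterMark.get(); }
}
```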

This means that by the time we see 100% pool usage with a regular monitoring solution, we would already be suffering a significant negative performance impact. Or would we?

What does 100% pool usage really mean?

Actually it does not mean much. It means that all connections in the pool are in use, but not that any transactions are suffering performance problems due to this. In a continuous load scenario we could easily tune our setup to have 100% pool usage all the time and not have a single transaction suffering; it would be perfect.

However, many use cases do not have a steady, continuous load pattern, and there we would notice performance degradation long before the usage metric ever shows 100%. Pool usage alone does not tell us anything; Acquisition Time does!

This shows the pool usage and the min/max acquisition time, which is non-zero even though the pool is never maxed out

Most application servers and connection pools have a wait or acquisition metric that is far more interesting than pool usage. Acquisition time represents the time a transaction has to wait for a connection from the pool, so it is real, actionable information. If it increases, we know for a fact that we do not always have enough connections in the pool (or that the connection pool itself is badly written). This measure can show significant wait time long before the average pool usage gets anywhere close to 100%. But there is still a slight problem: the measure is an aggregated average across the whole pool, or more specifically across all transactions. While it allows us to understand whether there are enough connections overall, it does not enable us to identify which business transactions are impacted and by how much.

Measuring Acquisition Time properly

Acquisition time is simply the time it takes for the getConnection call to return. We can easily measure that inside our transaction, and if we do, we can account for it on a per-business-transaction basis and not just as an aggregate of the whole pool. This means we can determine exactly how much time we spend waiting in each transaction type. After all, I might not care about waiting 10 ms in a transaction that has an average response time of a second, but the same wait would be unacceptable in a transaction type with a 100 ms response time.

The getConnection call as measured in a single transaction: it takes 10 ms although the pool average is 0.5 ms
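A minimal sketch of how one might capture that per-transaction measurement, assuming we can wrap the getConnection call ourselves (the wrapper class and the bookkeeping are illustrative, not a real monitoring API):

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;
import javax.sql.DataSource;

/** Illustrative wrapper: attributes acquisition time to a business transaction type. */
public class MeasuredConnectionSource {
    private final DataSource dataSource;
    private final Map<String, LongAdder> waitMicrosByTransaction = new ConcurrentHashMap<>();

    public MeasuredConnectionSource(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    public Connection getConnection(String businessTransaction) throws SQLException {
        long start = System.nanoTime();
        try {
            return dataSource.getConnection(); // blocks if the pool is exhausted
        } finally {
            long waitedMicros = (System.nanoTime() - start) / 1_000;
            // Record per transaction type instead of as a pool-wide average
            waitMicrosByTransaction
                    .computeIfAbsent(businessTransaction, k -> new LongAdder())
                    .add(waitedMicros);
        }
    }

    public long totalWaitMicros(String businessTransaction) {
        LongAdder adder = waitMicrosByTransaction.get(businessTransaction);
        return adder == null ? 0 : adder.sum();
    }
}
```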

We could even determine which transaction types are concurrently fighting over the limited connections, and understand outliers, meaning the occasional case where a transaction waits a relatively long time for a connection, which would otherwise be hidden by the averaging effect.

Configuring the optimal Pool Size

Knowing upfront how big to configure a pool is not always easy. In reality most people simply set it to an arbitrary number that they assume is big enough. In some high-volume cases it might not be possible to avoid wait time entirely, but we can understand and optimize it.

There is a very easy and practical way to do this: simply monitor the connection acquisition time during peak load hours. It is best if you do that on a per-business-transaction basis as described above. Pay special attention to how much it contributes to the response time. Make sure that you exclude from your analysis those transactions that do not wait at all; they would just skew your calculation.

If the average response time contribution for your specific business transaction is very low (say, below 1%), then you can reasonably say that your connection pool is big enough. It is important to note that I am not talking about an absolute value in milliseconds, but about contribution time! If that contribution is too high (e.g. 5% or more), you will want to increase your connection pool until you reach an acceptable value. The resulting pool usage might be very low on average or close to 100%; it does not really matter!
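As a quick worked example of that rule of thumb, reusing the numbers from earlier in the text:

```java
/** Illustrative rule-of-thumb check: acquisition time as a share of response time. */
public class PoolSizingCheck {

    /** Both values in milliseconds, averaged over transactions that actually waited. */
    static double contribution(double avgAcquisitionMs, double avgResponseMs) {
        return avgAcquisitionMs / avgResponseMs;
    }

    public static void main(String[] args) {
        // 10 ms wait in a 1 s transaction: 1% contribution, the pool is big enough
        System.out.printf("%.0f%%%n", 100 * contribution(10, 1000));
        // the same 10 ms wait in a 100 ms transaction: 10%, grow the pool
        System.out.printf("%.0f%%%n", 100 * contribution(10, 100));
    }
}
```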

Conclusion

The usefulness of a pool usage measure depends on how often we poll it. The more often we poll, the more overhead we add, and in the end it is still only a guess. Impact measures like acquisition time are far more useful and actionable. They allow us to tune the connection pool to a point where it has no, or at least acceptable, overhead compared to response time. Like all impact measures, it is best not to use the overall average, but to understand it in terms of contribution to the end-user response time.

Comments

  1. Kiran Kumar says:

    Why bother measuring acquisition time and pool sizes? Is there any benefit to using fewer connections?
    After the detailed analysis as suggested, I come up with a figure: say 10 users can get acceptable performance with 10 connections. What am I losing if I set it up to use 50 connections?

    For example, when I increase the heap size from 1024 MB to 4096 MB, I know I have to procure 3 GB of RAM. In the case of database connections, what is the additional cost (consequences) of 40 idle connections?

  2. In a simple use case with just 10 parallel users you are of course right: just set it up to 40 or 50.

    However

    - bigger setups will not be able to simply quadruple the number of available connections, as it would exhaust the resources of their database server or lead to unacceptable memory usage
    - each connection has its own statement pool and other memory-consuming items attached to it, so you will need much more memory for the same performance
    - as I tried to show in the blog, the pool usage can be way below the maximum whenever you look at the figure, or when you only look at the average, and you can still have a noticeable performance impact that you will have a hard time figuring out
    - not every connection pool holds JDBC connections, which are relatively cheap; think about thread pools, JMS connection pools, pools of mainframe socket connections…

    Lastly, what I try to advocate here is: if you have a performance issue, don’t trust the pool usage to rule the connection pool out as the cause.

    PS: the cost of going from 1 GB to 4 GB is bigger GC pauses ;-)

  3. I think keeping the pool size higher (within safe limits) can also act as a measure of resiliency. In real production scenarios there is always a possibility that some system/interface cranks up, causing threads to be released later than usual. In such scenarios a safely high pool size may actually help the system survive for some time until the firefighting is done.

  4. Kiran Kumar says:

    @Michael Kopp, thanks for the nice article and thanks for the quick clarification. I have two challenges:
    1) to demonstrate that I have the optimum pool size
    2) (in contrast to the above) when a performance issue is observed, to demonstrate that the pool size is not the cause

    Your article has solved both challenges.

  5. @Vivek Agreed, your argument is correct. It also requires that you have sufficient monitoring in place to
    a) identify what a safe high value is
    b) identify whether it was the root cause of a problem, the reason for the firefight, or had nothing to do with it.

    @Kumar Glad to be of service

  6. Yves Martin says:

    Because most Hibernate-based applications now use a large set of different queries, the prepared statement cache is difficult to size too. As most (all?) JDBC drivers manage a statement cache per connection, it is really easy to exhaust application server memory with (statement cache size * connection pool size) items.
    It is not rare that a prepared statement weighs 5 MB in memory.

    As a result, the “optimal connection pool size” allows you to increase your prepared statement cache!

    What I really expect for the future is the ability for database servers – which hold the prepared statement’s execution plan, often thanks to a cursor – to share cached statements in the JDBC driver across all connections in the pool.

    In my opinion, that would be a great improvement.

  7. @Martin thanks for bringing that up, I tried to hint at that, but you put it in much better terms.

    I agree with your desire that the statement pool should be shared. In fact, the statement pool is shared on the server side of things. The JDBC side suffers from bad design; the PreparedStatement objects often carry a lot of state!

  8. I use Tivoli/PMI metrics on WebSphere to monitor and size connection pools effectively. For sudden surges in production, always tune your pool sizes (JDBC/JMS/resource adapters/web container, IoC layer etc.) for 2x-3x the anticipated load; it’s better to be safe than sorry. That said, oversizing max connections is generally a bad idea, since the JVM has to maintain those connections and resources are unnecessarily utilized.
