About the Author: Michael Kopp

Michael is a Technical Product Manager at Compuware. Reach him at @mikopp

How I Identified a MongoDB Performance Anti-Pattern in 5 Minutes

The other day I was looking at a web application that uses MongoDB as its central database. We were analyzing the application for potential performance problems, and within 5 minutes I found what I must consider a MongoDB anti-pattern, one with a 40% impact on response time. The funny thing: it was a Java best practice that triggered it!

Analyzing the Application

The first thing I always do is look at the topology of an application to get a feel for it.

Overall Transaction Flow of the Application

As we can see, it’s a modestly complex web application that uses MongoDB as its datastore. Overall, MongoDB contributes about 7% to the application’s response time. I noticed that about half of all transactions actually call MongoDB, so I took a closer look.

Flow of Transactions that access MongoDB, showing 10% response time contribution of MongoDB

Those transactions that actually do call MongoDB spend about 10% of their response time in that popular document database. As a next step I wanted to know what was being executed against MongoDB.

Overview of all MongoDB commands. This shows that the JourneyCollection find and getCount contribute the most to response time

The first two lines immediately stand out: they contribute far more to response time per transaction than all the others. What was interesting was that the getCount on the JourneyCollection had the highest contribution time, yet the responsible developer was not aware that he was using it anywhere.

Things get interesting – the mysterious getCount call

Taking things one level deeper, we looked at all transactions that were executing the ominous getCount on the JourneyCollection.

Transactions that call JourneyCollection.getCount spend nearly half their time in MongoDB

What jumps out is that those particular transactions do indeed spend over 40% of their time in MongoDB, so there was big potential for improvement here. Another click and we were looking at all MongoDB calls executed within the same transaction context as the getCount call we found so mysterious.

All MongoDB Statements that run within the same transaction context as the JourneyCollection.getCount

What struck us as interesting was that the number of executions per transaction of the find and getCount on the JourneyCollection seemed closely connected. At this point we decided to look at the transactions themselves – we needed to understand why that particular MongoDB call was executed.

Single Transactions that execute the ominous getCount call

It’s immediately clear that several different transaction types execute that particular getCount. That meant the problem was likely in the application’s core framework rather than specific to any one user action. Here is the interesting snippet:

The Transaction Trace shows where the getCount is executed exactly

We see that the WebService findJourneys spends all its time in the two MongoDB calls. The first is the actual find call to the JourneyCollection. The MongoDB client loads results lazily, so the find itself does not do much yet; the server is only called once we access the result set. We can see that round trip to MongoDB visualized in the call node at the end.
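To make that lazy-loading behaviour concrete, here is a minimal sketch using the legacy com.mongodb Java driver. The connection details, database name and query criteria are made up for illustration and are not from the analyzed application:

import com.mongodb.BasicDBObject;
import com.mongodb.DBCollection;
import com.mongodb.DBCursor;
import com.mongodb.DBObject;
import com.mongodb.MongoClient;

public class LazyCursorSketch {
    public static void main(String[] args) throws Exception {
        MongoClient client = new MongoClient("localhost", 27017);              // illustrative connection
        DBCollection journeys = client.getDB("travel").getCollection("JourneyCollection");

        DBCursor cursor = journeys.find(new BasicDBObject("pattern", "A-B"));  // hypothetical criteria
        // At this point no server round trip has happened yet: find() only builds the cursor.

        for (DBObject doc : cursor) {
            // The first iteration step sends the actual query to the server
            // (the "call" we saw in the transaction trace) and pulls the first batch.
            System.out.println(doc);
        }
        client.close();
    }
}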

We also see the offending getCount. It is executed by a method called size, which turns out to be the com.mongodb.DBCursor.size method. This was news to our developer. Looking at several other transactions we found that this was a common pattern: every time we searched the JourneyCollection, a getCount would be executed by com.mongodb.DBCursor.size. This always happens before the find command is actually sent to the server (which happens in the call method). So we used CompuwareAPM DTM’s (a.k.a. dynaTrace) developer integration and took a look at the offending code. Here is what we found:

BasicDBObject fields = new BasicDBObject();
fields.put(journeyStr + "." + MongoConstants.ID, 1);   // project only the journey id field
fields.put(MongoConstants.ID, 0);                       // exclude the document _id

Collection<DBObject> locations = find(patternQuery, fields);

// Pre-sizing the list calls size(), which ends up in com.mongodb.DBCursor.size()
// and triggers the extra getCount round trip to the server.
ArrayList<String> results = new ArrayList<String>(locations.size());
for (DBObject dbObject : locations) {
    String loc = (String) dbObject.get(journeyStr);
    results.add(loc);
}
return results;

The code looks harmless enough: we execute a find, create an array for the result and fill it. The offender is the locations.size(). MongoDB’s DBCursor is similar to JDBC’s ResultSet: it does not return the whole result set at once, but only a subset. As a consequence it doesn’t actually know how many elements the find will end up with. The only way for MongoDB to determine the final size seems to be to execute a getCount with the same criteria as the original find. In our case that additional, unnecessary round trip made up 40% of the web service’s response time!
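A minimal sketch of the fix, reusing the names from the snippet above: simply drop the pre-sizing and let the list grow as the cursor is iterated, which avoids the extra getCount round trip. The occasional ArrayList re-sizing in memory is far cheaper than a database round trip.

ArrayList<String> results = new ArrayList<String>();      // no size() call, so no extra getCount
for (DBObject dbObject : find(patternQuery, fields)) {     // same find helper and parameters as above
    results.add((String) dbObject.get(journeyStr));
}
return results;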

An Anti-Pattern triggered by a Best Practice

So it turns out that calling size on the DBCursor must be considered an anti-pattern! The really funny thing is that the developer thought he was writing performant code: he was following the best practice of pre-sizing the ArrayList, which avoids any unnecessary re-sizing ;-) . In this particular case, however, that minor theoretical improvement led to a 40% performance degradation!
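To be clear, pre-sizing remains a perfectly good habit when the element count is already in memory; the trouble only starts when size() hides a database round trip. A purely illustrative example (java.util.Arrays, List and ArrayList assumed imported), not taken from the analyzed application:

List<String> names = Arrays.asList("Vienna", "Linz", "Graz"); // in-memory source, size() is a cheap field read
List<String> upper = new ArrayList<String>(names.size());     // pre-sizing here costs nothing extra
for (String name : names) {
    upper.add(name.toUpperCase());
}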

Conclusion

The takeaway here is not that MongoDB is bad or doesn’t perform; in fact, the customer is rather happy with it. But mistakes happen, and just as with other database applications we need visibility into the running application to see how much the database contributes to overall response time, and to understand which statements are called where and why.

This also demonstrates nicely why premature micro-optimization, done without the visibility an APM solution provides in production, will not lead to better performance. In some cases – like this one – it can actually lead to worse performance!

Comments

  1. Hello,

    Nice article. I would like to ask you something: What is the name of the tool you used to extract the Transaction Flow and Performance of your application? The one that is featured in the first couple of screenshots.

    Thank you

  2. Hi Dinish,

    It’s our own of course ;-) : CompuwareAPM dynaTrace Enterprise.

    The Transaction Flow is discovered automatically as the application executes…

    Thanks
    Mike

  3. Hi Mike,
    Interesting thing indeed to watch out for. My preference would probably be to have the locations as an embedded array in the Journey collection’s documents, as long as it fits into MongoDB’s document size limit. A one-to-few kind of pattern.
    By the way, are you using a specific plugin for dynaTrace? I don’t recall seeing MongoDB stats in the tool.

  4. Hi Eugene,

    It’s a new feature. If you are a customer, send me an email if you want to know more details about it.

  5. I don’t see any anti-pattern here, or any fault on the developer’s side either. The problem is the MongoDB client API. They should never have provided the “DBCursor.count” method with an implementation that silently makes another call to the remote database. Even the Javadoc does not mention that: “Counts the number of elements in this cursor.”

  6. HighJustice says:

    I like that topic. Thanks a lot :)

  7. @Toslak,

    You are right in that the usage of the DBCursor count is the anti-pattern. The fact that it is an anti-pattern in using the MongoDB client does not mean that the developer wrote bad code; as explained, he was sticking to a best practice.

    So yes, the problem is with the client API, or rather the way it was used here. That usage is the anti-pattern.

  8. Peter Verhas says:

    First of all, the article was very interesting, and I especially liked that you described the way you got to the conclusion and not only the result. Very valuable.

    To decide who to blame is a bit academic, but not pointless.

    Common sense says that the developer was developing code that was performing badly and thus had to optimize it. In this case it was you who helped, but from the bigger point of view that is irrelevant. We have the tools we have to work with, and life is not perfect. If you use MongoDB you have to use the MongoDB client library; that is the only choice. Prepare yourself: tools are not perfect and do not always perform the way they promise.

    Experts, on the other hand, explain that the flaw is on MongoDB’s side, and at the very least that the API doc is bad for not describing the side effects of calling size(). I understand their point, but in no way will the MongoDB developers fix your application. That is one point. The other is to remember this misery when you design your own API and craft your own documentation; they will also have users.

  9. Michael, do you mean that there is now MongoDB tooling in DynaTrace? We’re clients and would certainly like to hear more. I can’t seem to find your email, can you send me the details?
