About the Author

Michael Kopp is a Technical Product Manager at Compuware. Reach him at @mikopp.

Cassandra Write Performance – A quick look inside

I was looking at Cassandra, one of the major NoSQL solutions, and was immediately impressed with its write speed, even on my notebook. But I also noticed that its response times were very volatile, so I took a deeper look.

First Cassandra Write Test

I did the first write tests on my local machine, but I had a goal in mind: I wanted to see how fast I could insert 150K data points, each consisting of 3 values. In Cassandra terms this meant adding 150K rows to a single Column Family, with three columns per row. Don’t be confused by the term column here; it really means a key/value pair. At first I tried to load all 150K rows in one single mutator call. It worked just fine, but I saw huge GC suspensions, so I switched to sending buckets of 10K rows, which gave good enough performance. Below is a rough sketch of that bucketed loop, followed by the resulting response time chart:
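To make the setup concrete, here is roughly what such a bucketed insert loop looks like with the Hector client. This is only an illustrative sketch, not my actual test code; the cluster, keyspace and column family names are placeholders.

    import me.prettyprint.cassandra.serializers.StringSerializer;
    import me.prettyprint.hector.api.Cluster;
    import me.prettyprint.hector.api.Keyspace;
    import me.prettyprint.hector.api.factory.HFactory;
    import me.prettyprint.hector.api.mutation.Mutator;

    public class BucketedWriteTest {

        private static final int TOTAL_ROWS  = 150000; // 150K rows per test run
        private static final int BUCKET_SIZE = 10000;  // flush every 10K rows to limit GC pressure

        public static void main(String[] args) {
            // cluster, keyspace and column family names are placeholders, not from the test
            Cluster cluster = HFactory.getOrCreateCluster("TestCluster", "localhost:9160");
            Keyspace keyspace = HFactory.createKeyspace("TestKeyspace", cluster);

            Mutator<String> mutator = HFactory.createMutator(keyspace, StringSerializer.get());
            for (int row = 0; row < TOTAL_ROWS; row++) {
                String key = "row-" + row;
                // three columns (key/value pairs) per row, as in the test
                mutator.addInsertion(key, "DataPoints", HFactory.createStringColumn("v1", "1.0"));
                mutator.addInsertion(key, "DataPoints", HFactory.createStringColumn("v2", "2.0"));
                mutator.addInsertion(key, "DataPoints", HFactory.createStringColumn("v3", "3.0"));

                // one batch_mutate per 10K-row bucket instead of a single huge call
                if ((row + 1) % BUCKET_SIZE == 0) {
                    mutator.execute();
                    mutator = HFactory.createMutator(keyspace, StringSerializer.get());
                }
            }
            mutator.execute(); // flush any remainder
        }
    }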

Cassandra Client/Server Performance and Volatility

The upper chart shows client and server response time respectively, which indicates that we leave a considerable amount of time either on the wire or in the client. The lower chart compares average and maximum response time on the Cassandra server, clearly showing high volatility. So I let dynaTrace do its magic and looked at the Transaction Flow to check the difference between client and server response time.

Getting a look at the insides of Cassandra

batch_mutate Transactions from Client to Cassandra Server

This is what I got 5 minutes after I first deployed the dynaTrace agent. It shows that we do indeed leave a large portion of the time on the wire, either due to network or waiting for Cassandra. But the majority is still on the Server. A quick check of the response time hotspots reveals even more:

This shows that most of the time spent on the server is CPU and I/O

The Hotspots show that most of the time on the Cassandra server is spent in CPU and I/O, as it should be, but a considerable portion is also attributed to GC suspension. Please note that this is not the Garbage Collection time itself, but the time that my transactions were actively suspended by the Garbage Collector (read about the difference here)! What is also interesting is that a not insignificant portion is spent inside Thrift, Cassandra’s communication protocol, which confirmed that communication is part of the issue. Another interesting point is that the majority of the transactions are in the 75ms range (as can be seen in the upper right corner), but a lot of transactions are slower and some go all the way up to 1.5 seconds.

Hotspots of the slowest 5% of the batch_mutate calls

I looked at the slowest 5% and could see that GC suspension plays a much bigger role here and that the time we spend waiting on I/O is also greatly increased. So the next thing I checked was Garbage Collection, always one of my favorites.

The charts show that nearly all GC suspensions are due to minor collections

What we see here is a phenomenon that I have blogged about before. The GC suspensions are mostly due to so-called “minor collections”. Major collections do happen, but are only responsible for two of the suspensions. If I had only monitored major GCs I would not have seen the impact on my performance. What it means is that Cassandra allocates a lot of objects and my memory setup couldn’t keep up with it, which is not very surprising with 150K rows of data every 10 seconds.
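As an aside: if you don't have an APM tool at hand, the JVM itself can log these suspensions. A minimal set of HotSpot flags appended to conf/cassandra-env.sh would look roughly like this (the log path is just an example; -XX:+PrintGCApplicationStoppedTime is the flag that reports the actual suspension time rather than the collection time):

    # appended to conf/cassandra-env.sh - log file location is only an example
    JVM_OPTS="$JVM_OPTS -verbose:gc"
    JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails"
    JVM_OPTS="$JVM_OPTS -XX:+PrintGCDateStamps"
    JVM_OPTS="$JVM_OPTS -XX:+PrintGCApplicationStoppedTime"
    JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc.log"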

Finally, I took a look at the individual transactions themselves:

Single batch_mutate Transactions, each inserting 10K Rows

What we see here is that the PurePath follows the batch_mutate call from the client to the server. This allows us to see that a lot of time is spent between the two layers (the two synchronization nodes indicate the start and end of the network call). More importantly, we see that we only spend about 30ms of CPU in the client-side batch_mutate function, and according to the elapsed time this all happened during sending. That means that either my network was clogged or the Cassandra server couldn’t accept my requests quickly enough. We also see that the majority of the time on the server is spent waiting on the write. That did not surprise me, as my disk is not the fastest.

A quick check on the network interface showed me that my test (10x150K rows) added up to 300MB of data. Some quick math told me that a single batch_mutate call therefore sent roughly 2MB of data over the wire, so we can safely assume that the latency is due to the network. It also means that we need to monitor the network and Cassandra’s usage of it closely.
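Spelled out, the back-of-the-envelope math is simply:

    10 runs x 150K rows             = 1,500,000 rows
    1,500,000 rows / 10K per bucket = 150 batch_mutate calls
    300 MB / 150 calls              = roughly 2 MB per call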

Checking the Memory

I didn’t find a comprehensive GC tuning guide for Cassandra and didn’t want to invest a lot of time, so I took a quick peek to get an idea of the main drivers behind the obviously high object churn:

The Memory Trend shows the main drivers

What I saw was pretty conclusive. The mutation creates a ColumnFamily and a Column object for every single column value that I insert. More importantly, the ColumnFamily holds a ConcurrentSkipListMap which keeps track of the modified columns. That produced nearly as many allocations as any of the primitives, something I have rarely seen. So I had immediately found the reason for the high object churn.
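I didn't go further and tune the heap for this test, but the obvious knob to experiment with would be the size of the young generation, so that this short-lived churn can be collected cheaply. In a stock install that is set in conf/cassandra-env.sh; the values below are purely illustrative and would need to be sized for the actual machine:

    # conf/cassandra-env.sh - example values only, size them for your own hardware
    MAX_HEAP_SIZE="2G"     # total heap of the Cassandra JVM
    HEAP_NEWSIZE="400M"    # young generation; a larger new gen absorbs allocation bursts
                           # at the cost of longer individual minor collections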

Conclusion

NoSQL or Big Data solutions are very different from your usual RDBMS, but they are still bound by the usual constraints: CPU, I/O and, most importantly, how they are used! Although Cassandra is lightning fast and mostly I/O bound, it's still Java, so you have the usual problems: GC, for example, needs to be watched. Cassandra provides a lot of monitoring metrics that I didn't explain here, but seeing the flow end-to-end really helps to understand whether the time is spent in the client, on the network or on the server, and makes the runtime dynamics of Cassandra much clearer.
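(Most of those built-in metrics are exposed via JMX and can also be pulled from the command line with the nodetool utility that ships with Cassandra, for example:)

    # nodetool lives in Cassandra's bin/ directory; the host is an example
    bin/nodetool -h localhost tpstats   # thread pool / stage statistics (active, pending, blocked)
    bin/nodetool -h localhost cfstats   # per-column-family counts and latencies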

Understanding is really the key to using NoSQL solutions effectively, as we shall see in my next blogs. New problem patterns emerge, and they cannot be solved by simply adding an index here or there. It really requires you to understand the usage pattern from the application’s point of view. The good news is that these new solutions allow us a really deep look into their inner workings, at least if you have the right tools at hand.

Comments

  1. Can you do the same with MongoDB?

    Thanks
    Robert Baminger

    P.S.: Greetings to Andi Grabner (my old Fabasoft/Segue friend)!!!

  2. Hi Robert,

MongoDB is written in C++, so we cannot go as deep as easily as with Cassandra. We can of course see the driver side of things, similar to a JDBC driver. We do have C++ support via the ADK, but that requires very small code changes. With MongoDB being open source, you could easily build that in behind a compile switch.

    Mike

  3. James Earle says:

    What did you use to determine the 300MB accumulated on the network interface?

  4. We take in host metrics, in this case the network throughput. This told me the network load for the period.

  5. What tool did you use to obtain the above stats? Thanks!

  6. I used dynaTrace to obtain all the stats and all the screenshots are directly out of that tool: http://www.dynatrace.com

  7. Really interesting article. One thing bothered me a bit, however: your comparison of average to maximum response times might be flawed by some per-batch anomaly. Although it takes a bit more time, it might be better to look at, for example, one standard deviation from the mean, and throw away the tails.

    Looking forward to reading more of your posts — thanks!

  8. I agree, it was just faster this way. I would normally use percentiles, something like overlaying the 50th, 75th and 90th percentiles, which in my opinion is even better than average and standard deviation.

  9. I’m interested in learning about Cassandra in relation to a variety of cloud computing platforms and its ability to process high volumes of data.

    While the specs presented in this article certainly portray Cassandra as a sleek, nimble performer, it seems there are certain tweaks that dictate this higher functionality. As the article states, a deeper understanding of the end-to-end processing and usage patterns is really the best way to optimize performance.

  10. Hi Henry,

    Cassandra is ideally suited to a cloud computing platform as it is all peer-to-peer (no master, no single point of failure), highly distributed and of course supports failover. The key to processing high volumes is a thorough understanding of how it structures data and how writing works, and of course being able to shard data. You might want to look at http://www.datastax.com/docs/1.0/index, these guys have the most information.

    Specific to the cloud, I would say the most important things are disk and network I/O. For example, I would refrain from using EBS in EC2 and rather use local storage with replicas (which you want to have anyway).

    In addition, you have to structure your client code accordingly to optimize writing and reading, i.e. minimize network round trips and use bulk writes/reads. I found that in EC2 (not sure about Rackspace) network latency is often a bigger contributor than the actual Cassandra server call.

    I would be very interested in your use case, if you are willing to share send me an email (michael.kopp@dynatrace.com)
