Cassandra Write Performance – A quick look inside
I was looking at Cassandra, one of the major NoSQL solutions, and I was immediately impressed with its write speed even on my notebook. But I also noticed that it was very volatile in its response time, so I took a deeper look at it.
First Cassandra Write Test
I did the first write tests on my local machine, but I had a goal in mind. I wanted to see how fast I could insert 150K data points each consisting of 3 values. In Cassandra terms this meant I added 150K of rows in a single Column Family and adding three columns each time. Don’t be confused with the term column here, it really means a key/value pair. At first I tried to load the 150K in one single mutator call. It worked just fine, but I had huge GC suspensions. So I switched to sending 10K buckets. That got nice enough performance. Here is the resulting response time chart:
The upper chart shows client and server response time respectively. This indicates that we leave a considerable time either on the wire or in the client. The lower chart compares average and maximum response time on the Cassandra server, clearly showing a high volatility. So I let dynaTrace do its magic and looked at the Transaction Flow to check the difference between client and server response time.
Getting a look at the insides of Cassandra
This is what I got 5 minutes after I first deployed the dynaTrace agent. It shows that we do indeed leave a large portion of the time on the wire, either due to network or waiting for Cassandra. But the majority is still on the Server. A quick check of the response time hotspots reveals even more:
The Hotspots show that most of the time on the Cassandra Server is spent in CPU and I/O, as it should be, but a considerable portion is also attributed to GC Suspension. Please note that this is not Garbage Collection Time spent, but the time that my transactions were actively suspended by the Garbage Collector (read about the difference here)! What is also interesting is that a not so insignificant portion is spent inside Thrift, the communication protocol of Cassandra, which confirmed the communication as part of the issue. Another thing that is interesting is that the majority of the transactions is in the 75ms range (as can be seen in the upper right corner), but a lot of transactions are slower and some go all the way up to 1.5 seconds.
I looked at the slowest 5% and could see that GC suspension plays a much bigger role here and that the time we spend waiting on I/O is also greatly increased. So the next thing I checked was Garbage Collection, always one of my favorites.
What we see here is a phenomenon that I have blogged about before. The GC suspensions are mostly due to the so called “minor collection”. Major collections do happen, but are only responsible for two of the suspensions. If I had only monitored Major GCs I would not have seen the impact on my performance. What it means is that Cassandra is allocating a lot of objects and my memory setup couldn’t keep up with it, not very surprising with 150K of data every 10 seconds.
Finally I took a look at the single Transactions themselves:
What we see here is that the PurePath follows the batch_mutate call from the client to the server. This allows us to see that it spends a lot of time between the two layers (The two synchronization nodes indicate start and end of the network call). More importantly we see that we only spend about 30ms CPU in the client side batch_mutate function and according to the elapsed time this all happened during sending. That means that either my network was clogged or the Cassandra Server couldn’t accept my request quick enough. We also see that the majority of the time on the server is spent waiting on the write. That did not surprise me as my disk is not the fastest.
A quick check on the network interface showed me that my test (10x150K rows) accumulated to 300MB of data, being quick in math this told me that a single batch_mutate call sent roughly 2MB of data over the wire, so we can safely assume that the latency is due to network. It also means that we need to monitor network and Cassandras usage of it closely.
Checking the Memory
I didn’t find a comprehensive GC tuning guide for Cassandra and didn’t want to invest a lot of time, so I took a quick peek to get an idea about the main drivers for the obvious high object churn:
What I saw was pretty conclusive. The Mutation creates a ColumnFamily and a Column Object for each single column value that I insert. More importantly the ColumnFamily holds a ConcurrentSkipListMap which keeps track of the modified columns. That produced nearly as many allocations as any other primitive, something I have rarely seen. So I immediately found the reasons for the high object churn.
NoSQL or BigData Solutions are very very different from your usual RDBMS, but they are still bound by the usual constraints: CPU, I/O and most importantly how it is used! Although Cassandra is lighting fast and mostly I/O bound it’s still Java and you have the usual problems – e.g. GC needs to be watched. Cassandra provides a lot of monitoring metrics that I didn’t explain here, but seeing the flow end-to-end really helps to understand whether the time is spent on the client, network or server and makes the runtime dynamics of Cassandra much clearer.
Understanding is really the key for effective usage of NoSQL solutions as we shall see in my next blogs. New problem patterns emerge and they cannot be solved by simply adding an index here or there. It really requires you to understand the usage pattern from the application point of view. The good news is that these new solutions allow us a really deep look into their inner workings, at least if you have the right tools at hand.