About the Author: Gary Kaiser

Gary is a Subject Matter Expert in Network Performance Analysis at Compuware APM. He has global field enablement responsibilities for performance monitoring and analysis solutions embracing emerging and strategic technologies, including WAN optimization, thin client infrastructures, network forensics, and a unique performance management maturity methodology. He is also a co-inventor of multiple analysis features, and continues to champion the value of software-enabled expert network analysis.

Understanding Application Performance on the Network – Part II: Bandwidth and Congestion

When we think of application performance problems that are network-related, we often immediately think of bandwidth and congestion as likely culprits; faster speeds and less traffic will solve everything, right? This is reminiscent of recent ISP wars; which is better, DSL or cable modems? Cable modem proponents touted the higher bandwidth while DSL proponents warned of the dangers of sharing the network with your potentially bandwidth-hogging neighbors. In this blog entry, we’ll examine these two closely-related constraints, beginning the series of performance analyses using the framework we introduced in Part I. I’ll use graphics from Compuware’s application-centric protocol analyzer – Transaction Trace – as illustrations.

Bandwidth

We define bandwidth delay as the serialization delay encountered as bits are clocked out onto the network medium. Most important for performance analysis is what we refer to as the “bottleneck bandwidth” – the speed of the link at its slowest point – as this will be the primary influence on the packet arrival rate at the destination. Each packet incurs the serialization delay dictated by the link speed; for example, at 4Mbps, a 1500-byte packet takes approximately 3 milliseconds to be serialized. Extending this bandwidth calculation to an entire operation is relatively straightforward: we observe (on the wire) the number of bytes sent or received, multiply by 8 bits, and divide by the bottleneck link speed, remembering that asymmetric links may have different upstream and downstream speeds.

Bandwidth effect = [# bytes sent or received] x [8 bits] / [bottleneck link speed]

For example, we can calculate the bandwidth effect for an operation that sends 100KB and receives 1024KB on a 2048Kbps link:

  • Upstream effect: [100,000 x 8] / 2,048,000 ≈ 391 milliseconds
  • Downstream effect: [1,024,000 x 8] / 2,048,000 = 4000 milliseconds

For better precision, you should account for frame header size differences between the packet capture medium – Ethernet, likely – and the WAN link; this difference might be as much as 8 or 10 bytes per packet.
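The calculation above can be sketched in a few lines of Python; the function name is my own, and the values are the example figures from the text (100KB sent and 1024KB received on a 2048Kbps link):

```python
def bandwidth_effect_ms(payload_bytes, link_bps):
    """Serialization delay for a one-way flow, in milliseconds."""
    return payload_bytes * 8 / link_bps * 1000

# Example from the text: 100 KB sent, 1024 KB received, 2048 Kbps link
upstream = bandwidth_effect_ms(100_000, 2_048_000)      # 390.625 ms
downstream = bandwidth_effect_ms(1_024_000, 2_048_000)  # 4000.0 ms
print(f"Upstream effect:   {upstream:.0f} ms")
print(f"Downstream effect: {downstream:.0f} ms")
```

For the per-packet precision correction mentioned above, you would subtract the Ethernet framing overhead and add the WAN framing overhead to `payload_bytes` on a per-packet basis.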

Bandwidth constraints impact only the data transfer periods within an operation – the request and reply flows. Each flow also incurs (at a minimum) additional delay due to network latency, as the first bit traverses the network from sender to receiver; TCP flow control or other factors may introduce further delays. (As an operation’s chattiness increases, its sensitivity to network latency increases and the overall impact of bandwidth tends to decrease, becoming overshadowed by latency.)

Transaction Trace Illustration: Bandwidth

One way to frame the question is “does the operation use all of the available bandwidth?” The simplest way to visualize this is to graph throughput in each direction, comparing uni-directional throughput with the link’s measured bandwidth. If the answer is yes, then the operation bottleneck is bandwidth; if the answer is no, then there is some other constraint limiting performance. (This doesn’t mean that bandwidth isn’t a significant, or even the dominant, constraint; it simply means that there are other factors that prevent the operation from reaching the bandwidth limitation. The formula we used to calculate the impact of bandwidth still applies as a definition of the contribution of bandwidth to the overall operation time.)

Measured Bandwidth

This FTP transfer is frequently limited by the 10Mbps available bandwidth.

Networks are generally shared resources; when there are multiple connections on a link, TCP flow control will prevent a single flow from using all of the available bandwidth as it detects and adjusts for congestion. We will evaluate the impact of congestion next, but fundamentally, the diagnosis is the same; bandwidth constrains throughput.

Congestion

Congestion occurs when data arrives at a network interface at a rate faster than the media can service; when this occurs, packets must be placed in an output queue, waiting until earlier packets have been serviced. These queue delays add to the end-to-end network delay, with a potentially significant effect on both chatty and non-chatty operations. (Chatty operations will be impacted due to the increase in round-trip delay, while non-chatty operations may be impacted by TCP flow control and congestion avoidance algorithms.)

For a given flow, congestion initially reduces the rate of TCP slow-start’s ramp by slowing increases to the sender’s Congestion Window (CWND); it also adds to the delay component of the Bandwidth Delay Product (BDP), increasing the likelihood of exhausting the receiver’s TCP window. (We’ll discuss TCP slow-start as well as the BDP later in this series.)

As congestion becomes more severe, the queue in one of the path’s routers may become full. As packets arrive exceeding the queue’s storage capacity, some packets must be discarded. Routers employ various algorithms to determine which packets should be dropped, perhaps attempting to distribute congestion’s impact among multiple connections, or to more significantly impact lower-priority traffic. When TCP detects these dropped packets (by a triple-duplicate ACK, for example), congestion is the assumed cause. As we will discuss in more depth in an upcoming blog entry, packet loss causes the sending TCP to reduce its Congestion Window by 50%, after which slow-start begins to ramp up again in a relatively conservative congestion avoidance phase.
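The loss reaction described above can be sketched as a toy model of the congestion window – a simplification of Reno-style behavior, tracking cwnd in whole segments per RTT; the initial values and the RTT at which loss occurs are illustrative:

```python
def simulate_cwnd(loss_rtts, total_rtts, initial=1, ssthresh=64):
    """Toy Reno-style cwnd trace (in segments), one value per RTT:
    exponential growth below ssthresh, +1 segment/RTT above it,
    and a 50% reduction when a loss is detected."""
    cwnd, trace = initial, []
    for rtt in range(total_rtts):
        trace.append(cwnd)
        if rtt in loss_rtts:           # triple-duplicate ACK detected
            ssthresh = max(cwnd // 2, 2)
            cwnd = ssthresh            # resume at half the window
        elif cwnd < ssthresh:
            cwnd *= 2                  # slow-start: roughly doubles per RTT
        else:
            cwnd += 1                  # congestion avoidance: +1 per RTT
    return trace

print(simulate_cwnd(loss_rtts={6}, total_rtts=12))
```

The trace shows the exponential ramp, the halving at the loss event, and the conservative linear growth of the congestion avoidance phase that follows.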

Transaction Trace Illustration: Congestion

We know that a network path has some minimum amount of delay, in theory based purely on distance and route processing; we define that as path latency. Any delay above this amount can be attributed to congestion. (While we generally consider congestion to be related to link utilization’s impact on router queues, it can also be introduced by processing delays; for example, a busy firewall may experience a delay in examining a packet, adding to end-to-end delay and to our definition – and measurement – of congestion.)

The most accurate method of measuring congestion from a packet trace is to capture at both client and server locations, then merge the two trace files together using Transaction Trace’s remote merge function. This approach assures accurate send and receive timestamps for every packet. We can then analyze transit time over the course of the operation; transit times above the minimum observed value (greater than the path latency) are presumed to be caused by congestion. We generally make the assumption that the minimum observed transit time in a merged task is equal to the “idle” path delay – in other words, at least one packet in the trace succeeded in traversing the network without encountering any significant congestion. This is a reasonable assumption, as the goal is not to calculate precisely the impact of congestion, but rather to prove that congestion is an important contributing bottleneck by estimating its effect.
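The transit-time analysis just described can be sketched as follows; the (send, receive) timestamp pairs are hypothetical, standing in for the per-packet times that merging client-side and server-side traces would produce:

```python
# (send_ts, recv_ts) pairs in seconds, one per packet, from a merged trace
packets = [(0.000, 0.103), (0.020, 0.125), (0.040, 0.190), (0.060, 0.168)]

transits = [recv - send for send, recv in packets]
path_latency = min(transits)   # assume the fastest packet saw no queuing
congestion = [t - path_latency for t in transits]

print(f"Path latency:      {path_latency * 1000:.0f} ms")
print(f"Avg queuing delay: {sum(congestion) / len(congestion) * 1000:.1f} ms")
```

Note the assumption embedded in `min(transits)`: at least one packet crossed the path essentially uncongested, so the minimum observed transit time serves as the idle path delay.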

To illustrate congestion, use the Time Plot view to graph packet transit times, comparing the delta between minimum, average and maximum delays. You may find very short bursts of congestion affecting only a small handful of packets, or perhaps more consistent congestion that affects most of the packets for a flow or operation. The two Time Plot graphs below illustrate these conditions.

Time Plot view showing minimal congestion by graphing packet transit times; average transit time is 103 milliseconds, only three milliseconds above the minimum path latency.


Time Plot View showing severe congestion; path latency is five milliseconds, average transit time is 141 milliseconds, and peak transit time exceeds 1000 milliseconds.


Corrective Actions: Bandwidth and Congestion Constraints

Addressing a pure bandwidth constraint is straightforward; the physical (i.e., infrastructure) solution is to increase bandwidth, while the logical (i.e., application) solution is to decrease the amount of data transferred. Data compression is a method for the latter that has been around for decades, and more recent WAN optimization approaches offer further options for data reduction. Caching, interface simplification, and thin client solutions may also provide relief.

Similarly, addressing congestion can be as simple as increasing bandwidth. Alternatively, you may take a more studied approach, identifying and classifying the traffic that contends for bandwidth. QoS policies may be used to mitigate the impact of congestion on time-sensitive applications, effectively allocating more bandwidth to important traffic by limiting the rate of less-critical traffic. And of course you may find cases of incorrectly routed (or forbidden) traffic in unexpected places.

How do you monitor, report and manage congestion in your network?

In an upcoming post in this eight-part series, we’ll look at the impact of packet loss, which is of course quite closely related to bandwidth and congestion constraints. But next, in Part III, we’ll discuss TCP slow-start, introduce the Congestion Window, and illustrate how these are used to control the sender’s transmission rate. Stay tuned and feel free to comment below.

Comments

  1. Gary,

    I really look forward to these articles, these make me think.

    I think you have made a good case for BW but you have understated the importance or effect of latency. With the way that TCP works, ACKing every second packet, as you increase the latency you drop the data throughput. There is a calculation you can do, assuming you have unlimited BW:

    TCP send buffer size / latency will give you actual throughput.

    Let’s assume we have a server with a 16K byte send buffer (BTW, this was the default send buffer for an AIX server until the P7 series), with the server within the US, say 30ms away.

    16384 x 8 / 0.03 ≈ 4.37Mbps

    If we move that server to India with a network loop delay (latency) of 300ms the effect is,

    16384 x 8 / 0.3 ≈ 437Kbps

    This is with no congestion or BW limitation.

    I will wait for the discussion on TCP slow start as I do not understand how congestion is directly affecting the sender TCP window. The sending station can only send what is being advertised by the receiving station from memory, and yes I can see that congestion can affect slow start but not as you explained.

    I would also say that not all TCP stacks will reduce their receive window to 50%, the early implementation of TCP reduced it to zero, or 1 x MSS, then started a normal slow start. If there was more congestion and dropped packets then it would again drop to zero and start a very conservative slow start, only increasing by a single packet per round trip.

    I know it is a big topic but I think you glossed over compression and how easy it is to implement. All web servers based on Apache have 9 levels of Gzip compression. In testing we found that L3 gave around a 65% reduction in traffic with no measurable CPU load. A lot of web servers then implemented L9, achieving close to 85% traffic reduction for about a 6% load. More than worth it. The thing I keep pointing to the customers is that compression is implemented before SSL so all traffic is compressed as it leaves the web server. No real need for wan X box.

    Oh, I think you should point out that taking traces at both client and server has to happen at the same time so that you can compare. I have dealt with support people that ran the traces one after the other.

    Thanks

    Chris

  2. Gary Kaiser says:

    Hi Chris,
    No apology necessary; I look forward to your comments, and try to cover some of the finer details regarding the congestion window, the receive window, and TCP slow-start as the series continues.
    I am limited somewhat by acceptable blog entry size, so can’t cover many of the permutations of TCP stack behavior; instead, I try to illustrate the general (maybe most common) behaviors from which these permutations are still recognizable.

  3. Hello Gary, while troubleshooting an issue related to a single user complaint, how do we mirror this approach? Let us say I have a wireshark trace of a user in a branch office with a 10Mbps connection. Now this 10Mbps is being used by many other users in the branch, so what should be the bandwidth used for analysis? Wouldn’t it be incorrect to take 10Mbps as the internet connection speed while comparing throughput?

  4. Gary Kaiser says:

    Hi Sharat,
    Thanks for your question. The approach I use in the triage framework would use 10Mbps as the limiting bandwidth in your example; even though other users share the link, each packet in a single user’s session will be clocked at this rate. But you’re correct in that throughput will be limited by not just this clocking rate, but also potentially by other factors – such as congestion (from other users’ traffic), latency, TCP window size, slow-start, etc. I like to quantify the impact of the clocking rate first, then start looking for other limiting factors.
    I don’t mean to imply that this framework is the only “correct” approach to analyzing a user’s application performance; it is, however, intended to be a repeatable, methodical and supportable process.

  5. Thanks Gary. I agree and am receptive to your approach. It can save lot of time analyzing packets.
