Gary Kaiser About the Author

Gary is a Subject Matter Expert in Network Performance Analysis at Compuware APM. He has global field enablement responsibilities for performance monitoring and analysis solutions embracing emerging and strategic technologies, including WAN optimization, thin client infrastructures, network forensics, and a unique performance management maturity methodology. He is also a co-inventor of multiple analysis features, and continues to champion the value of software-enabled expert network analysis.

Understanding Application Performance on the Network – Part III: TCP Slow-Start

In Part II, we discussed performance constraints caused by both bandwidth and congestion. Purposely omitted was a discussion about packet loss – which is often an inevitable result of heavy network congestion. I’ll use this blog entry on TCP slow-start to introduce the Congestion Window (CWD), which is fundamental for Part IV’s in-depth review of Packet Loss.

TCP Slow-Start

TCP uses a slow-start algorithm as it tries to understand the characteristics (bandwidth, latency, congestion) of the path supporting a new TCP connection. In most cases, TCP has no inherent understanding of the characteristics of the network path; it could be a switched connection on a high-speed LAN to a server in the next room, or it could be a low-bandwidth, already congested connection to a server halfway around the globe. In an effort to be a good network citizen, TCP uses a slow-start algorithm based on an internally-maintained congestion window (CWD) which identifies how many packets may be transmitted without being acknowledged; as the data carried in transmitted packets is acknowledged, the window increases. The CWD typically begins at two packets, allowing an initial transmission of two packets and then ramping up quickly as acknowledgements are received.
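The ramp-up described above can be sketched in a few lines of Python. This is an idealized model, not how any particular TCP stack is implemented: it assumes no packet loss, no receive-window limit, and that the window grows by one packet for every ACK received. The function name and the `packets_per_ack` parameter are made up for this illustration.

```python
def slow_start_growth(initial_cwd=2, round_trips=5, packets_per_ack=1):
    """Return the CWD (in packets) at the start of each round trip.

    Assumptions (illustrative only): no loss, no receive-window limit,
    and the window grows by one packet for every ACK received.
    """
    windows = [initial_cwd]
    cwd = initial_cwd
    for _ in range(round_trips):
        acks = max(1, cwd // packets_per_ack)  # ACKs returned for this flight
        cwd += acks                            # one extra packet per ACK
        windows.append(cwd)
    return windows


print(slow_start_growth())                   # receiver ACKs every packet
print(slow_start_growth(packets_per_ack=2))  # receiver ACKs every second packet
```

With an ACK for every packet the window doubles each round trip; when the receiver acknowledges every second packet, the same model grows by roughly half per round trip.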

At the beginning of a new TCP connection, the CWD starts at 2 packets and increases as acknowledgements are received.

The CWD will continue to increase until one of three conditions is met:

Condition                             | Determined by              | Blog discussion
Receiver’s TCP Window limit           | Receiver’s TCP Window size | Part VII
Congestion detected (via packet loss) | Triple Duplicate ACK       | Part IV
Maximum write block size              | Application configuration  | Part VIII

Generally, TCP slow-start will not be a primary or significant bottleneck. Slow-start occurs once per TCP connection, so for many operations there may be no impact. However, we will address the theoretical case of a TCP slow-start bottleneck, some influencing factors, and then present a real-world case.

The Maximum Segment Size and the CWD

The Maximum Segment Size (MSS) identifies the maximum TCP payload that can be carried by a packet; this value is set as a TCP option as a new connection is established. Probably the most common MSS value is 1460, but smaller sizes may be used to allow for VPN headers or to support different link protocols. Beyond the additional protocol overhead introduced by a reduced MSS, there is also an impact on the CWD, since the algorithm uses packets as its flow control metric.

We can consider the CWD’s exchanges of data packets and subsequent ACKs as TCP turns, or TCP round trips; each exchange incurs the round-trip path delay. Therefore, one of the primary factors influencing the impact of TCP slow-start is network latency. A smaller MSS value will result in a larger number of packets – and additional TCP turns – as the sending node increases the CWD to reach its upper limit. It is possible that, with a small MSS (536 bytes) and a high path delay (200 msec), slow-start might introduce 3 seconds of delay to an operation as the CWD increases to a receive window limit of 65KB.
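To get a rough feel for how MSS and latency combine, here is a small Python sketch that counts the TCP turns needed for the CWD to reach the receive window. It is a simplified model under stated assumptions: the "65KB" limit is taken as 65,535 bytes, the window grows by one packet per ACK, and there is no loss. The function and parameter names are hypothetical. Because it ignores delayed-ACK timers, connection setup, and stack-specific behavior, it tends to understate real delays and is not meant to reproduce the figure quoted above.

```python
def turns_to_fill_window(receive_window_bytes, mss_bytes,
                         initial_cwd=2, packets_per_ack=1):
    """Count TCP turns until the CWD (in packets) covers the receive window."""
    target = -(-receive_window_bytes // mss_bytes)  # ceiling division: window in packets
    cwd, turns = initial_cwd, 0
    while cwd < target:
        cwd += max(1, cwd // packets_per_ack)  # one extra packet per ACK received
        turns += 1
    return turns


RTT_MS = 200            # path delay from the example in the text
RECEIVE_WINDOW = 65535  # assumed byte value for the "65KB" limit

for mss in (536, 1460):
    for packets_per_ack in (1, 2):
        turns = turns_to_fill_window(RECEIVE_WINDOW, mss,
                                     packets_per_ack=packets_per_ack)
        print(f"MSS {mss}, ACK every {packets_per_ack} packet(s): "
              f"{turns} turns, about {turns * RTT_MS} ms")
```

In this model the smaller MSS always needs more turns than the larger one, and the gap widens when the receiver acknowledges every second packet.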

How Important is TCP Slow-Start?

While significant, even a 3-second delay is probably not interesting for large file transfers, or for applications that reuse TCP connections. But let’s consider a simple web page with 20 page elements, averaging about 120KB in size. A misconfigured proxy server prevents persistent TCP connections, so we’ll need 20 new TCP connections to load the page. Each connection must ramp up through slow-start as content is downloaded. With a small MSS and/or high latency, each page component will experience a significant slow-start delay.
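The per-connection cost can be estimated with a short Python sketch that counts the turns needed to deliver one object over a brand-new connection. Several things here are assumptions for illustration: the 120KB figure is read as the size of each element (120,000 bytes), the window doubles every turn, the receive window is 65,535 bytes, and there is no loss. The function name and parameters are hypothetical.

```python
def turns_to_deliver(object_bytes, mss_bytes, initial_cwd=2,
                     receive_window_bytes=65535):
    """Count TCP turns needed to send one object on a fresh connection."""
    packets_left = -(-object_bytes // mss_bytes)         # ceiling division
    max_cwd = max(1, receive_window_bytes // mss_bytes)  # receiver's limit, in packets
    cwd = min(initial_cwd, max_cwd)
    turns = 0
    while packets_left > 0:
        packets_left -= cwd          # one flight of packets per turn
        cwd = min(cwd * 2, max_cwd)  # window doubles until the receiver's limit
        turns += 1
    return turns


ELEMENT_BYTES = 120_000  # assumed size of one page element
RTT_MS = 200             # example path delay

for mss in (536, 1460):
    turns = turns_to_deliver(ELEMENT_BYTES, mss)
    print(f"MSS {mss}: {turns} turns, about {turns * RTT_MS} ms per connection")
```

Without persistent connections, each of the 20 downloads pays this ramp-up again; how much of it the user actually sees depends on how many connections the browser opens in parallel.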

Transaction Trace Manifestation

The Time Plot can be used to help quantify delays associated with TCP slow-start; graph TCP Frames in Transit. (Since the CWD uses frames as its metric, this is essentially a CWD graph.)

Time Plot view showing TCP frames in transit, illustrating slow-start as the Congestion Window increases from 2 to 98.

You may also visualize TCP slow-start’s TCP turns using the Bounce Diagram.

The Bounce Diagram can be used to illustrate TCP turns during slow-start.

Corrective Actions

If you find that TCP slow-start’s impact on performance is significant, there are a few approaches to mitigating the impact. These include using persistent TCP connections (avoiding frequent slow-starts) and ensuring the largest MSS possible is used (reducing the TCP turns as the congestion window increases). Some appliances – such as load balancers and application delivery controllers – permit configuring the initial CWD value to a larger value, in turn eliminating some TCP turns; this could provide noticeable benefit for high-latency links presuming adequate bandwidth.
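As a rough illustration of why a larger initial CWD removes turns, the Python sketch below counts how many doublings are needed to reach a 45-packet window (about 65,535 bytes at an MSS of 1460) from different starting values. The starting values of 2, 4, and 10 and the function name are chosen for illustration; the model assumes adequate bandwidth and no packet loss, which may not hold for large initial bursts.

```python
def turns_to_reach(target_packets, initial_cwd):
    """Count doublings needed for the CWD to reach target_packets."""
    cwd, turns = initial_cwd, 0
    while cwd < target_packets:
        cwd *= 2    # idealized slow-start: window doubles each turn
        turns += 1
    return turns


TARGET_PACKETS = 45  # roughly a 65,535-byte window at MSS 1460

for initial_cwd in (2, 4, 10):
    print(f"initial CWD {initial_cwd}: "
          f"{turns_to_reach(TARGET_PACKETS, initial_cwd)} turns")
```

Each turn saved is one round trip of path delay saved per new connection, so the benefit scales with latency.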

Do your browser-based applications reuse TCP connections efficiently? Have you considered mitigating the impact of TCP slow-start by reconfiguring the CWD?

In Part IV, we’ll discuss the performance impact of packet loss, continuing with Congestion Window concepts and completing the bandwidth and congestion discussion we started in Part II. Stay tuned and feel free to comment below.

 

Comments

  1. David Lopes says:

    Really enjoying this series on NPM; really great insight on how to analyze network issues, a theme that has never been my strongest ability! Keep em coming!

  2. Gary,

    I think the only point I would question is the slow start point. Within my typical environment I see slow start at 1 x mss, not 2. There is a specific setting within AIX that you can set the slow start to 4 x mss, but this is typically not used.
    I too have seen a proxy with an incorrect CWD on the client side, but for me it was significant as the users had a network loop delay of over 300ms. The CWD only increased by 1 packet per round trip (1, 2, 3, 4, 5, 6), so the slow start was very linear and it never achieved a transfer rate to fill the receive window size of the client before the object was transferred. If the window size was 64K then it took over 19 seconds to fill the window.
    When we got it fixed the cwd incremented by doubling every round trip, 1,2,4,8,16,32,etc. So it only took just under 3 seconds to fill the 64K receive window.
    When you discuss TCP window and send/receive buffer size are you going to touch on micro bursts – where the amount of data in flight can overflow the buffers in the routers? With Windows 7 and later with auto scaling I think this is a real issue.

  3. Gary Kaiser says:

    Hi Chris,
    It’s been many years since I’ve seen (or heard of) an initial CWD of 1 MSS. In most environments, this would seem to me to be problematic due to the common ACK behavior of acknowledging every second packet; a single packet would only be acknowledged after the delayed ACK timer expires, adding perhaps 200 milliseconds at the start of every new TCP connection. Of course the ACK frequency can be changed to 1 (ACK every packet), but because of the default (ACK every 2 packets), the de-facto standard has been to use an initial CWD value of 2. (If you have a trace, I’d be interested in looking at it.)
    Micro-bursts can be especially problematic when aggressive slow-start algorithms are configured, which may be the case with application delivery controllers (ADCs). I’ve seen an initial CWD value of 8 or 10 packets result in very poor performance; packet loss resulting from these initial bursts put most new TCP connections into congestion avoidance mode right from the start.
    Thanks for your comments.

    • Gary,

      I agree that logic dictates that if the server sends only one packet it should generate a delayed ACK, so I need to check, but I cannot remember ever seeing a delayed ACK in slow start. I am currently away in the country so it will be a week or so until I get back. I am not certain I can send you traces unless I can find one that is not from a customer or employer. I was playing with send/receive buffer sizes to see what impact network loop delay had and I may have some trace data from that. Worst case I will send you some charts.

      In your “frames in transit” diagram it took about 2.5 seconds to get to a fully open window. Was that due to the network loop delay of approx 200ms?

      • Gary Kaiser says:

        Hi Chris,
        Yes; the 2.5 seconds was due to a combination of network delay (200 ms) and a small MSS (536B).
        Enjoy your holiday!

  4. Gary,

    I have been over some old traces that I have and I can find both 1 x MSS or 2 x MSS at the start. My issue is that I can not send you the traces as they are of a private network.

    I did look up the RFC and this is what it says –

    “The sender starts by transmitting one segment and waiting for its
    ACK. When that ACK is received, the congestion window is incremented
    from one to two, and two segments can be sent. When each of those
    two segments is acknowledged, the congestion window is increased to
    four.”

    I suspect that TCP uses a different algorithm for slow start versus normal operation or congestion avoidance.

    I do have some charts that I would like to send but I can not put them in here for some reason. Can you send me a note on my mail address c.ctaylor@bigpond.com and I will reply with the charts.

    Thanks

    Chris

    ps. Just on the last point: are you saying that slow start was affected by the small MSS, in that it took longer to reach the receive window due to a higher number of round trips? If the MSS was 1460 bytes, would it have taken fewer turns?

  5. Gary Kaiser says:

    Hi Chris,
    I recall reading about a long-ago battle between SUN and Microsoft specific to TCP slow-start, where SUN considered Microsoft’s ACK frequency of 2 as a bug (because it introduced the delayed ACK timer at the beginning of every TCP session). I couldn’t find the original article, but did find a clear-enough reference to the issue at http://www.sean.de/Solaris/soltune.html (specifically the entry for tcp_slow_start_initial).
    And yes, a small MSS increases the round-trips required to reach the receiver’s TCP window, since the slow-start algorithm counts packets/segments, not bytes.
    I’ve sent you an email; thanks.
