[Bloat] benefits of ack filtering

Wed Nov 29 11:50:10 EST 2017

Hi Mikael,

> On Nov 29, 2017, at 13:49, Mikael Abrahamsson <swmike at swm.pp.se> wrote:
> 
> On Wed, 29 Nov 2017, Sebastian Moeller wrote:
> 
>> Well, ACK filtering/thinning is a simple trade-off: redundancy versus bandwidth. Since the RFCs say a receiver should acknoledge every second full MSS I think the decision whether to filter or not should be kept to
> 
> Why does it say to do this?

According to RFC 2525:
"2.13.

   Name of Problem
      Stretch ACK violation

Paxson, et. al.              Informational                     [Page 40]

RFC 2525              TCP Implementation Problems             March 1999

   Classification
      Congestion Control/Performance

   Description
      To improve efficiency (both computer and network) a data receiver
      may refrain from sending an ACK for each incoming segment,
      according to [
RFC1122
].  However, an ACK should not be delayed an
      inordinate amount of time.  Specifically, ACKs SHOULD be sent for
      every second full-sized segment that arrives.  If a second full-
      sized segment does not arrive within a given timeout (of no more
      than 0.5 seconds), an ACK should be transmitted, according to
      [
RFC1122
].  A TCP receiver which does not generate an ACK for
      every second full-sized segment exhibits a "Stretch ACK
      Violation".

   Significance
      TCP receivers exhibiting this behavior will cause TCP senders to
      generate burstier traffic, which can degrade performance in
      congested environments.  In addition, generating fewer ACKs
      increases the amount of time needed by the slow start algorithm to
      open the congestion window to an appropriate point, which
      diminishes performance in environments with large bandwidth-delay
      products.  Finally, generating fewer ACKs may cause needless
      retransmission timeouts in lossy environments, as it increases the
      possibility that an entire window of ACKs is lost, forcing a
      retransmission timeout.

   Implications
      When not in loss recovery, every ACK received by a TCP sender
      triggers the transmission of new data segments.  The burst size is
      determined by the number of previously unacknowledged segments
      each ACK covers.  Therefore, a TCP receiver ack'ing more than 2
      segments at a time causes the sending TCP to generate a larger
      burst of traffic upon receipt of the ACK.  This large burst of
      traffic can overwhelm an intervening gateway, leading to higher
      drop rates for both the connection and other connections passing
      through the congested gateway.

      In addition, the TCP slow start algorithm increases the congestion
      window by 1 segment for each ACK received.  Therefore, increasing
      the ACK interval (thus decreasing the rate at which ACKs are
      transmitted) increases the amount of time it takes slow start to
      increase the congestion window to an appropriate operating point,
      and the connection consequently suffers from reduced performance.
      This is especially true for connections using large windows.

   Relevant RFCs

RFC 1122
 outlines delayed ACKs as a recommended mechanism.

Paxson, et. al.              Informational                     [Page 41]

RFC 2525              TCP Implementation Problems             March 1999

   Trace file demonstrating it
      Trace file taken using tcpdump at host B, the data receiver (and
      ACK originator).  The advertised window (which never changed) and
      timestamp options have been omitted for clarity, except for the
      first packet sent by A:

   12:09:24.820187 A.1174 > B.3999: . 2049:3497(1448) ack 1
       win 33580 <nop,nop,timestamp 2249877 2249914> [tos 0x8]
   12:09:24.824147 A.1174 > B.3999: . 3497:4945(1448) ack 1
   12:09:24.832034 A.1174 > B.3999: . 4945:6393(1448) ack 1
   12:09:24.832222 B.3999 > A.1174: . ack 6393
   12:09:24.934837 A.1174 > B.3999: . 6393:7841(1448) ack 1
   12:09:24.942721 A.1174 > B.3999: . 7841:9289(1448) ack 1
   12:09:24.950605 A.1174 > B.3999: . 9289:10737(1448) ack 1
   12:09:24.950797 B.3999 > A.1174: . ack 10737
   12:09:24.958488 A.1174 > B.3999: . 10737:12185(1448) ack 1
   12:09:25.052330 A.1174 > B.3999: . 12185:13633(1448) ack 1
   12:09:25.060216 A.1174 > B.3999: . 13633:15081(1448) ack 1
   12:09:25.060405 B.3999 > A.1174: . ack 15081

      This portion of the trace clearly shows that the receiver (host B)
      sends an ACK for every third full sized packet received.  Further
      investigation of this implementation found that the cause of the
      increased ACK interval was the TCP options being used.  The
      implementation sent an ACK after it was holding 2*MSS worth of
      unacknowledged data.  In the above case, the MSS is 1460 bytes so
      the receiver transmits an ACK after it is holding at least 2920
      bytes of unacknowledged data.  However, the length of the TCP
      options being used [
RFC1323
] took 12 bytes away from the data
      portion of each packet.  This produced packets containing 1448
      bytes of data.  But the additional bytes used by the options in
      the header were not taken into account when determining when to
      trigger an ACK.  Therefore, it took 3 data segments before the
      data receiver was holding enough unacknowledged data (>= 2*MSS, or
      2920 bytes in the above example) to transmit an ACK.

   Trace file demonstrating correct behavior
      Trace file taken using tcpdump at host B, the data receiver (and
      ACK originator), again with window and timestamp information
      omitted except for the first packet:

   12:06:53.627320 A.1172 > B.3999: . 1449:2897(1448) ack 1
       win 33580 <nop,nop,timestamp 2249575 2249612> [tos 0x8]
   12:06:53.634773 A.1172 > B.3999: . 2897:4345(1448) ack 1
   12:06:53.634961 B.3999 > A.1172: . ack 4345
   12:06:53.737326 A.1172 > B.3999: . 4345:5793(1448) ack 1
   12:06:53.744401 A.1172 > B.3999: . 5793:7241(1448) ack 1
   12:06:53.744592 B.3999 > A.1172: . ack 7241

Paxson, et. al.              Informational                     [Page 42]

RFC 2525              TCP Implementation Problems             March 1999

   12:06:53.752287 A.1172 > B.3999: . 7241:8689(1448) ack 1
   12:06:53.847332 A.1172 > B.3999: . 8689:10137(1448) ack 1
   12:06:53.847525 B.3999 > A.1172: . ack 10137

      This trace shows the TCP receiver (host B) ack'ing every second
      full-sized packet, according to [
RFC1122
].  This is the same
      implementation shown above, with slight modifications that allow
      the receiver to take the length of the options into account when
      deciding when to transmit an ACK."

So I guess the point is that at the rates we are discussing (the the according short periods between non-filtered ACKs the time-out issue will be moot). The Slow start issue might also be moot if the sender does more than simple ACK counting. This leaves redundancy... The fact that GRO/GSO effectively lead to ack stretching already the disadvantages might not be as bad today (for high bandwidth flows) than they were in the past...

> What benefit is there to either end system to send 35kPPS of ACKs in order to facilitate a 100 megabyte/s of TCP transfer? 

> 
> Sounds like a lot of useless interrupts and handling by the stack, apart from offloading it to the NIC to do a lot of handling of these mostly useless packets so the CPU doesn't have to do it.
> 
> Why isn't 1kPPS of ACKs sufficient for most usecases?

	This is not going to fly, as far as I can tell the ACK rate needs to be high enough so that its inverse does not exceed the period that is equivalent to the calculated RTO, so the ACK rate needs to scale with the RTT of a connection.

But I do not claim to be an expert here, I just had a look at some RFCs that might or might not be outdated already...

Best Regards
	Sebastian

> 
> -- 
> Mikael Abrahamsson    email: swmike at swm.pp.se