From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <bcronce@gmail.com>
Received: from mail-lf0-x229.google.com (mail-lf0-x229.google.com
 [IPv6:2a00:1450:4010:c07::229])
 (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
 (No client certificate requested)
 by lists.bufferbloat.net (Postfix) with ESMTPS id A8A453B2A4
 for <bloat@lists.bufferbloat.net>; Tue, 12 Dec 2017 14:27:58 -0500 (EST)
Received: by mail-lf0-x229.google.com with SMTP id f20so24488856lfe.3
 for <bloat@lists.bufferbloat.net>; Tue, 12 Dec 2017 11:27:58 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025;
 h=mime-version:in-reply-to:references:from:date:message-id:subject:to
 :cc; bh=64nDw0p5AW8bwOkpF4qp4mI6+Yedz90kcli8g+ONp10=;
 b=lZJq9GBcJNfoKR23CrNHeGsMQCe77IIx34OJi8ZkrQYQWcG34kE6BpsTUbPlmiYe8S
 srNprz/3FiaB9OMjoKqcP7V0KqlIp9qjio+AUEkI63GuEgqRH3QXkh4lbzBinja0AZm5
 9TaSzM6N6OnfObPE2MRc7fVK0HGaHi7hBQ8jneoXAR3bIjS4Nrd7sVLHOTCpYJJ4N2v9
 RJgIb8l4+n7373aCosB5eHnIL5UyTP6aAQ/aPDRSwbcAeDhNuAdS+UtdA5zYxZAw17t2
 rU6lDliQIiMr0ERf7WVC/gMdr6MPIgJmbYdPIwf/pCj2WA2Hs5g85GGul7p5EEJkmvt3
 6Wog==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20161025;
 h=x-gm-message-state:mime-version:in-reply-to:references:from:date
 :message-id:subject:to:cc;
 bh=64nDw0p5AW8bwOkpF4qp4mI6+Yedz90kcli8g+ONp10=;
 b=pf8zZF3g90NghDvIZSjaijgvBmEjah+TmoIz1vuprnzVhKEqhpAm6Fxb62B/cYO3pB
 qJ9Y93+cwiXq4/+9RUosIAyK/VUQttW6wASrJECU5dZHOTSJ7XMvtyioSCT2QVhUwcsz
 mQ0FVWJS5a2mUnqua3K1v+vnIADm9xXqR88/sf/gZ8dWCZA/jGIGW9F4DMl3U/4z5+lG
 Suciok71sLWd1ut6q3R8Z4PkyhpYFeVZ/XpN9uLHYhhal50N8sF7eT1BXybjCkZ5GhSL
 xwM6LvjWxHNQfee9ICJNjePZUbe1+h6EPtsdkNB1Wc9BujEqS68joM79sm1nfWWHRhJ3
 AjnA==
X-Gm-Message-State: AKGB3mKKQ+1HxGK3559b+bsUYZyaIyw4L1hSL4OlH/r+1+j/xUB5z6vR
 nFLnm3xIGf+cPzgK73qr2jjx4QKV8zBu9WQzQTA=
X-Google-Smtp-Source: ACJfBotXgzByCLgmznHrO6J3jSH43OxZ2Z8KcFrw+9PlgHrwWWn3iHl/xje+fyvunda5Bn/2AOgBVwREFQWvVUgcLok=
X-Received: by 10.25.125.6 with SMTP id y6mr2187563lfc.128.1513106877252; Tue,
 12 Dec 2017 11:27:57 -0800 (PST)
MIME-Version: 1.0
Received: by 10.25.17.202 with HTTP; Tue, 12 Dec 2017 11:27:55 -0800 (PST)
In-Reply-To: <019064B3-835C-4D59-BE52-9E86EE08CD02@gmx.de>
References: <CAA93jw47ZAXAJmiOVCb2NR3JRcCfmX0TLr+55O0G+J6HCW5bVQ@mail.gmail.com>
 <alpine.DEB.2.20.1711290655590.32099@uplift.swm.pp.se>
 <4D0E907C-E15D-437C-B6F7-FF348346D615@gmx.de>
 <alpine.DEB.2.20.1711291347050.32099@uplift.swm.pp.se>
 <019064B3-835C-4D59-BE52-9E86EE08CD02@gmx.de>
From: Benjamin Cronce <bcronce@gmail.com>
Date: Tue, 12 Dec 2017 13:27:55 -0600
Message-ID: <CAJ_ENFEythcsAvABsOi9pry3S9aVzp1emcycLY0_L7oqAX_erA@mail.gmail.com>
To: Sebastian Moeller <moeller0@gmx.de>
Cc: Mikael Abrahamsson <swmike@swm.pp.se>, bloat <bloat@lists.bufferbloat.net>
Content-Type: multipart/alternative; boundary="001a114a6038ed09f8056029a561"
Subject: Re: [Bloat] benefits of ack filtering
X-BeenThere: bloat@lists.bufferbloat.net
X-Mailman-Version: 2.1.20
Precedence: list
List-Id: General list for discussing Bufferbloat <bloat.lists.bufferbloat.net>
List-Unsubscribe: <https://lists.bufferbloat.net/options/bloat>,
 <mailto:bloat-request@lists.bufferbloat.net?subject=unsubscribe>
List-Archive: <https://lists.bufferbloat.net/pipermail/bloat>
List-Post: <mailto:bloat@lists.bufferbloat.net>
List-Help: <mailto:bloat-request@lists.bufferbloat.net?subject=help>
List-Subscribe: <https://lists.bufferbloat.net/listinfo/bloat>,
 <mailto:bloat-request@lists.bufferbloat.net?subject=subscribe>
X-List-Received-Date: Tue, 12 Dec 2017 19:27:59 -0000

--001a114a6038ed09f8056029a561
Content-Type: text/plain; charset="UTF-8"

On Wed, Nov 29, 2017 at 10:50 AM, Sebastian Moeller <moeller0@gmx.de> wrote:

> Hi Mikael,
>
>
> > On Nov 29, 2017, at 13:49, Mikael Abrahamsson <swmike@swm.pp.se> wrote:
> >
> > On Wed, 29 Nov 2017, Sebastian Moeller wrote:
> >
> >> Well, ACK filtering/thinning is a simple trade-off: redundancy versus
> >> bandwidth. Since the RFCs say a receiver should acknowledge every second
> >> full MSS I think the decision whether to filter or not should be kept to
> >
> > Why does it say to do this?
>
> According to RFC 2525:
> "2.13.
>
>    Name of Problem
>       Stretch ACK violation
>
>
>
>
> Paxson, et. al.              Informational                     [Page 40]
>
> RFC 2525              TCP Implementation Problems             March 1999
>
>
>
>    Classification
>       Congestion Control/Performance
>
>    Description
>       To improve efficiency (both computer and network) a data receiver
>       may refrain from sending an ACK for each incoming segment,
>       according to [RFC1122].  However, an ACK should not be delayed an
>       inordinate amount of time.  Specifically, ACKs SHOULD be sent for
>       every second full-sized segment that arrives.  If a second full-
>       sized segment does not arrive within a given timeout (of no more
>       than 0.5 seconds), an ACK should be transmitted, according to
>       [RFC1122].  A TCP receiver which does not generate an ACK for
>       every second full-sized segment exhibits a "Stretch ACK
>       Violation".
>
>    Significance
>       TCP receivers exhibiting this behavior will cause TCP senders to
>       generate burstier traffic, which can degrade performance in
>       congested environments.  In addition, generating fewer ACKs
>       increases the amount of time needed by the slow start algorithm to
>       open the congestion window to an appropriate point, which
>       diminishes performance in environments with large bandwidth-delay
>       products.  Finally, generating fewer ACKs may cause needless
>       retransmission timeouts in lossy environments, as it increases the
>       possibility that an entire window of ACKs is lost, forcing a
>       retransmission timeout.
>

It is interesting that enough of an issue occurred for them to explicitly
state at least 1 ACK per 2 segments in an RFC. That being said, all rules
are meant to be broken, but breaking them should not be taken lightly. In
highly asymmetric connections with large bufferbloat, the receiver is
either theoretically or practically incapable of sending ACKs fast enough
due to lack of bandwidth, which results in ACKs becoming highly delayed,
which, in my opinion, is worse. If the receiver cannot ACK the received
data within ~1.5 seconds, the sender will resend the segments it presumes
missing. In my experience, I have seen upwards of 50% dup packet rates
even though the actual loss rate was less than 1%.
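
To put a rough number on "highly asymmetric", here is a minimal sketch of
how much upstream bandwidth the ACKs alone need. The packet sizes are my
assumptions, not measurements: 1500-byte IP packets downstream, 52-byte
ACKs (20 IP + 20 TCP + 12 bytes of timestamp option), one ACK per two
segments, and no link-layer framing counted. The function names are made
up for illustration.

```python
def ack_upstream_bps(down_bps, mtu=1500, ack_bytes=52, segs_per_ack=2):
    """Upstream bits/s consumed by ACKs for a saturated downstream.

    Assumes full-sized IP packets of `mtu` bytes and ignores any
    link-layer overhead (which would make small ACKs look worse).
    """
    data_pps = down_bps / (8 * mtu)       # downstream packets per second
    ack_pps = data_pps / segs_per_ack     # delayed ACK: one per N segments
    return ack_pps * ack_bytes * 8


def breakeven_ratio(mtu=1500, ack_bytes=52, segs_per_ack=2):
    """down:up ratio at which ACKs alone fill the whole upstream."""
    return mtu * segs_per_ack / ack_bytes


if __name__ == "__main__":
    print(f"ACK load for 100 Mbit/s down: {ack_upstream_bps(100e6) / 1e6:.2f} Mbit/s")
    print(f"ACKs saturate the upstream at about {breakeven_ratio():.1f}:1")
```

Under these assumptions the upstream is full of ACKs once the ratio gets
to the high 50s to 1, before any other upstream traffic is counted. Past
that point the ACK queue can only grow.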

I do not feel that thinning ACKs gains much for any healthy ratio of
down:up. The overhead of those "wasteful" ACKs is on par with the overhead
of IP+TCP headers. Anything that can disturb the health of the Internet
should take strong measures to prevent the end user from configuring the
shaper in a knowingly destructive way, like possibly letting the end user
configure the amount of bandwidth ACKs get. I see many saying 35k pps is
ridiculous, but that's a pittance. If someone's network can't handle that,
maybe they need a special TCP proxy. Thinning ACKs to help with bufferbloat
is one thing; thinning ACKs because we feel TCP is too aggressive is a can
of worms. Research on the topic is still appreciated, but we should be
careful about how much functionality Cake will have.
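
For what it is worth, the 35k pps figure and the "on par with headers"
comparison can be checked with a few lines of arithmetic. This is only a
sketch under assumed sizes: a 1460-byte MSS, 12 bytes of TCP options per
segment (so 1448 bytes of payload, as in the RFC 2525 traces), 20-byte IP
and TCP headers, and one ACK per two segments. The names are mine.

```python
IP_HEADER = 20     # bytes, IPv4 without options (assumption)
TCP_HEADER = 20    # bytes, base TCP header
TCP_OPTIONS = 12   # bytes, timestamps plus padding, as in the RFC 2525 trace
MSS = 1460         # bytes


def ack_rate_pps(goodput_bytes_per_s, segments_per_ack=2):
    """ACKs per second needed to acknowledge a given payload rate."""
    payload = MSS - TCP_OPTIONS
    return goodput_bytes_per_s / payload / segments_per_ack


def overhead_fractions(segments_per_ack=2):
    """Return (header bytes, ACK bytes) per payload byte delivered."""
    payload = MSS - TCP_OPTIONS
    header = IP_HEADER + TCP_HEADER + TCP_OPTIONS
    header_overhead = header / payload                    # paid on every data segment
    ack_overhead = header / (payload * segments_per_ack)  # one pure ACK per N segments
    return header_overhead, ack_overhead


if __name__ == "__main__":
    print(f"ACKs for 100 megabyte/s: {ack_rate_pps(100e6):.0f} pps")
    hdr, ack = overhead_fractions()
    print(f"header overhead {hdr:.2%}, ACK overhead {ack:.2%}")
```

With these numbers the ACK stream costs about half of what the data
headers already cost, which is what I mean by "on par".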


>
>    Implications
>       When not in loss recovery, every ACK received by a TCP sender
>       triggers the transmission of new data segments.  The burst size is
>       determined by the number of previously unacknowledged segments
>       each ACK covers.  Therefore, a TCP receiver ack'ing more than 2
>       segments at a time causes the sending TCP to generate a larger
>       burst of traffic upon receipt of the ACK.  This large burst of
>       traffic can overwhelm an intervening gateway, leading to higher
>       drop rates for both the connection and other connections passing
>       through the congested gateway.
>
>       In addition, the TCP slow start algorithm increases the congestion
>       window by 1 segment for each ACK received.  Therefore, increasing
>       the ACK interval (thus decreasing the rate at which ACKs are
>       transmitted) increases the amount of time it takes slow start to
>       increase the congestion window to an appropriate operating point,
>       and the connection consequently suffers from reduced performance.
>       This is especially true for connections using large windows.
>
>    Relevant RFCs
>
>       RFC 1122 outlines delayed ACKs as a recommended mechanism.
>
>    Trace file demonstrating it
>       Trace file taken using tcpdump at host B, the data receiver (and
>       ACK originator).  The advertised window (which never changed) and
>       timestamp options have been omitted for clarity, except for the
>       first packet sent by A:
>
>    12:09:24.820187 A.1174 > B.3999: . 2049:3497(1448) ack 1
>        win 33580 <nop,nop,timestamp 2249877 2249914> [tos 0x8]
>    12:09:24.824147 A.1174 > B.3999: . 3497:4945(1448) ack 1
>    12:09:24.832034 A.1174 > B.3999: . 4945:6393(1448) ack 1
>    12:09:24.832222 B.3999 > A.1174: . ack 6393
>    12:09:24.934837 A.1174 > B.3999: . 6393:7841(1448) ack 1
>    12:09:24.942721 A.1174 > B.3999: . 7841:9289(1448) ack 1
>    12:09:24.950605 A.1174 > B.3999: . 9289:10737(1448) ack 1
>    12:09:24.950797 B.3999 > A.1174: . ack 10737
>    12:09:24.958488 A.1174 > B.3999: . 10737:12185(1448) ack 1
>    12:09:25.052330 A.1174 > B.3999: . 12185:13633(1448) ack 1
>    12:09:25.060216 A.1174 > B.3999: . 13633:15081(1448) ack 1
>    12:09:25.060405 B.3999 > A.1174: . ack 15081
>
>       This portion of the trace clearly shows that the receiver (host B)
>       sends an ACK for every third full sized packet received.  Further
>       investigation of this implementation found that the cause of the
>       increased ACK interval was the TCP options being used.  The
>       implementation sent an ACK after it was holding 2*MSS worth of
>       unacknowledged data.  In the above case, the MSS is 1460 bytes so
>       the receiver transmits an ACK after it is holding at least 2920
>       bytes of unacknowledged data.  However, the length of the TCP
>       options being used [RFC1323] took 12 bytes away from the data
>       portion of each packet.  This produced packets containing 1448
>       bytes of data.  But the additional bytes used by the options in
>       the header were not taken into account when determining when to
>       trigger an ACK.  Therefore, it took 3 data segments before the
>       data receiver was holding enough unacknowledged data (>= 2*MSS, or
>       2920 bytes in the above example) to transmit an ACK.
>
>    Trace file demonstrating correct behavior
>       Trace file taken using tcpdump at host B, the data receiver (and
>       ACK originator), again with window and timestamp information
>       omitted except for the first packet:
>
>    12:06:53.627320 A.1172 > B.3999: . 1449:2897(1448) ack 1
>        win 33580 <nop,nop,timestamp 2249575 2249612> [tos 0x8]
>    12:06:53.634773 A.1172 > B.3999: . 2897:4345(1448) ack 1
>    12:06:53.634961 B.3999 > A.1172: . ack 4345
>    12:06:53.737326 A.1172 > B.3999: . 4345:5793(1448) ack 1
>    12:06:53.744401 A.1172 > B.3999: . 5793:7241(1448) ack 1
>    12:06:53.744592 B.3999 > A.1172: . ack 7241
>    12:06:53.752287 A.1172 > B.3999: . 7241:8689(1448) ack 1
>    12:06:53.847332 A.1172 > B.3999: . 8689:10137(1448) ack 1
>    12:06:53.847525 B.3999 > A.1172: . ack 10137
>
>       This trace shows the TCP receiver (host B) ack'ing every second
>       full-sized packet, according to [RFC1122].  This is the same
>       implementation shown above, with slight modifications that allow
>       the receiver to take the length of the options into account when
>       deciding when to transmit an ACK."
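
The difference between the two traces above comes down to one threshold.
Here is a minimal sketch of it, using the 1460-byte MSS and 12 option
bytes from the RFC text. How the fixed implementation actually accounts
for the options is not stated in the RFC, so subtracting the option bytes
from the MSS is my assumption, and ack_points() is a made-up helper.

```python
def ack_points(payload_bytes, ack_threshold, n_segments):
    """Return the 1-based segment numbers after which the receiver ACKs.

    The receiver accumulates unacknowledged payload and sends an ACK as
    soon as it holds at least `ack_threshold` bytes.
    """
    held = 0
    acked_after = []
    for seg in range(1, n_segments + 1):
        held += payload_bytes
        if held >= ack_threshold:
            acked_after.append(seg)
            held = 0
    return acked_after


MSS = 1460
OPTION_BYTES = 12
PAYLOAD = MSS - OPTION_BYTES   # 1448 bytes of data per segment

# Buggy receiver: threshold is 2*MSS = 2920, but two segments carry only 2896.
buggy = ack_points(PAYLOAD, 2 * MSS, 9)
# Fixed receiver (assumed fix): threshold accounts for the option bytes.
fixed = ack_points(PAYLOAD, 2 * (MSS - OPTION_BYTES), 9)

if __name__ == "__main__":
    print("buggy ACKs after segments:", buggy)
    print("fixed ACKs after segments:", fixed)
```

The buggy variant ACKs after every third segment and the fixed one after
every second, matching the two traces.
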
>
> So I guess the point is that at the rates we are discussing (and the
> accordingly short periods between non-filtered ACKs) the time-out issue
> will be moot. The slow start issue might also be moot if the sender does
> more than simple ACK counting. This leaves redundancy... Since GRO/GSO
> effectively lead to ACK stretching already, the disadvantages might not be
> as bad today (for high bandwidth flows) as they were in the past...
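
On the slow start point, here is a rough sketch of why ACK counting and
byte counting differ when ACKs are stretched. It is an idealised model
with no loss and no pacing. The initial window of 10 segments and the
target of 1000 segments are arbitrary assumptions, and the byte-counting
branch credits all acknowledged data, which is more generous than a
capped scheme such as RFC 3465 would be.

```python
def rtts_to_reach(target_segments, segments_per_ack,
                  initial_cwnd=10, byte_counting=False):
    """Count round trips of slow start until cwnd >= target_segments.

    ACK counting grows cwnd by one segment per ACK received, so fewer
    ACKs mean slower growth. Byte counting grows cwnd by the amount of
    data each ACK covers, so the ACK interval drops out.
    """
    cwnd = float(initial_cwnd)
    rtts = 0
    while cwnd < target_segments:
        acks_this_rtt = cwnd / segments_per_ack
        if byte_counting:
            cwnd += acks_this_rtt * segments_per_ack
        else:
            cwnd += acks_this_rtt
        rtts += 1
    return rtts


if __name__ == "__main__":
    for k in (1, 2, 4):
        print(f"ACK every {k} segment(s): "
              f"{rtts_to_reach(1000, k)} RTTs with ACK counting, "
              f"{rtts_to_reach(1000, k, byte_counting=True)} with byte counting")
```

In this model the growth factor per round trip is 1 + 1/k for ACK
counting, so thinning ACKs only hurts slow start if the sender counts
ACKs rather than bytes.
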
>
>
> > What benefit is there to either end system to send 35kPPS of ACKs in
> > order to facilitate a 100 megabyte/s of TCP transfer?
> >
> > Sounds like a lot of useless interrupts and handling by the stack, apart
> > from offloading it to the NIC to do a lot of handling of these mostly
> > useless packets so the CPU doesn't have to do it.
> >
> > Why isn't 1kPPS of ACKs sufficient for most usecases?
>
>         This is not going to fly, as far as I can tell the ACK rate needs
> to be high enough so that its inverse does not exceed the period that is
> equivalent to the calculated RTO, so the ACK rate needs to scale with the
> RTT of a connection.
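
If I read this argument right, the floor on the ACK rate is one ACK per
RTO. Here is a sketch of that, using the standard RTO estimator from RFC
6298 (SRTT and RTTVAR with alpha = 1/8, beta = 1/4, K = 4). The 1 ms
clock granularity and the RTT samples are assumptions; the 1 second
minimum is the RFC's recommendation, and real stacks often use a lower
one.

```python
def rto_after_samples(rtt_samples, clock_granularity=0.001, min_rto=1.0):
    """Retransmission timeout in seconds after a series of RTT samples."""
    if not rtt_samples:
        raise ValueError("need at least one RTT sample")
    alpha, beta, k = 1 / 8, 1 / 4, 4
    srtt = rttvar = None
    for r in rtt_samples:
        if srtt is None:
            # First measurement initialises both estimators.
            srtt = r
            rttvar = r / 2
        else:
            # RTTVAR is updated before SRTT, using the old SRTT.
            rttvar = (1 - beta) * rttvar + beta * abs(srtt - r)
            srtt = (1 - alpha) * srtt + alpha * r
    rto = srtt + max(clock_granularity, k * rttvar)
    return max(rto, min_rto)


def min_ack_rate(rto_seconds):
    """Lowest ACK rate (ACKs/s) that still delivers one ACK per RTO."""
    return 1.0 / rto_seconds


if __name__ == "__main__":
    for rtt in (0.01, 0.1, 0.5):
        rto = rto_after_samples([rtt] * 20, min_rto=0.0)
        print(f"RTT {rtt * 1000:.0f} ms: RTO {rto * 1000:.0f} ms, "
              f"floor {min_ack_rate(rto):.1f} ACKs/s")
```

This only gives the rate below which timeouts become likely. It says
nothing about how many ACKs are needed for good ACK clocking.
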
>
> But I do not claim to be an expert here, I just had a look at some RFCs
> that might or might not be outdated already...
>
> Best Regards
>         Sebastian
>
>
> >
> > --
> > Mikael Abrahamsson    email: swmike@swm.pp.se
>

--001a114a6038ed09f8056029a561--