From: Sebastian Moeller
Date: Thu, 28 Aug 2014 20:59:50 +0200
To: Jerry Jongerius
Cc: bloat@lists.bufferbloat.net
Subject: Re: [Bloat] The Dark Problem with AQM in the Internet?

Hi Jerry,

On Aug 28, 2014, at 19:20, Jerry Jongerius wrote:

> Jonathan,
>
> Yes, Wireshark shows that *only* one packet gets lost, regardless of RWIN size. The RWIN size can be below the BDP (no measurable queuing within the CMTS), or it can be very large, causing significant queuing within the CMTS. With a larger RWIN value, the single dropped packet typically happens sooner in the download rather than later. The fact that there is no "burst loss" is a significant clue.
>
> The graph is fully explained by the Westwood+ algorithm that the server is using. If you feed the observed data into the Westwood+ bandwidth estimator, you end up with the rate seen in the graph after the packet loss event. The reason the rate gets limited (no ramp up) is Westwood+ behaviour on an RTO, and the reason there is an RTO is the bufferbloat and the timing of the lost packet relative to when the bufferbloat starts. When there is no RTO, I see the expected drop (to the Westwood+ bandwidth estimate) and a ramp back up. On an RTO, Westwood+ sets both ssthresh and cwnd to its bandwidth estimate.
>
> The PC does SACK, the server does not, so it is not used. Timestamps are off.
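A rough sketch of the Westwood+ reaction Jerry describes, for readers following along: an EWMA bandwidth estimator fed once per RTT, with ssthresh (and, on an RTO, cwnd) pinned to the estimated bandwidth-delay product. The class, constants and variable names below are illustrative only; this is a toy model of the behaviour as described in the thread, not the actual Linux tcp_westwood code.

MSS = 1448  # bytes per segment, illustrative value

class WestwoodPlusSketch:
    """Toy model of the Westwood+ behaviour described above."""

    def __init__(self, alpha=0.9):
        self.alpha = alpha        # EWMA smoothing factor (illustrative)
        self.bw_est = 0.0         # smoothed bandwidth estimate, bytes/s
        self.rtt_min = None       # lowest RTT seen so far, seconds
        self.cwnd = 10 * MSS
        self.ssthresh = 64 * MSS

    def on_rtt_sample(self, acked_bytes, rtt):
        # Feed one RTT's worth of ACKed data into the estimator.
        sample = acked_bytes / rtt
        self.bw_est = self.alpha * self.bw_est + (1 - self.alpha) * sample
        self.rtt_min = rtt if self.rtt_min is None else min(self.rtt_min, rtt)

    def _estimated_bdp(self):
        # Bandwidth estimate times the minimum RTT seen: the pipe size
        # with no queueing delay included.
        if self.rtt_min is None:
            return 2 * MSS
        return max(2 * MSS, self.bw_est * self.rtt_min)

    def on_fast_retransmit(self):
        # Loss signalled by duplicate ACKs: drop to the estimate, then
        # ramp back up in congestion avoidance (the "no RTO" case).
        self.ssthresh = self._estimated_bdp()
        self.cwnd = self.ssthresh

    def on_rto(self):
        # Retransmission timeout: per the description above, both
        # ssthresh and cwnd are set to the bandwidth estimate, so the
        # flow stays pinned near that estimate instead of ramping up.
        self.ssthresh = self._estimated_bdp()
        self.cwnd = self.ssthresh

Fed with an ACK rate capped by the post-PowerBoost drain rate, the estimate (and hence cwnd after the RTO) lands at roughly the lower rate seen after the loss event in the graph, which matches Jerry's account.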
	Okay, that is interesting. Could I convince you to try enabling SACK on the server and test whether you still see the catastrophic results? And/or to try another TCP variant instead of Westwood+, like the default CUBIC?

Best Regards
	Sebastian

> - Jerry
>
> -----Original Message-----
> From: Jonathan Morton [mailto:chromatix99@gmail.com]
> Sent: Thursday, August 28, 2014 10:08 AM
> To: Jerry Jongerius
> Cc: 'Greg White'; 'Sebastian Moeller'; bloat@lists.bufferbloat.net
> Subject: Re: [Bloat] The Dark Problem with AQM in the Internet?
>
> On 28 Aug, 2014, at 4:19 pm, Jerry Jongerius wrote:
>
>> AQM is a great solution for bufferbloat. End of story. But if you want to track down which device in the network intentionally dropped a packet (when many devices in the network path will be running AQM), how are you going to do that? Or how do you propose to do that?
>
> We don't plan to do that. Not from the outside. Frankly, we can't reliably tell which routers drop packets today, when AQM is not at all widely deployed, so that's no great loss.
>
> But if ECN finally gets deployed, AQM can set the Congestion Experienced flag instead of dropping packets, most of the time. You still don't get to see which router did it, but the packet still gets through and the TCP session knows what to do about it.
>
>> The graph presented is caused by the interaction of a single dropped packet, bufferbloat, and the Westwood+ congestion control algorithm - and not PowerBoost.
>
> This surprises me somewhat - Westwood+ is supposed to be deliberately tolerant of single packet losses, since it was designed explicitly to get around the problem of slight random loss on wireless networks.
>
> I'd be surprised if, in fact, *only* one packet was lost. The more usual case is "burst loss", where several packets are lost in quick succession, and not necessarily consecutive packets. This tends to happen repeatedly on dumb drop-tail queues, unless the buffer is so large that it accommodates the entire receive window (which, for modern OSes, is quite impressive in a dark sort of way). Burst loss is characteristic of congestion, whereas random loss tends to lose isolated packets, so it would be much less surprising for Westwood+ to react to it.
>
> The packets were lost in the first place because the queue became chock-full, probably at just about the exact moment when the PowerBoost allowance ran out and the bandwidth came down (which tends to cause the buffer to fill rapidly), so you get the worst-case scenario: the buffer at its fullest, and the bandwidth draining it at its minimum. This maximises the time before your TCP even notices the lost packet's nonexistence, during which the sender keeps the buffer full because it still thinks everything's fine.
>
> What is probably happening is that the bottleneck queue, being so large, delays the retransmission of the lost packet until the Retransmit Timer expires. This will cause Reno-family TCPs to revert to slow-start, assuming (rightly in this case) that the characteristics of the channel have changed. You can see that it takes most of the first second for the sender to ramp up to full speed, and nearly as long to ramp back up to the reduced speed, both of which are characteristic of slow-start at WAN latencies. NB: during slow-start, the buffer remains empty as long as the incoming data rate is less than the output capacity, so latency is at a minimum.
>
> Do you have TCP SACK and timestamps turned on? Those usually allow minor losses like that to be handled more gracefully - the sending TCP gets a better idea of the RTT (allowing it to set the Retransmit Timer more intelligently), and would be able to see that progress is still being made with the backlog of buffered packets, even though the core TCP ACK is not advancing. In the event of burst loss, it would also be able to retransmit the correct set of packets straight away.
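For reference, the Retransmit Timer Jonathan mentions is normally derived from smoothed RTT measurements along the lines of RFC 6298, so RTT samples inflated by a bloated queue push the timeout out correspondingly. A small illustration; the RTT values are invented, not taken from Jerry's capture:

# RFC 6298-style retransmit timer update (illustrative values only).
ALPHA, BETA, K, MIN_RTO = 1 / 8, 1 / 4, 4, 1.0   # RFC 6298 constants, seconds

def update_rto(srtt, rttvar, sample):
    """Return updated (srtt, rttvar, rto) for one RTT measurement, in seconds."""
    if srtt is None:                      # first measurement
        srtt, rttvar = sample, sample / 2
    else:
        rttvar = (1 - BETA) * rttvar + BETA * abs(srtt - sample)
        srtt = (1 - ALPHA) * srtt + ALPHA * sample
    return srtt, rttvar, max(MIN_RTO, srtt + K * rttvar)

srtt = rttvar = None
# A path idling around 30 ms, then samples swelling as the bottleneck
# buffer fills (numbers made up for illustration):
for sample in (0.030, 0.032, 0.150, 0.450, 0.900, 1.300):
    srtt, rttvar, rto = update_rto(srtt, rttvar, sample)
    print(f"rtt={sample * 1000:5.0f} ms  srtt={srtt * 1000:5.0f} ms  rto={rto:4.2f} s")

In this made-up run the timeout ends up well above a second once queueing delay has dragged the smoothed RTT up, which is consistent with the long stall before slow-start kicks back in.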
> What AQM would do for you here - if your ISP implemented it properly - is to eliminate the negative effects of filling that massive buffer at your ISP. It would allow the sending TCP to detect and recover from any packet loss more quickly, and with ECN turned on you probably wouldn't even get any packet loss.
>
> What's also interesting is that, after recovering from the change in bandwidth, you get smaller bursts of about 15-40 KB arriving at roughly half-second intervals, mixed in with the relatively steady 1-, 2- and 3-packet stream. That is characteristic of low-level packet loss with a low-latency recovery.
>
> This either implies that your ISP has stuck you on a much shorter buffer for the lower-bandwidth (non-PowerBoost) regime, *or* that the sender is enforcing a smaller congestion window on you after having suffered a slow-start recovery. The latter restricts your bandwidth to match the delay-bandwidth product, but happily the "delay" in that equation is at a minimum if it keeps your buffer empty.
>
> And frankly, you're still getting 45 Mbps under those conditions. Many people would kill for that sort of performance - although they'd probably then want to kill everyone in the Comcast call centre later on.
>
> - Jonathan Morton
>
> _______________________________________________
> Bloat mailing list
> Bloat@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/bloat
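A quick back-of-the-envelope check of the delay-bandwidth-product point above: how much data has to be in flight to sustain 45 Mbit/s. The 45 Mbit/s figure comes from the thread; the RTT values are assumed purely for illustration, since the actual path RTT is not given here.

# Delay-bandwidth product: window needed for a given rate at a given RTT.
# 45 Mbit/s is the rate mentioned above; the RTTs are assumed values.
def window_for_rate(rate_bps, rtt_s, mss=1448):
    """Return (bytes, segments) of in-flight data needed to fill the pipe."""
    bdp_bytes = rate_bps / 8 * rtt_s
    return bdp_bytes, bdp_bytes / mss

for rtt_ms in (20, 50, 100):
    bdp, segs = window_for_rate(45e6, rtt_ms / 1000)
    print(f"RTT {rtt_ms:3d} ms -> ~{bdp / 1024:4.0f} KiB (~{segs:3.0f} segments) in flight for 45 Mbit/s")

As long as the buffer stays empty the "delay" term stays near the base RTT, so even a reduced post-recovery window can be enough to hold the 45 Mbit/s Jonathan mentions.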