From: Sebastian Moeller
Date: Thu, 28 Aug 2014 20:59:50 +0200
To: Jerry Jongerius
Cc: bloat@lists.bufferbloat.net
Subject: Re: [Bloat] The Dark Problem with AQM in the Internet?

Hi Jerry,

On Aug 28, 2014, at 19:20, Jerry Jongerius wrote:

> Jonathan,
>
> Yes, Wireshark shows that *only* one packet gets lost, regardless of RWIN size. The RWIN size can be below the BDP (no measurable queuing within the CMTS), or it can be very large, causing significant queuing within the CMTS. With a larger RWIN value, the single dropped packet typically happens sooner in the download rather than later. The fact that there is no "burst loss" is a significant clue.
>
> The graph is fully explained by the Westwood+ algorithm that the server is using. If you feed the observed data into the Westwood+ bandwidth estimator, you end up with the rate seen in the graph after the packet loss event. The reason the rate gets limited (no ramp up) is Westwood+ behaviour on an RTO, and the reason there is an RTO is the bufferbloat and the timing of the lost packet relative to when the bufferbloat starts. When there is no RTO, I see the expected drop (to the Westwood+ bandwidth estimate) and a ramp back up. On an RTO, Westwood+ sets both ssthresh and cwnd to its bandwidth estimate.
>
> The PC does SACK, the server does not, so it is not used. Timestamps are off.
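A rough sketch of the Westwood+ reaction Jerry describes, for readers following along: an EWMA bandwidth estimator fed once per RTT, with ssthresh (and, on an RTO, cwnd) pinned to the estimated bandwidth-delay product. The class, constants and variable names below are illustrative only; this is a toy model of the behaviour as described in the thread, not the actual Linux tcp_westwood code.

MSS = 1448  # bytes per segment, illustrative value

class WestwoodPlusSketch:
    """Toy model of the Westwood+ behaviour described above."""

    def __init__(self, alpha=0.9):
        self.alpha = alpha        # EWMA smoothing factor (illustrative)
        self.bw_est = 0.0         # smoothed bandwidth estimate, bytes/s
        self.rtt_min = None       # lowest RTT seen so far, seconds
        self.cwnd = 10 * MSS
        self.ssthresh = 64 * MSS

    def on_rtt_sample(self, acked_bytes, rtt):
        # Feed one RTT's worth of ACKed data into the estimator.
        sample = acked_bytes / rtt
        self.bw_est = self.alpha * self.bw_est + (1 - self.alpha) * sample
        self.rtt_min = rtt if self.rtt_min is None else min(self.rtt_min, rtt)

    def _estimated_bdp(self):
        # Bandwidth estimate times the minimum RTT seen: the pipe size
        # with no queueing delay included.
        if self.rtt_min is None:
            return 2 * MSS
        return max(2 * MSS, self.bw_est * self.rtt_min)

    def on_fast_retransmit(self):
        # Loss signalled by duplicate ACKs: drop to the estimate, then
        # ramp back up in congestion avoidance (the "no RTO" case).
        self.ssthresh = self._estimated_bdp()
        self.cwnd = self.ssthresh

    def on_rto(self):
        # Retransmission timeout: per the description above, both
        # ssthresh and cwnd are set to the bandwidth estimate, so the
        # flow stays pinned near that estimate instead of ramping up.
        self.ssthresh = self._estimated_bdp()
        self.cwnd = self.ssthresh

Fed with an ACK rate capped by the post-PowerBoost drain rate, the estimate (and hence cwnd after the RTO) lands at roughly the lower rate seen after the loss event in the graph, which matches Jerry's account.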
	Okay, that is interesting. Could I convince you to try enabling SACK on the server and test whether you still see the catastrophic results? And/or to try another TCP variant instead of Westwood+, like the default CUBIC?

Best Regards
	Sebastian

> - Jerry
>
> -----Original Message-----
> From: Jonathan Morton [mailto:chromatix99@gmail.com]
> Sent: Thursday, August 28, 2014 10:08 AM
> To: Jerry Jongerius
> Cc: 'Greg White'; 'Sebastian Moeller'; bloat@lists.bufferbloat.net
> Subject: Re: [Bloat] The Dark Problem with AQM in the Internet?
>
> On 28 Aug, 2014, at 4:19 pm, Jerry Jongerius wrote:
>
>> AQM is a great solution for bufferbloat. End of story. But if you want to track down which device in the network intentionally dropped a packet (when many devices in the network path will be running AQM), how are you going to do that? Or how do you propose to do that?
>
> We don't plan to do that. Not from the outside. Frankly, we can't reliably tell which routers drop packets today, when AQM is not at all widely deployed, so that's no great loss.
>
> But if ECN finally gets deployed, AQM can set the Congestion Experienced flag instead of dropping packets, most of the time. You still don't get to see which router did it, but the packet still gets through and the TCP session knows what to do about it.
>
>> The graph presented is caused by the interaction of a single dropped packet, bufferbloat, and the Westwood+ congestion control algorithm - and not PowerBoost.
>
> This surprises me somewhat - Westwood+ is supposed to be deliberately tolerant of single packet losses, since it was designed explicitly to get around the problem of slight random loss on wireless networks.
>
> I'd be surprised if, in fact, *only* one packet was lost. The more usual case is "burst loss", where several packets are lost in quick succession, and not necessarily consecutive packets. This tends to happen repeatedly on dumb drop-tail queues, unless the buffer is so large that it accommodates the entire receive window (which, for modern OSes, is quite impressive in a dark sort of way). Burst loss is characteristic of congestion, whereas random loss tends to lose isolated packets, so it would be much less surprising for Westwood+ to react to it.
>
> The packets were lost in the first place because the queue became chock-full, probably at just about the exact moment when the PowerBoost allowance ran out and the bandwidth came down (which tends to cause the buffer to fill rapidly), so you get the worst-case scenario: the buffer at its fullest, and the bandwidth draining it at its minimum. This maximises the time before your TCP even notices the lost packet's nonexistence, during which the sender keeps the buffer full because it still thinks everything's fine.
>
> What is probably happening is that the bottleneck queue, being so large, delays the retransmission of the lost packet until the Retransmit Timer expires. This will cause Reno-family TCPs to revert to slow-start, assuming (rightly in this case) that the characteristics of the channel have changed. You can see that it takes most of the first second for the sender to ramp up to full speed, and nearly as long to ramp back up to the reduced speed, both of which are characteristic of slow-start at WAN latencies. NB: during slow-start, the buffer remains empty as long as the incoming data rate is less than the output capacity, so latency is at a minimum.
>
> Do you have TCP SACK and timestamps turned on? Those usually allow minor losses like that to be handled more gracefully - the sending TCP gets a better idea of the RTT (allowing it to set the Retransmit Timer more intelligently), and would be able to see that progress is still being made with the backlog of buffered packets, even though the core TCP ACK is not advancing. In the event of burst loss, it would also be able to retransmit the correct set of packets straight away.
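For reference, the Retransmit Timer Jonathan mentions is normally derived from smoothed RTT measurements along the lines of RFC 6298, so RTT samples inflated by a bloated queue push the timeout out correspondingly. A small illustration; the RTT values are invented, not taken from Jerry's capture:

# RFC 6298-style retransmit timer update (illustrative values only).
ALPHA, BETA, K, MIN_RTO = 1 / 8, 1 / 4, 4, 1.0   # RFC 6298 constants, seconds

def update_rto(srtt, rttvar, sample):
    """Return updated (srtt, rttvar, rto) for one RTT measurement, in seconds."""
    if srtt is None:                      # first measurement
        srtt, rttvar = sample, sample / 2
    else:
        rttvar = (1 - BETA) * rttvar + BETA * abs(srtt - sample)
        srtt = (1 - ALPHA) * srtt + ALPHA * sample
    return srtt, rttvar, max(MIN_RTO, srtt + K * rttvar)

srtt = rttvar = None
# A path idling around 30 ms, then samples swelling as the bottleneck
# buffer fills (numbers made up for illustration):
for sample in (0.030, 0.032, 0.150, 0.450, 0.900, 1.300):
    srtt, rttvar, rto = update_rto(srtt, rttvar, sample)
    print(f"rtt={sample * 1000:5.0f} ms  srtt={srtt * 1000:5.0f} ms  rto={rto:4.2f} s")

In this made-up run the timeout ends up well above a second once queueing delay has dragged the smoothed RTT up, which is consistent with the long stall before slow-start kicks back in.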
> What AQM would do for you here - if your ISP implemented it properly - is to eliminate the negative effects of filling that massive buffer at your ISP. It would allow the sending TCP to detect and recover from any packet loss more quickly, and with ECN turned on you probably wouldn't even get any packet loss.
>
> What's also interesting is that, after recovering from the change in bandwidth, you get smaller bursts of about 15-40 KB arriving at roughly half-second intervals, mixed in with the relatively steady 1-, 2- and 3-packet stream. That is characteristic of low-level packet loss with a low-latency recovery.
>
> This either implies that your ISP has stuck you on a much shorter buffer for the lower-bandwidth (non-PowerBoost) regime, *or* that the sender is enforcing a smaller congestion window on you after having suffered a slow-start recovery. The latter restricts your bandwidth to match the delay-bandwidth product, but happily the "delay" in that equation is at a minimum if it keeps your buffer empty.
>
> And frankly, you're still getting 45 Mbps under those conditions. Many people would kill for that sort of performance - although they'd probably then want to kill everyone in the Comcast call centre later on.
>
> - Jonathan Morton
>
> _______________________________________________
> Bloat mailing list
> Bloat@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/bloat
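A quick back-of-the-envelope check of the delay-bandwidth-product point above: how much data has to be in flight to sustain 45 Mbit/s. The 45 Mbit/s figure comes from the thread; the RTT values are assumed purely for illustration, since the actual path RTT is not given here.

# Delay-bandwidth product: window needed for a given rate at a given RTT.
# 45 Mbit/s is the rate mentioned above; the RTTs are assumed values.
def window_for_rate(rate_bps, rtt_s, mss=1448):
    """Return (bytes, segments) of in-flight data needed to fill the pipe."""
    bdp_bytes = rate_bps / 8 * rtt_s
    return bdp_bytes, bdp_bytes / mss

for rtt_ms in (20, 50, 100):
    bdp, segs = window_for_rate(45e6, rtt_ms / 1000)
    print(f"RTT {rtt_ms:3d} ms -> ~{bdp / 1024:4.0f} KiB (~{segs:3.0f} segments) in flight for 45 Mbit/s")

As long as the buffer stays empty the "delay" term stays near the base RTT, so even a reduced post-recovery window can be enough to hold the 45 Mbit/s Jonathan mentions.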