From: Jonathan Morton
To: "Jerry Jongerius"
Cc: bloat@lists.bufferbloat.net
Date: Thu, 28 Aug 2014 17:07:37 +0300
Subject: Re: [Bloat] The Dark Problem with AQM in the Internet?
Message-Id: <4A89264B-36C5-4D1F-9E5E-33F2B42C364E@gmail.com>
In-Reply-To: <000901cfc2c2$c21ae460$4650ad20$@duckware.com>
References: <000001cfbefe$69194c70$3b4be550$@duckware.com> <000901cfc2c2$c21ae460$4650ad20$@duckware.com>

On 28 Aug, 2014, at 4:19 pm, Jerry Jongerius wrote:

> AQM is a great solution for bufferbloat.  End of story.  But if you want to track down which device in the network intentionally dropped a packet (when many devices in the network path will be running AQM), how are you going to do that?  Or how do you propose to do that?

We don't plan to do that.  Not from the outside.  Frankly, we can't reliably tell which routers drop packets today, when AQM is not at all widely deployed, so that's no great loss.

But if ECN finally gets deployed, AQM can set the Congestion Experienced flag instead of dropping packets, most of the time.  You still don't get to see which router did it, but the packet still gets through and the TCP session knows what to do about it.
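As a minimal sketch of that marking decision - illustrative only, using the RFC 3168 codepoints rather than code from any particular AQM implementation - the logic at the queue looks like this:

    # Illustrative sketch of RFC 3168 marking, not code from a real AQM.
    # The low two bits of the IP TOS/traffic-class octet hold the ECN field.
    NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11

    def on_congestion(tos):
        """Given the TOS octet of a packet the AQM wants to signal on,
        return (new_tos, dropped)."""
        if tos & 0b11 in (ECT0, ECT1):   # both endpoints negotiated ECN
            return tos | CE, False       # mark Congestion Experienced
        return tos, True                 # not ECN-capable: drop instead

The receiver echoes the CE mark back in its ACKs, so the sender reduces its window just as if the packet had been dropped - but nothing has to be retransmitted.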
> The graph presented is caused by the interaction of a single dropped packet, bufferbloat, and the Westwood+ congestion control algorithm - and not power boost.

This surprises me somewhat - Westwood+ is supposed to be deliberately tolerant of single packet losses, since it was designed explicitly to get around the problem of slight random loss on wireless networks.

I'd be surprised if, in fact, *only* one packet was lost.  The more usual case is "burst loss", where several packets are lost in quick succession, and not necessarily consecutive packets.  This tends to happen repeatedly on dumb drop-tail queues, unless the buffer is so large that it accommodates the entire receive window (which, for modern OSes, is quite impressive in a dark sort of way).  Burst loss is characteristic of congestion, whereas random loss tends to lose isolated packets, so it would be much less surprising for Westwood+ to react to it.

The packets were lost in the first place because the queue became chock-full, probably at just about the exact moment when the PowerBoost allowance ran out and the bandwidth came down (which tends to cause the buffer to fill rapidly), so you get the worst-case scenario: the buffer at its fullest, and the bandwidth draining it at its minimum.  This maximises the time before your TCP even notices the lost packet's nonexistence, during which the sender keeps the buffer full because it still thinks everything's fine.

What is probably happening is that the bottleneck queue, being so large, delays the retransmission of the lost packet until the Retransmit Timer expires.  This will cause Reno-family TCPs to revert to slow-start, assuming (rightly in this case) that the characteristics of the channel have changed.  You can see that it takes most of the first second for the sender to ramp up to full speed, and nearly as long to ramp back up to the reduced speed, both of which are characteristic of slow-start at WAN latencies.  NB: during slow-start, the buffer remains empty as long as the incoming data rate is less than the output capacity, so latency is at a minimum.
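To put a rough number on that delay - a back-of-envelope sketch, with the buffer size and drain rate assumed for illustration rather than taken from your trace:

    # Assumed figures: ~1 MB of drop-tail buffering at the head-end, draining
    # at the post-PowerBoost rate of 45 Mbit/s.  Every segment queued behind
    # the lost one waits this long before its ACK can tell the sender anything,
    # and the measured RTT (hence the Retransmit Timer) is inflated to match.
    buffer_bytes = 1_000_000
    drain_Bps    = 45e6 / 8
    print("worst-case queue delay: %.0f ms" % (buffer_bytes / drain_Bps * 1000))

With numbers in that ballpark, the standing queue alone adds nearly 180 ms to every round trip, which is why the timeout-and-slow-start episode is so visible on the graph.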
Do you have TCP SACK and timestamps turned on?  Those usually allow minor losses like that to be handled more gracefully - the sending TCP gets a better idea of the RTT (allowing it to set the Retransmit Timer more intelligently), and would be able to see that progress is still being made with the backlog of buffered packets, even though the core TCP ACK is not advancing.  In the event of burst loss, it would also be able to retransmit the correct set of packets straight away.
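On Linux you can check both with a couple of lines - a sketch reading the standard sysctl paths, which default to enabled on modern kernels:

    # Report whether SACK and TCP timestamps are enabled on this host (Linux).
    for opt in ("tcp_sack", "tcp_timestamps"):
        with open("/proc/sys/net/ipv4/" + opt) as f:
            print(opt, "is", "on" if f.read().strip() == "1" else "off")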
What AQM would do for you here - if your ISP implemented it properly - is to eliminate the negative effects of filling that massive buffer at your ISP.  It would allow the sending TCP to detect and recover from any packet loss more quickly, and with ECN turned on you probably wouldn't even get any packet loss.

What's also interesting is that, after recovering from the change in bandwidth, you get smaller bursts of about 15-40 KB arriving at roughly half-second intervals, mixed in with the relatively steady 1-, 2- and 3-packet stream.  That is characteristic of low-level packet loss with a low-latency recovery.

This either implies that your ISP has stuck you on a much shorter buffer for the lower-bandwidth (non-PowerBoost) regime, *or* that the sender is enforcing a smaller congestion window on you after having suffered a slow-start recovery.  The latter restricts your bandwidth to match the delay-bandwidth product, but happily the "delay" in that equation is at a minimum if it keeps your buffer empty.
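For scale - a sketch with an assumed 20 ms minimum RTT, since the real value isn't in the trace:

    # Delay-bandwidth product: the congestion window needed to sustain
    # 45 Mbit/s at a 20 ms RTT with the buffer empty.  Both figures are
    # assumptions for illustration.
    rate_Bps = 45e6 / 8          # 45 Mbit/s in bytes per second
    rtt_s    = 0.020             # assumed minimum round-trip time
    print("cwnd needed: %.0f KB" % (rate_Bps * rtt_s / 1000))   # ~113 KB

So even a fairly modest post-recovery congestion window is enough to hold 45 Mbps, as long as the queue stays short.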
And frankly, you're still getting 45 Mbps under those conditions.  Many people would kill for that sort of performance - although they'd probably then want to kill everyone in the Comcast call centre later on.

 - Jonathan Morton