From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <jerryj@duckware.com>
Received: from mout.perfora.net (mout.perfora.net [74.208.4.194])
	(using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
	(Client CN "mout.perfora.net", Issuer "Thawte SSL CA" (verified OK))
	by huchra.bufferbloat.net (Postfix) with ESMTPS id B541621F306
	for <bloat@lists.bufferbloat.net>; Mon,  1 Sep 2014 10:30:11 -0700 (PDT)
Received: from J4 (c-68-50-226-187.hsd1.md.comcast.net [68.50.226.187])
	by mrelay.perfora.net (node=mreueus001) with ESMTP (Nemesis)
	id 0M912Z-1XYdFI46kH-00CPHv; Mon, 01 Sep 2014 19:30:09 +0200
From: "Jerry Jongerius" <jerryj@duckware.com>
To: "'Jonathan Morton'" <chromatix99@gmail.com>,
	"'Stephen Hemminger'" <stephen@networkplumber.org>
References: <000001cfbefe$69194c70$3b4be550$@duckware.com>
	<D542A271-BFFF-4494-8EE9-CBC9BFEB09EE@gmx.de>
	<D020C902.3BF0A%g.white@cablelabs.com>
	<000901cfc2c2$c21ae460$4650ad20$@duckware.com>
	<4A89264B-36C5-4D1F-9E5E-33F2B42C364E@gmail.com>
	<002201cfc2e4$565c1100$03143300$@duckware.com>
	<alpine.DEB.2.02.1408281857550.23856@nftneq.ynat.uz>
	<002a01cfc396$ba5c8510$2f158f30$@duckware.com>
	<569E96E0-297C-4895-B402-F2B55E1953FA@gmail.com>
	<20140829232853.07cef202@urahara>
	<AC40CF0D-5BC2-426E-902C-F02AC8C4D114@gmail.com> 
In-Reply-To: 
Date: Mon, 1 Sep 2014 13:30:06 -0400
Message-ID: <000101cfc60a$61f19a70$25d4cf50$@duckware.com>
MIME-Version: 1.0
Content-Type: text/plain;
	charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
X-Mailer: Microsoft Outlook 14.0
Thread-Index: AQFFVy+FAO2HJAXNGbNNrLY/R/b/6gGg5QYjATtzUJsCM/8plgJDZN5dAwhNtMwCHMBZ5gH+gvxdAoeiMfsBqFaVlAKwhzghnFZNDvCAAFoaIA==
Content-Language: en-us
X-Provags-ID: V02:K0:fs3R4RVpJ9IjcQf+tAILzGBaSy0vYiGcyKW2O+CdcjS
	cd6sPI4xtjMd6iDXfCHg+KccLF0AOhIKYbG00OAxXi7l5ZZXjB
	BZ8dr6fztnpKOEwGvrwzWGdoFlXU5xy6Xk0ilQmO8MtqxjwCVt
	uPequqqmPBxel0KOwTe5miMAyu/XKSxK2Y4xTPVpOHDR/LQeYp
	nWh5dSnmAtytO6SJo//oDsfzRWIimLhkiM47TFKnO4m/rg8okb
	OQJve4oqRQ2fLQMiTACtm/AqvafYC2m5nYddy03Kw5aMFaPyOY
	nhOhAmWgiezcSs2F3S7bTnUrT0MvRkRvO8RAeVqpEEhRX2FUSD
	6pP71mMQFdORI7a1wqXx6Qh1vD1SrKHXSP6LKEVeH
X-UI-Out-Filterresults: notjunk:1;
Cc: bloat@lists.bufferbloat.net
Subject: Re: [Bloat] The Dark Problem with AQM in the Internet?
X-BeenThere: bloat@lists.bufferbloat.net
X-Mailman-Version: 2.1.13
Precedence: list
List-Id: General list for discussing Bufferbloat <bloat.lists.bufferbloat.net>
List-Unsubscribe: <https://lists.bufferbloat.net/options/bloat>,
	<mailto:bloat-request@lists.bufferbloat.net?subject=unsubscribe>
List-Archive: <https://lists.bufferbloat.net/pipermail/bloat>
List-Post: <mailto:bloat@lists.bufferbloat.net>
List-Help: <mailto:bloat-request@lists.bufferbloat.net?subject=help>
List-Subscribe: <https://lists.bufferbloat.net/listinfo/bloat>,
	<mailto:bloat-request@lists.bufferbloat.net?subject=subscribe>
X-List-Received-Date: Mon, 01 Sep 2014 17:30:12 -0000

Westwood+, as described in published research papers, does not fully
explain the graph that was seen. However, Westwood+, as implemented in
Linux, DOES fully explain the graph that was seen. One place to review the
source code is here:

http://lxr.free-electrons.com/source/net/ipv4/tcp_westwood.c?v=3.2

Some observations about this code:

1. The bandwidth estimate is run through a "(7×prev+new)/8" filter TWICE
[see lines 93-94].
2. The units of time for all objects in the code (rtt, bwe, delta, etc.) are
'jiffies', not milliseconds or microseconds [see line 108].
3. The bandwidth estimate is updated every "rtt", with the test in the code
(line 139) being essentially: delta>rtt. However, "rtt" is the last unsmoothed
rtt seen on the link (and it increases during bufferbloat). When rtt
increases, the frequency of bandwidth updates drops.
4. The server is Linux 3.2 with HZ=100 (meaning jiffies increases every
10ms).
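
Observation 1 can be made concrete with a small sketch. The Python below is
not the kernel code; it only models two cascaded (7×prev+new)/8 filters in
floating point and shows how slowly the second stage follows a step change in
its input. The function names and the normalized 0-to-1 sample values are
assumptions made for illustration.

```python
def lowpass(prev, new):
    """One pass of the (7*prev + new)/8 filter described above."""
    return (7.0 * prev + new) / 8.0


def double_filter_response(n_updates, start=0.0, target=1.0):
    """Feed a constant `target` sample into two cascaded filters.

    Returns (first_stage, second_stage) after n_updates updates,
    with both stages starting at `start`. Values are normalized,
    hypothetical bandwidth samples, not real measurements.
    """
    stage1 = start
    stage2 = start
    for _ in range(n_updates):
        stage1 = lowpass(stage1, target)  # first pass over the raw sample
        stage2 = lowpass(stage2, stage1)  # second pass over the first stage
    return stage1, stage2


if __name__ == "__main__":
    for n in (1, 5, 10, 14, 24):
        s1, s2 = double_filter_response(n)
        print(f"{n:2d} updates: one filter {s1:.3f}, two filters {s2:.3f}")
```

Under these assumptions the second stage has covered only about 41% of the
step after 10 updates and about 58% after 14, so an estimate that is updated
rarely stays well below the true value for a long time.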

When you graph some of the raw data observed (see
http://www.duckware.com/blog/the-dark-problem-with-aqm-in-the-internet/images/chart.gif),
the Westwood+ bandwidth estimate takes significant time to ramp up.

For the first 0.84 seconds of the download, we expect the Westwood+ code to
update the bandwidth estimate around 14 times, or once every 60ms or so.
However, after this, we know there is a bufferbloat episode, with RTT times
increasing (decreasing the frequency of bandwidth updates). The red line in
the graph above suggests that Westwood+ might have only updated the bandwidth
estimate around 9-10 more times before using it to set cwnd/ssthresh.
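
The "around 14 times" figure can be checked with a rough model. The sketch
below is not the kernel code: it assumes the update test is
delta > max(rtt, floor) evaluated in whole jiffies, and the 50 ms floor is an
assumption chosen to be consistent with the "once every 60ms" figure above.
The function names and the sample RTT values are hypothetical.

```python
def update_interval_ms(rtt_ms, hz=100, rtt_floor_ms=50):
    """Approximate time between bandwidth-estimate updates.

    Models the test "delta > max(rtt, floor)" with all times in jiffies.
    rtt_floor_ms is an assumed lower bound, not a value from the message.
    """
    jiffy_ms = 1000 // hz                     # 10 ms per jiffy at HZ=100
    rtt_jiffies = rtt_ms // jiffy_ms
    floor_jiffies = rtt_floor_ms // jiffy_ms
    # delta must strictly exceed the threshold, hence one extra jiffy
    return (max(rtt_jiffies, floor_jiffies) + 1) * jiffy_ms


def updates_in(duration_ms, rtt_ms, hz=100):
    """Number of whole update intervals that fit in duration_ms."""
    return duration_ms // update_interval_ms(rtt_ms, hz)


if __name__ == "__main__":
    print(update_interval_ms(30), updates_in(840, 30))    # unloaded link
    print(update_interval_ms(500), updates_in(840, 500))  # bloated RTT
```

With an unloaded RTT below the assumed floor (30 ms is an arbitrary choice),
the model gives one update per 60 ms, or 14 in 0.84 seconds. With a bloated
RTT of 500 ms, updates come only every 510 ms.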

- Jerry


-----Original Message-----
From: Jonathan Morton [mailto:chromatix99@gmail.com]
Sent: Saturday, August 30, 2014 2:46 AM
To: Stephen Hemminger
Cc: Jerry Jongerius; bloat@lists.bufferbloat.net
Subject: Re: [Bloat] The Dark Problem with AQM in the Internet?


On 30 Aug, 2014, at 9:28 am, Stephen Hemminger wrote:

> On Sat, 30 Aug 2014 09:05:58 +0300
> Jonathan Morton <chromatix99@gmail.com> wrote:
>
>>
>> On 29 Aug, 2014, at 5:37 pm, Jerry Jongerius wrote:
>>
>>>> did you check to see if packets were re-sent even if they weren't
>>>> lost? one of the side effects of excessive buffering is that it's
>>>> possible for a packet to be held in the buffer long enough that the
>>>> sender thinks that it's been lost and retransmits it, so the packet
>>>> is effectively 'lost' even if it actually arrives at its destination.
>>>
>>> Yes. A duplicate packet for the missing packet is not seen.
>>>
>>> The receiver 'misses' a packet; starts sending out tons of dup acks
>>> (for all packets in flight and queued up due to bufferbloat), and
>>> then way later, the packet does come in (after the RTT caused by
>>> bufferbloat; indicating it is the 'resent' packet).
>>
>> I think I've cracked this one - the cause, if not the solution.
>>
>> Let's assume, for the moment, that Jerry is correct and PowerBoost plays
>> no part in this. That implies that the flow is not using the full bandwidth
>> after the loss, *and* that the additive increase of cwnd isn't sufficient to
>> recover to that point within the test period.
>>
>> There *is* a sequence of events that can lead to that happening:
>>
>> 1) Packet is lost, at the tail end of the bottleneck queue.
>>
>> 2) Eventually, receiver sees the loss and starts sending duplicate acks
>> (each triggering CA_EVENT_SLOW_ACK path in the sender). Sender (running
>> Westwood+) assumes that each of these represents a received, full-size
>> packet, for bandwidth estimation purposes.
>>
>> 3) The receiver doesn't send, or the sender doesn't receive, a duplicate
>> ack for every packet actually received. Maybe some firewall sees a large
>> number of identical packets arriving - without SACK or timestamps, they
>> *would* be identical - and filters some of them. The bandwidth estimate
>> therefore becomes significantly lower than the true value, and additionally
>> the RTO fires and causes the sender to reset cwnd to 1 (CA_EVENT_LOSS).
>>
>> 4) The retransmitted packet finally reaches the receiver, and the ack it
>> sends includes all the data received in the meantime (about 3.5MB). This is
>> not sufficient to immediately reset the bandwidth estimate to the true
>> value, because the BWE is sampled at RTT intervals, and also includes
>> low-pass filtering.
>>
>> 5) This ends the recovery phase (CA_EVENT_CWR_COMPLETE), and the sender
>> resets the slow-start threshold to correspond to the estimated
>> delay-bandwidth product (MinRTT * BWE) at that moment.
>>
>> 6) This estimated DBP is lower than the true value, so the subsequent
>> slow-start phase ends with the cwnd inadequately sized. Additive increase
>> would eventually correct that - but the key word is *eventually*.
>>
>> - Jonathan Morton
>
> Bandwidth estimation by ack RTT is fraught with problems. The returning
> ACK can be delayed for any number of reasons such as other traffic or
> aggregation. This kind of delay-based congestion control suffers badly
> from any latency induced in the network.
> So instead of causing bloat, it gets hit by bloat.

In this case, the TCP is actually tracking RTT surprisingly well, but the
bandwidth estimate goes wrong because the duplicate ACKs go missing. Note
that if the MinRTT was estimated too high (which is the only direction it
could go), this would result in the slow-start threshold being *higher* than
required, and the symptoms observed would not occur, since the cwnd would
grow to the required value after recovery.
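
The two cases can be compared with a short sketch. It is only a model of the
(MinRTT * BWE) product discussed in this thread, not the Linux
implementation: the function name, the 1448-byte MSS, the floor of 2
segments, and all sample numbers are hypothetical.

```python
def ssthresh_segments(bwe_bytes_per_ms, min_rtt_ms, mss=1448):
    """Slow-start threshold in segments from a (MinRTT * BWE) estimate.

    mss and the floor of 2 segments are assumptions for illustration.
    """
    return max(2, (bwe_bytes_per_ms * min_rtt_ms) // mss)


if __name__ == "__main__":
    true_bw = 12500   # bytes per ms (about 100 Mbit/s), hypothetical
    min_rtt = 30      # ms, hypothetical

    correct = ssthresh_segments(true_bw, min_rtt)
    low_bwe = ssthresh_segments(true_bw * 4 // 10, min_rtt)  # BWE at 40%
    high_rtt = ssthresh_segments(true_bw, 2 * min_rtt)       # MinRTT doubled

    print(correct, low_bwe, high_rtt)
```

In this model a bandwidth estimate at 40% of the true value shrinks the
threshold in proportion, so slow start ends early and only additive increase
remains, while an overestimated MinRTT raises the threshold instead.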

This is the opposite effect from what happens to TCP Vegas in a bloated
environment. Vegas stops increasing cwnd when the estimated RTT is
noticeably higher than MinRTT, but if the true MinRTT changes (or it has to
compete with a non-Vegas TCP flow), it has trouble tracking that fact.

There is another possibility: that the assumption of non-queue RTT being
constant against varying bandwidth is incorrect. If that is the case, then
the observed behaviour can be explained without recourse to lost duplicate
ACKs - so Westwood+ is correctly tracking both MinRTT and BWE - but (MinRTT
* BWE) turns out to be a poor estimate of the true BDP. I think this still
fails to explain why the cwnd is reset (which should occur only on RTO), but
everything else potentially fits.

I think we can distinguish the two theories by running tests against a
server that supports SACK and timestamps, and where ideally we can capture
packet traces at both ends.

- Jonathan Morton