From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <dave.taht@gmail.com>
Received: from mail-oi0-x234.google.com (mail-oi0-x234.google.com
	[IPv6:2607:f8b0:4003:c06::234])
	(using TLSv1 with cipher RC4-SHA (128/128 bits))
	(Client CN "smtp.gmail.com",
	Issuer "Google Internet Authority G2" (verified OK))
	by huchra.bufferbloat.net (Postfix) with ESMTPS id E052B21F307
	for <bloat@lists.bufferbloat.net>; Mon,  1 Sep 2014 10:40:35 -0700 (PDT)
Received: by mail-oi0-f52.google.com with SMTP id e131so3717753oig.11
	for <bloat@lists.bufferbloat.net>; Mon, 01 Sep 2014 10:40:35 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
	h=mime-version:in-reply-to:references:date:message-id:subject:from:to
	:cc:content-type:content-transfer-encoding;
	bh=6b8Jl7+9CyZhGR7R1ewICUqXXH2dSRj7lkA0P/BqMRo=;
	b=Ztcnor1k0bDJGpzYWuFLoF5H7Pnm7UtEXkOT/qfUhSUxjYcNCIL3EP/GC1oO5VUgBR
	PHDEpGAClaJ4hEvAwy25NWgzWfPrCPZHO3Nsi1Kk6SZ3t6eFDvy8CKfoE3qKb+O/9jsG
	EE2RPyg3CwUoOsXb+MZbgA0FslfblLxL0g+qAAP2U6M+f5C0dz3pckIpbMT50ZB2Ron0
	RbsvbixCiqe0enKo47EzJigEnfCYCpW9zV6k/Erir5HxGz098xNeNsLql9o62KOSYoFm
	LQwad6XWEJKsvuqM8tr7lmoDeXjRYEn3l/PWRpI1JF+P/4Mh3dyVv1s+YzP8n1wsVibU
	0rLA==
MIME-Version: 1.0
X-Received: by 10.182.129.230 with SMTP id nz6mr27611559obb.16.1409593234997; 
	Mon, 01 Sep 2014 10:40:34 -0700 (PDT)
Received: by 10.202.227.76 with HTTP; Mon, 1 Sep 2014 10:40:34 -0700 (PDT)
In-Reply-To: <000101cfc60a$61f19a70$25d4cf50$@duckware.com>
References: <000001cfbefe$69194c70$3b4be550$@duckware.com>
	<D542A271-BFFF-4494-8EE9-CBC9BFEB09EE@gmx.de>
	<D020C902.3BF0A%g.white@cablelabs.com>
	<000901cfc2c2$c21ae460$4650ad20$@duckware.com>
	<4A89264B-36C5-4D1F-9E5E-33F2B42C364E@gmail.com>
	<002201cfc2e4$565c1100$03143300$@duckware.com>
	<alpine.DEB.2.02.1408281857550.23856@nftneq.ynat.uz>
	<002a01cfc396$ba5c8510$2f158f30$@duckware.com>
	<569E96E0-297C-4895-B402-F2B55E1953FA@gmail.com>
	<20140829232853.07cef202@urahara>
	<AC40CF0D-5BC2-426E-902C-F02AC8C4D114@gmail.com>
	<000101cfc60a$61f19a70$25d4cf50$@duckware.com>
Date: Mon, 1 Sep 2014 10:40:34 -0700
Message-ID: <CAA93jw6vc2+GHrm7sRL51tmk4VXAzRBQo-iu6dn+qOwrefzbqQ@mail.gmail.com>
From: Dave Taht <dave.taht@gmail.com>
To: Jerry Jongerius <jerryj@duckware.com>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
Cc: bloat <bloat@lists.bufferbloat.net>
Subject: Re: [Bloat] The Dark Problem with AQM in the Internet?
X-BeenThere: bloat@lists.bufferbloat.net
X-Mailman-Version: 2.1.13
Precedence: list
List-Id: General list for discussing Bufferbloat <bloat.lists.bufferbloat.net>
List-Unsubscribe: <https://lists.bufferbloat.net/options/bloat>,
	<mailto:bloat-request@lists.bufferbloat.net?subject=unsubscribe>
List-Archive: <https://lists.bufferbloat.net/pipermail/bloat>
List-Post: <mailto:bloat@lists.bufferbloat.net>
List-Help: <mailto:bloat-request@lists.bufferbloat.net?subject=help>
List-Subscribe: <https://lists.bufferbloat.net/listinfo/bloat>,
	<mailto:bloat-request@lists.bufferbloat.net?subject=subscribe>
X-List-Received-Date: Mon, 01 Sep 2014 17:40:36 -0000

On Mon, Sep 1, 2014 at 10:30 AM, Jerry Jongerius <jerryj@duckware.com> wrote:
> Westwood+, as described in published research papers, does not fully
> explain the graph that was seen.  However, Westwood+, as implemented in
> Linux, DOES fully explain the graph that was seen.  One place to review the
> source code is here:
>
> http://lxr.free-electrons.com/source/net/ipv4/tcp_westwood.c?v=3.2
>
> Some observations about this code:
>
> 1. The bandwidth estimate is run through a “(7×prev+new)/8” filter TWICE
> [see lines 93-94].
> 2. The unit of time for all objects in the code (rtt, bwe, delta, etc.) is
> ‘jiffies’, not milliseconds or microseconds [see line 108].
> 3. The bandwidth estimate is updated every “rtt”, with the test in the code
> (line 139) essentially: delta>rtt.  However, “rtt” is the last unsmoothed
> rtt seen on the link (and it increases during bufferbloat).  When rtt
> increases, the frequency of bandwidth updates drops.
> 4. The server is Linux 3.2 with HZ=100 (meaning jiffies increases every
> 10ms).

Oy, this also means that there is no BQL on this server, and thus its
TX ring can get quite full. So I'd like to see what a BQL-enabled server
does to Westwood+ now.

https://www.bufferbloat.net/projects/codel/wiki/Best_practices_for_benchmarking_Codel_and_FQ_Codel

I imagine that TCP offloads are enabled, also? So much work went into fixing
things like TCP timestamps, etc., in the face of TSO, after 3.2.
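
To make Jerry's point 1 concrete, here is a rough Python sketch of the
double (7*prev+new)/8 filter. It follows my reading of westwood_filter()
in tcp_westwood.c, but the class name, the helper updates_to_reach() and
all sample values are made up for illustration, so treat it as a sketch
and not as the kernel's exact behaviour.

```python
def westwood_do_filter(prev: int, new: int) -> int:
    """One (7*prev + new)/8 low-pass step, using integer arithmetic."""
    return (7 * prev + new) >> 3


class WestwoodEstimate:
    """Two cascaded low-pass stages, as in Jerry's observation 1."""

    def __init__(self) -> None:
        self.bw_ns_est = 0  # first-stage estimate
        self.bw_est = 0     # second-stage estimate, later used for ssthresh

    def update(self, bytes_acked: int, delta_jiffies: int) -> None:
        # A sample is bytes acked per jiffy over the last update interval.
        sample = bytes_acked // delta_jiffies
        if self.bw_ns_est == 0 and self.bw_est == 0:
            # An empty filter is seeded with the first sample.
            self.bw_ns_est = sample
            self.bw_est = sample
        else:
            self.bw_ns_est = westwood_do_filter(self.bw_ns_est, sample)
            self.bw_est = westwood_do_filter(self.bw_est, self.bw_ns_est)


def updates_to_reach(fraction, first_sample, true_rate, limit=1000):
    """Count updates until bw_est reaches `fraction` of `true_rate`.

    Hypothetical scenario: one low sample seeds the filter (early slow
    start), then every later sample equals the true rate.
    """
    est = WestwoodEstimate()
    est.update(first_sample, 1)
    for n in range(1, limit + 1):
        est.update(true_rate, 1)
        if est.bw_est >= fraction * true_rate:
            return n
    return None
```

Calling updates_to_reach(0.9, 100_000, 1_000_000) gives the number of
updates needed to get within 10% of the true rate when the filter was
seeded at a tenth of it. Since each update needs at least one rtt, that
count translates directly into ramp-up time.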


>
> When you graph some of the raw data observed (see
> http://www.duckware.com/blog/the-dark-problem-with-aqm-in-the-internet/images/chart.gif),
> the Westwood+ bandwidth estimate takes significant time to ramp up.

>
> For the first 0.84 seconds of the download, we expect the Westwood+ code to
> update the bandwidth estimate around 14 times, or once every 60ms or so.
> However, after this, we know there is a bufferbloat episode, with RTT times
> increasing (decreasing the frequency of bandwidth updates).  The red line in
> the graph above suggests that Westwood might have only updated the bandwidth
> estimate around 9-10 more times, before using it to set cwnd/ssthresh.
>
> - Jerry
>
>
>
>
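
The update-count arithmetic above (0.84 s at one update per ~60 ms rtt is
about 14 updates) can be sketched in Python as below. This assumes one
bandwidth update per elapsed rtt and ignores jiffy rounding; the function
name and the "bloated" rtt curve are hypothetical, not measured data.

```python
from typing import Callable


def count_bwe_updates(duration_ms: int, rtt_at: Callable[[int], int]) -> int:
    """Count updates in `duration_ms` if one update happens per elapsed rtt.

    `rtt_at(t)` returns the rtt (in ms) seen at time t (in ms).
    """
    t = 0
    updates = 0
    while t + rtt_at(t) <= duration_ms:
        t += rtt_at(t)
        updates += 1
    return updates


def steady_rtt(t: int) -> int:
    # Constant 60 ms rtt, as in the first 0.84 s of the download.
    return 60


def bloated_rtt(t: int) -> int:
    # Hypothetical queue build-up: rtt grows by 1 ms for every 4 ms elapsed.
    return 60 + t // 4
```

count_bwe_updates(840, steady_rtt) reproduces the "around 14" figure, while
the same 840 ms window with bloated_rtt yields noticeably fewer updates,
which is the effect described in observation 3.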
> -----Original Message-----
> From: Jonathan Morton [mailto:chromatix99@gmail.com]
> Sent: Saturday, August 30, 2014 2:46 AM
> To: Stephen Hemminger
> Cc: Jerry Jongerius; bloat@lists.bufferbloat.net
> Subject: Re: [Bloat] The Dark Problem with AQM in the Internet?
>
>
> On 30 Aug, 2014, at 9:28 am, Stephen Hemminger wrote:
>
>> On Sat, 30 Aug 2014 09:05:58 +0300
>> Jonathan Morton <chromatix99@gmail.com> wrote:
>>
>>>
>>> On 29 Aug, 2014, at 5:37 pm, Jerry Jongerius wrote:
>>>
>>>>> did you check to see if packets were re-sent even if they weren't
>>>>> lost? one of the side effects of excessive buffering is that it's
>>>>> possible for a packet to be held in the buffer long enough that the
>>>>> sender thinks that it's been lost and retransmits it, so the packet
>>>>> is effectively 'lost' even if it actually arrives at its destination.
>>>>
>>>> Yes.  A duplicate packet for the missing packet is not seen.
>>>>
>>>> The receiver 'misses' a packet; starts sending out tons of dup acks
>>>> (for all packets in flight and queued up due to bufferbloat), and
>>>> then way later, the packet does come in (after the RTT caused by
>>>> bufferbloat; indicating it is the 'resent' packet).
>>>
>>> I think I've cracked this one - the cause, if not the solution.
>>>
>>> Let's assume, for the moment, that Jerry is correct and PowerBoost plays
>>> no part in this.  That implies that the flow is not using the full
>>> bandwidth after the loss, *and* that the additive increase of cwnd isn't
>>> sufficient to recover to that point within the test period.
>>>
>>> There *is* a sequence of events that can lead to that happening:
>>>
>>> 1) Packet is lost, at the tail end of the bottleneck queue.
>>>
>>> 2) Eventually, receiver sees the loss and starts sending duplicate acks
>>> (each triggering CA_EVENT_SLOW_ACK path in the sender).  Sender (running
>>> Westwood+) assumes that each of these represents a received, full-size
>>> packet, for bandwidth estimation purposes.
>>>
>>> 3) The receiver doesn't send, or the sender doesn't receive, a duplicate
>>> ack for every packet actually received.  Maybe some firewall sees a large
>>> number of identical packets arriving - without SACK or timestamps, they
>>> *would* be identical - and filters some of them.  The bandwidth estimate
>>> therefore becomes significantly lower than the true value, and additionally
>>> the RTO fires and causes the sender to reset cwnd to 1 (CA_EVENT_LOSS).
>>>
>>> 4) The retransmitted packet finally reaches the receiver, and the ack it
>>> sends includes all the data received in the meantime (about 3.5MB).  This is
>>> not sufficient to immediately reset the bandwidth estimate to the true
>>> value, because the BWE is sampled at RTT intervals, and also includes
>>> low-pass filtering.
>>>
>>> 5) This ends the recovery phase (CA_EVENT_CWR_COMPLETE), and the sender
>>> resets the slow-start threshold to correspond to the estimated
>>> delay-bandwidth product (MinRTT * BWE) at that moment.
>>>
>>> 6) This estimated DBP is lower than the true value, so the subsequent
>>> slow-start phase ends with the cwnd inadequately sized.  Additive increase
>>> would eventually correct that - but the key word is *eventually*.
>>>
>>> - Jonathan Morton
>>
>> Bandwidth estimation by ACK RTT is fraught with problems. The returning
>> ACK can be delayed for any number of reasons such as other traffic or
>> aggregation. This kind of delay-based congestion control suffers badly
>> from any latency induced in the network.
>> So instead of causing bloat, it gets hit by bloat.
>
> In this case, the TCP is actually tracking RTT surprisingly well, but the
> bandwidth estimate goes wrong because the duplicate ACKs go missing.  Note
> that if the MinRTT was estimated too high (which is the only direction it
> could go), this would result in the slow-start threshold being *higher* than
> required, and the symptoms observed would not occur, since the cwnd would
> grow to the required value after recovery.
>
> This is the opposite effect from what happens to TCP Vegas in a bloated
> environment.  Vegas stops increasing cwnd when the estimated RTT is
> noticeably higher than MinRTT, but if the true MinRTT changes (or it has to
> compete with a non-Vegas TCP flow), it has trouble tracking that fact.
>
> There is another possibility:  that the assumption of non-queue RTT being
> constant against varying bandwidth is incorrect.  If that is the case, then
> the observed behaviour can be explained without recourse to lost duplicate
> ACKs - so Westwood+ is correctly tracking both MinRTT and BWE - but (MinRTT
> * BWE) turns out to be a poor estimate of the true BDP.  I think this still
> fails to explain why the cwnd is reset (which should occur only on RTO), but
> everything else potentially fits.
>
> I think we can distinguish the two theories by running tests against a
> server that supports SACK and timestamps, and where ideally we can capture
> packet traces at both ends.
>
> - Jonathan Morton
>
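
Jonathan's steps 5 and 6 can be put into numbers with a small Python
sketch: ssthresh is set from (MinRTT * BWE), and if BWE is too low the
cwnd then has to climb the rest of the way by additive increase. The
floor of 2 packets matches my reading of the Linux code; the function
names and every number in the usage note are hypothetical, chosen only
to show the shape of the problem.

```python
def westwood_ssthresh(bw_est_bytes_per_s: float, min_rtt_s: float, mss: int) -> int:
    """Estimated delay-bandwidth product in packets, with a floor of 2."""
    return max(int(bw_est_bytes_per_s * min_rtt_s / mss), 2)


def rtts_to_recover(ssthresh_pkts: int, true_bdp_pkts: int) -> int:
    """RTTs of additive increase (+1 packet per RTT) needed to reach the true BDP."""
    return max(true_bdp_pkts - ssthresh_pkts, 0)


# Hypothetical link: 2.5 MB/s, 50 ms MinRTT, 1448-byte MSS.
true_bdp = westwood_ssthresh(2_500_000, 0.05, 1448)
# Hypothetical underestimate: BWE is only 40% of the true rate.
low_ssthresh = westwood_ssthresh(1_000_000, 0.05, 1448)
shortfall_rtts = rtts_to_recover(low_ssthresh, true_bdp)
```

With these made-up values, shortfall_rtts is the number of round trips
spent in additive increase before the flow fills the pipe again. At 50 ms
or more per round trip that is seconds, not milliseconds, which is why
"eventually" is the key word in step 6.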


-- 
Dave Täht

NSFW: https://w2.eff.org/Censorship/Internet_censorship_bills/russell_0296_indecent.article