From: Dave Taht
To: Jerry Jongerius
Cc: bloat
Date: Mon, 1 Sep 2014 10:40:34 -0700
Subject: Re: [Bloat] The Dark Problem with AQM in the Internet?

On Mon, Sep 1, 2014 at 10:30 AM, Jerry Jongerius wrote:
> Westwood+, as described in published research papers, does not fully
> explain the graph that was seen. However, Westwood+, as implemented in
> Linux, DOES fully explain the graph that was seen. One place to review the
> source code is here:
>
> http://lxr.free-electrons.com/source/net/ipv4/tcp_westwood.c?v=3.2
>
> Some observations about this code:
>
> 1. The bandwidth estimate is run through a “(7×prev+new)/8” filter TWICE
> [see lines 93-94].
> 2. The units of time for all objects in the code (rtt, bwe, delta, etc) are
> ‘jiffies’, not milliseconds, nor microseconds [see line 108].
> 3. The bandwidth estimate is updated every “rtt” with the test in the code
> (line 139) essentially: delta>rtt. However, “rtt” is the last unsmoothed
> rtt seen on the link (and increasing during bufferbloat). When rtt
> increases, the frequency of bandwidth updates drops.
> 4. The server is Linux 3.2 with HZ=100 (meaning jiffies increases every
> 10ms).

Oy, this also means that there is no BQL on this server, and thus its
TX ring can get quite filled. So I'd like to see what a BQL-enabled
server does to westwood+ now.

https://www.bufferbloat.net/projects/codel/wiki/Best_practices_for_benchmarking_Codel_and_FQ_Codel

I imagine that tcp offloads are enabled, also? So much work went into fixing
things like tcp timestamps, etc, in the face of TSO, after 3.2.

>
> When you graph some of the raw data observed (see
> http://www.duckware.com/blog/the-dark-problem-with-aqm-in-the-internet/images/chart.gif),
> the Westwood+ bandwidth estimate takes significant time to
> ramp up.
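
To get a feel for how slowly a twice-applied filter converges, here is a rough sketch of points 1, 3 and 4 above. It is not the kernel code: it uses floats instead of the kernel's integer arithmetic, the starting estimate of zero and the constant input are assumptions made for illustration, and the names double_filter and updates_in are invented here.

```python
HZ = 100  # jiffies per second on the server, per point 4 above


def ewma(prev, new):
    """One pass of the (7*prev + new)/8 low-pass filter from point 1."""
    return (7 * prev + new) / 8


def double_filter(samples, start=0.0):
    """Feed raw bandwidth samples through the filter twice per update.

    Floats are used for clarity; the kernel code uses integer arithmetic.
    `start` is a hypothetical initial estimate, not a kernel value.
    """
    first_pass = second_pass = start
    estimates = []
    for sample in samples:
        first_pass = ewma(first_pass, sample)        # smooth the raw sample
        second_pass = ewma(second_pass, first_pass)  # smooth the smoothed value
        estimates.append(second_pass)
    return estimates


def updates_in(duration_s, rtt_jiffies, hz=HZ):
    """Count estimate updates in duration_s if one fires whenever delta > rtt.

    With integer jiffies, the first delta exceeding rtt is rtt + 1.
    """
    interval = rtt_jiffies + 1
    return round(duration_s * hz) // interval


if __name__ == "__main__":
    # Hypothetical scenario: the true rate is 1.0 and the estimate starts at 0.
    for n, est in enumerate(double_filter([1.0] * 24), start=1):
        print(f"update {n:2d}: estimate = {est:.3f} of true rate")
    print(updates_in(0.84, rtt_jiffies=5), "updates in 0.84 s at RTT = 5 jiffies")
```

Under those assumptions the estimate is still only about 84% of the true rate after 24 updates, and an RTT of 5 jiffies (a made-up figure) yields 14 updates in 0.84 seconds. As the RTT grows, updates_in shrinks, which is the effect described in point 3.
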
>
> For the first 0.84 seconds of the download, we expect the Westwood+ code to
> update the bandwidth estimate around 14 times, or once every 60ms or so.
> However, after this, we know there is a bufferbloat episode, with RTT times
> increasing (decreasing the frequency of bandwidth updates). The red line in
> the graph above suggests that Westwood might have only updated the bandwidth
> estimate around 9-10 more times, before using it to set cwnd/ssthresh.
>
> - Jerry
>
>
> -----Original Message-----
> From: Jonathan Morton [mailto:chromatix99@gmail.com]
> Sent: Saturday, August 30, 2014 2:46 AM
> To: Stephen Hemminger
> Cc: Jerry Jongerius; bloat@lists.bufferbloat.net
> Subject: Re: [Bloat] The Dark Problem with AQM in the Internet?
>
>
> On 30 Aug, 2014, at 9:28 am, Stephen Hemminger wrote:
>
>> On Sat, 30 Aug 2014 09:05:58 +0300
>> Jonathan Morton wrote:
>>
>>>
>>> On 29 Aug, 2014, at 5:37 pm, Jerry Jongerius wrote:
>>>
>>>>> did you check to see if packets were re-sent even if they weren't
>>>>> lost? one of the side effects of excessive buffering is that it's
>>>>> possible for a packet to be held in the buffer long enough that the
>>>>> sender thinks that it's been lost and retransmits it, so the packet
>>>>> is effectively 'lost' even if it actually arrives at its destination.
>>>>
>>>> Yes. A duplicate packet for the missing packet is not seen.
>>>>
>>>> The receiver 'misses' a packet; starts sending out tons of dup acks
>>>> (for all packets in flight and queued up due to bufferbloat), and
>>>> then way later, the packet does come in (after the RTT caused by
>>>> bufferbloat; indicating it is the 'resent' packet).
>>>
>>> I think I've cracked this one - the cause, if not the solution.
>>>
>>> Let's assume, for the moment, that Jerry is correct and PowerBoost plays
>>> no part in this.
>>> That implies that the flow is not using the full bandwidth
>>> after the loss, *and* that the additive increase of cwnd isn't sufficient to
>>> recover to that point within the test period.
>>>
>>> There *is* a sequence of events that can lead to that happening:
>>>
>>> 1) Packet is lost, at the tail end of the bottleneck queue.
>>>
>>> 2) Eventually, receiver sees the loss and starts sending duplicate acks
>>> (each triggering CA_EVENT_SLOW_ACK path in the sender). Sender (running
>>> Westwood+) assumes that each of these represents a received, full-size
>>> packet, for bandwidth estimation purposes.
>>>
>>> 3) The receiver doesn't send, or the sender doesn't receive, a duplicate
>>> ack for every packet actually received. Maybe some firewall sees a large
>>> number of identical packets arriving - without SACK or timestamps, they
>>> *would* be identical - and filters some of them. The bandwidth estimate
>>> therefore becomes significantly lower than the true value, and additionally
>>> the RTO fires and causes the sender to reset cwnd to 1 (CA_EVENT_LOSS).
>>>
>>> 4) The retransmitted packet finally reaches the receiver, and the ack it
>>> sends includes all the data received in the meantime (about 3.5MB). This is
>>> not sufficient to immediately reset the bandwidth estimate to the true
>>> value, because the BWE is sampled at RTT intervals, and also includes
>>> low-pass filtering.
>>>
>>> 5) This ends the recovery phase (CA_EVENT_CWR_COMPLETE), and the sender
>>> resets the slow-start threshold to correspond to the estimated
>>> delay-bandwidth product (MinRTT * BWE) at that moment.
>>>
>>> 6) This estimated DBP is lower than the true value, so the subsequent
>>> slow-start phase ends with the cwnd inadequately sized. Additive increase
>>> would eventually correct that - but the key word is *eventually*.
>>>
>>> - Jonathan Morton
>>
>> Bandwidth estimates by ack RTT are fraught with problems.
>> The returning
>> ACK can be delayed for any number of reasons such as other traffic or
>> aggregation. This kind of delay-based congestion control suffers badly
>> from any latency induced in the network.
>> So instead of causing bloat, it gets hit by bloat.
>
> In this case, the TCP is actually tracking RTT surprisingly well, but the
> bandwidth estimate goes wrong because the duplicate ACKs go missing. Note
> that if the MinRTT was estimated too high (which is the only direction it
> could go), this would result in the slow-start threshold being *higher* than
> required, and the symptoms observed would not occur, since the cwnd would
> grow to the required value after recovery.
>
> This is the opposite effect from what happens to TCP Vegas in a bloated
> environment. Vegas stops increasing cwnd when the estimated RTT is
> noticeably higher than MinRTT, but if the true MinRTT changes (or it has to
> compete with a non-Vegas TCP flow), it has trouble tracking that fact.
>
> There is another possibility: that the assumption of non-queue RTT being
> constant against varying bandwidth is incorrect. If that is the case, then
> the observed behaviour can be explained without recourse to lost duplicate
> ACKs - so Westwood+ is correctly tracking both MinRTT and BWE - but (MinRTT
> * BWE) turns out to be a poor estimate of the true BDP. I think this still
> fails to explain why the cwnd is reset (which should occur only on RTO), but
> everything else potentially fits.
>
> I think we can distinguish the two theories by running tests against a
> server that supports SACK and timestamps, and where ideally we can capture
> packet traces at both ends.
>
> - Jonathan Morton

-- 
Dave Täht

NSFW: https://w2.eff.org/Censorship/Internet_censorship_bills/russell_0296_indecent.article
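
For reference, steps 5 and 6 of Jonathan's sequence quoted above can be put into numbers with a small sketch. This is only an illustration: the function names, the floor of 2 segments, and every figure in the example are assumptions, and additive increase is modelled as the textbook one segment per RTT.

```python
def ssthresh_segments(bwe_bytes_per_s, min_rtt_ms, mss_bytes):
    """Slow-start threshold as the estimated BDP (MinRTT * BWE), in segments.

    The floor of 2 segments is an assumption of this sketch.
    """
    return max(bwe_bytes_per_s * min_rtt_ms // (1000 * mss_bytes), 2)


def rtts_to_refill(true_bdp_segments, ssthresh):
    """RTTs of additive increase (one segment per RTT) from ssthresh to the true BDP."""
    return max(true_bdp_segments - ssthresh, 0)


if __name__ == "__main__":
    # All numbers below are hypothetical, chosen only to show the shape of the problem.
    mss, min_rtt_ms = 1448, 50
    true_bw = 12_500_000        # bytes/s (100 Mbit/s)
    low_bwe = true_bw // 4      # estimate dragged down by missing dup acks
    true_bdp = ssthresh_segments(true_bw, min_rtt_ms, mss)
    ssthresh = ssthresh_segments(low_bwe, min_rtt_ms, mss)
    rtts = rtts_to_refill(true_bdp, ssthresh)
    print(f"true BDP {true_bdp} segments, ssthresh {ssthresh} segments")
    print(f"additive increase needs about {rtts} RTTs "
          f"({rtts * min_rtt_ms / 1000:.1f} s at MinRTT)")
```

The point of the arithmetic is that the gap between the underestimated ssthresh and the true BDP closes by only one segment per RTT, so a low BWE at the moment recovery ends can cost many seconds of reduced throughput.
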