From: Sebastian Moeller
Date: Fri, 27 Jan 2017 15:49:20 +0100
To: Eric Dumazet
Cc: Dave Täht, bloat@lists.bufferbloat.net
Subject: Re: [Bloat] Recommendations for fq_codel and tso/gso in 2017

Hi Eric,

quick question from the peanut gallery: on a typical home router with a
1 Gbps internal and a << 100 Mbps external interface, will giant packets
be generated by the 1 Gbps interface (with acceptable latency)? I ask
because what makes sense on a 1000 Mbps ingress link might still block a
20 Mbps WAN egress link slightly longer than one would like (to the tune
of 50 ms, just based on the bandwidth ratio?).
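A rough back-of-the-envelope check of that bandwidth-ratio argument, as a
sketch in shell arithmetic (the 64 KB super-packet and the 1514-byte frame
size are assumptions, not numbers from any particular driver):

    # Rates below are written in bits per microsecond (= Mbit/s), so
    # each result is a serialisation time in microseconds.

    # one 64 KB GSO/GRO super-packet on a 20 Mbit/s egress link: ~26 ms
    echo $(( 64 * 1024 * 8 / 20 ))

    # the same 64 KB leaving a 1 Gbit/s interface: ~0.5 ms
    echo $(( 64 * 1024 * 8 / 1000 ))

    # the 1 Mbit/s IW10 case discussed below, ten full-size 1514-byte
    # frames: ~121 ms
    echo $(( 10 * 1514 * 8 / 1 ))

So a super-packet sized for the fast side of the router can occupy the
slow side's wire for tens of milliseconds, scaling roughly with the
bandwidth ratio.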
Best Regards
	Sebastian

> On Jan 27, 2017, at 15:40, Eric Dumazet wrote:
> 
> On Thu, 2017-01-26 at 23:55 -0800, Dave Täht wrote:
>> 
>> On 1/26/17 11:21 PM, Hans-Kristian Bakke wrote:
>>> Hi
>>> 
>>> After having had some issues with inconsistent tso/gso configuration
>>> causing performance issues for sch_fq with pacing in one of my
>>> systems, I wonder if it is still recommended to disable gso/tso for
>>> interfaces used with fq_codel qdiscs and shaping using HTB etc.
>> 
>> At lower bandwidths gro can do terrible things. Say you have a 1 Mbit
>> uplink, and IW10. (At least one device (mvneta) will synthesise 64k of
>> gro packets.)
>> 
>> A single IW10 burst from one flow injects 130 ms of latency.
> 
> That is simply a sign of something bad happening at the source.
> 
> The router will spend too much time trying to fix the TCP sender by
> smoothing things.
> 
> Let's fix the root cause, instead of making everything slow or burning
> megawatts.
> 
> GRO aggregates trains of packets for the same flow, in a sub-millisecond
> window.
> 
> Why? Because GRO cannot predict the future: it cannot know when the next
> interrupt might come from the device saying: here are some additional
> packet(s). Maybe the next packet is coming in 5 seconds.
> 
> Take a look at napi_poll():
> 
> 1) If the device driver called napi_complete(), all packets are flushed
> (given) to the upper stack. No packet will wait in GRO for additional
> segments.
> 
> 2) Under flood (we exhausted the napi budget and did not call
> napi_complete()), we make sure no packet can sit in GRO for more than
> 1 ms.
> 
> Only when the device is under flood and the CPU cannot drain the RX
> queue fast enough can GRO aggregate packets more aggressively, and the
> size of GRO packets exactly fits the CPU budget.
> 
> In a nutshell, GRO is exactly the mechanism that adapts the packet sizes
> to the available CPU power.
> 
> If your CPU is really fast, then it will dequeue one packet at a time
> and GRO won't kick in.
> 
> So the real problem here is that some device drivers implemented poor
> interrupt mitigation logic, inherited from other OSes that had no GRO
> and _had_ to implement their own crap, hurting latencies.
> 
> Make sure you disable interrupt mitigation, and leave GRO enabled.
> 
> e1000e is notoriously bad for interrupt mitigation.
> 
> At Google, we let the NIC send its RX interrupt ASAP.
> 
> Every usec matters.
> 
> So the model for us is very clear: use GRO and TSO as much as we can,
> but make sure the producers (TCP senders) are smart and control their
> burst sizes.
> 
> Think about 50 Gbit and 100 Gbit, and really the question of having TSO
> and GRO or not is simply moot.
> 
> Even at 1 Gbit, GRO helps to reduce CPU cycles and thus reduce
> latencies.
> 
> Adding a sysctl to limit GRO max size would be trivial, I already
> mentioned that, but nobody cared enough to send a patch.
> 
>> 
>>> 
>>> If there is a trade-off, at which bandwidth does it generally make
>>> more sense to enable tso/gso than to have it disabled when doing
>>> HTB-shaped fq_codel qdiscs?
>> 
>> I stopped caring about tuning params at > 40 Mbit / < 10 Gbit, or
>> rather, trying to get below 200 usec of jitter|latency. (Others care.)
>> 
>> And: my expectation was generally that people would ignore our
>> recommendations on disabling offloads!
>> 
>> Yes, we should revise the sample sqm code and recommendations for a
>> post-gigabit era to not bother with changing network offloads. Were you
>> modifying the old debloat script?
>> 
>> TBF & sch_cake do peeling of gro/tso/gso back into packets, and then
>> interleave their scheduling, so GRO is both helpful (transiting the
>> stack faster) and harmless, at all bandwidths.
>> 
>> HTB doesn't peel. We just ripped out hfsc for sqm-scripts (too buggy),
>> also. Leaving: tbf + fq_codel, htb + fq_codel, and cake models there.
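For concreteness, a minimal sketch of the two pieces of advice above, with
a placeholder device name (eth0) and rate (20 Mbit); whether a NIC honours
the coalescing settings depends on its driver, and the last line assumes
the sch_cake qdisc is installed:

    # Eric's advice: leave the offloads on, but turn interrupt
    # mitigation down so GRO only merges what arrives within one
    # NAPI poll.
    ethtool -K eth0 gro on gso on tso on
    ethtool -C eth0 rx-usecs 0 rx-frames 1

    # Dave's two remaining sqm shaping models: HTB + fq_codel
    # (HTB does not peel super-packets) ...
    tc qdisc replace dev eth0 root handle 1: htb default 10
    tc class add dev eth0 parent 1: classid 1:10 htb rate 20mbit
    tc qdisc add dev eth0 parent 1:10 fq_codel

    # ... or cake, which peels gro/tso/gso back into packets itself.
    tc qdisc replace dev eth0 root cake bandwidth 20mbit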
>> ...
>> 
>> Cake is coming along nicely. I'd love a test in your 2 Gbit bonding
>> scenario, particularly in a per-host fairness test, at line or shaped
>> rates. We recently got cake working well with nat.
>> 
>> http://blog.cerowrt.org/flent/steam/down_working.svg (ignore the
>> latency figure, the 6 flows were to spots all over the world)
>> 
>>> Regards,
>>> Hans-Kristian
> 
> _______________________________________________
> Bloat mailing list
> Bloat@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/bloat