On 27 January 2017 at 15:40, Eric Dumazet wrote:

> On Thu, 2017-01-26 at 23:55 -0800, Dave Täht wrote:
> >
> > On 1/26/17 11:21 PM, Hans-Kristian Bakke wrote:
> > > Hi
> > >
> > > After having had some issues with inconsistent tso/gso configuration
> > > causing performance issues for sch_fq with pacing in one of my
> > > systems, I wonder if it is still recommended to disable gso/tso for
> > > interfaces used with fq_codel qdiscs and shaping using HTB etc.
> >
> > At lower bandwidths gro can do terrible things. Say you have a 1Mbit
> > uplink, and IW10. (At least one device (mvneta) will synthesise 64k of
> > gro packets)
> >
> > A single IW10 burst from one flow injects 130ms of latency.
>
> That is simply a sign of something bad happening from the source.
>
> The router will spend too much time trying to fix the TCP sender by
> smoothing things.
>
> Let's fix the root cause, instead of making everything slow or burning
> megawatts.
>
> GRO aggregates trains of packets for the same flow, in a sub-ms window.
>
> Why? Because GRO cannot predict the future: it cannot know when the
> next interrupt might come from the device saying "here are some
> additional packet(s)". Maybe the next packet is coming in 5 seconds.
>
> Take a look at napi_poll():
>
> 1) If the device driver called napi_complete(), all packets are flushed
> (given) to the upper stack. No packet will wait in GRO for additional
> segments.
>
> 2) Under flood (we exhausted the napi budget and did not call
> napi_complete()), we make sure no packet can sit in GRO for more than
> 1 ms.
>
> Only when the device is under flood and the CPU cannot drain the RX
> queue fast enough does GRO aggregate packets more aggressively, and the
> size of the GRO packets exactly fits the CPU budget.
>
> In a nutshell, GRO is exactly the mechanism that adapts the packet
> sizes to the available CPU power.
>
> If your CPU is really fast, then it will dequeue one packet at a time
> and GRO won't kick in.
>
> So the real problem here is that some device drivers implemented a poor
> interrupt mitigation logic, inherited from other OSes that did not have
> GRO and _had_ to implement their own crap, hurting latencies.
>
> Make sure you disable interrupt mitigation, and leave GRO enabled.
>
> e1000e is notoriously bad for interrupt mitigation.
>
> At Google, we let the NIC send its RX interrupt ASAP.

Interesting. Do I understand you correctly that you basically recommend
loading the e1000e module with InterruptThrottleRate set to 0, or is
interrupt mitigation something else?

options e1000e InterruptThrottleRate=0(,0,0,0...)

https://www.kernel.org/doc/Documentation/networking/e1000e.txt

I haven't fiddled with InterruptThrottleRate since before I even heard
of bufferbloat.
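To make sure I am reading you right, this is roughly the configuration I
have in mind. It is only a sketch, untested here; the interface name,
the two-entry option list and the rx-usecs handling are assumptions on
my side:

    # /etc/modprobe.d/e1000e.conf: disable interrupt throttling on both ports
    options e1000e InterruptThrottleRate=0,0

    # reload the driver so the option takes effect (drops the link briefly)
    modprobe -r e1000e && modprobe e1000e

    # or, if the driver accepts runtime coalescing changes, roughly the
    # same effect via ethtool: fire the RX interrupt as soon as possible
    ethtool -c eth0
    ethtool -C eth0 rx-usecs 0

    # and, per your advice, leave GRO enabled rather than turning it off
    ethtool -K eth0 gro on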
> Every usec matters.
>
> So the model for us is very clear: use GRO and TSO as much as we can,
> but make sure the producers (TCP senders) are smart and control their
> burst sizes.
>
> Think about 50Gbit and 100Gbit, and really the question of having TSO
> and GRO or not is simply moot.
>
> Even at 1Gbit, GRO is helping to reduce cpu cycles and thus reduce
> latencies.
>
> Adding a sysctl to limit GRO max size would be trivial, I already
> mentioned that, but nobody cared enough to send a patch.
>
> > > If there is a trade-off, at which bandwidth does it generally make
> > > more sense to enable tso/gso than to have it disabled when doing
> > > HTB-shaped fq_codel qdiscs?
> >
> > I stopped caring about tuning params at > 40Mbit, < 10Gbit, or
> > rather, trying to get below 200 usec of jitter|latency. (Others care)
> >
> > And: My expectation was generally that people would ignore our
> > recommendations on disabling offloads!
> >
> > Yes, we should revise the sample sqm code and recommendations for a
> > post-gigabit era to not bother with changing network offloads. Were
> > you modifying the old debloat script?
> >
> > TBF & sch_cake do peeling of gro/tso/gso back into packets, and then
> > interleave their scheduling, so GRO is both helpful (transiting the
> > stack faster) and harmless, at all bandwidths.
> >
> > HTB doesn't peel. We just ripped out hfsc from sqm-scripts (too
> > buggy), also. Leaving: tbf + fq_codel, htb + fq_codel, and cake
> > models there.
> >
> > ...
> >
> > Cake is coming along nicely. I'd love a test in your 2Gbit bonding
> > scenario, particularly in a per-host fairness test, at line or shaped
> > rates. We recently got cake working well with nat.
> >
> > http://blog.cerowrt.org/flent/steam/down_working.svg (ignore the
> > latency figure, the 6 flows were to spots all over the world)
> >
> > > Regards,
> > > Hans-Kristian
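For the record, the offload toggles being discussed are cheap to flip
and A/B test at runtime. These are the commands I mean; eth0 is just a
placeholder for the shaped interface:

    # show the current offload settings
    ethtool -k eth0 | egrep 'segmentation|receive-offload'

    # the old sqm-style advice: disable offloads on the shaped interface
    ethtool -K eth0 tso off gso off gro off

    # the post-gigabit advice discussed here: leave them on
    ethtool -K eth0 tso on gso on gro on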
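And the two shaping models Dave mentions would look roughly like this
on my end. Again only a sketch: eth0 and the 900mbit figure are
placeholders for my bonded setup, and cake here still means the
out-of-tree sch_cake module:

    # classic sqm-style shaping: HTB rate limiter with an fq_codel leaf
    tc qdisc replace dev eth0 root handle 1: htb default 10
    tc class add dev eth0 parent 1: classid 1:10 htb rate 900mbit ceil 900mbit
    tc qdisc add dev eth0 parent 1:10 fq_codel

    # the cake equivalent in one line, with NAT-aware per-host fairness
    tc qdisc replace dev eth0 root cake bandwidth 900mbit nat dual-srchost

(dual-srchost gives per-host fairness between local senders on egress;
dual-dsthost would be the ingress-side counterpart. I am not sure which
one the per-host fairness test is meant to exercise.)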