* [Bloat] Recommendations for fq_codel and tso/gso in 2017
  From: Hans-Kristian Bakke @ 2017-01-27  7:21 UTC
  To: bloat

Hi

After having had some issues with inconsistent tso/gso configuration
causing performance problems for sch_fq with pacing on one of my systems,
I wonder if it is still recommended to disable gso/tso for interfaces
used with fq_codel qdiscs and shaping using HTB etc.

If there is a trade-off, at which bandwidth does it generally make more
sense to enable tso/gso than to have it disabled when doing HTB-shaped
fq_codel qdiscs?

Regards,
Hans-Kristian
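For reference, a minimal sketch of the setup being asked about: checking and
disabling the offloads with ethtool, and attaching an HTB + fq_codel egress
shaper. The interface name eth0 and the 20 Mbit rate are illustrative
placeholders, and whether disabling the offloads is still worthwhile is
exactly what the rest of the thread debates.

    # inspect the current offload settings
    ethtool -k eth0 | egrep 'segmentation|offload'

    # the historical sqm advice: turn tso/gso/gro off on the shaped interface
    ethtool -K eth0 tso off gso off gro off

    # a basic HTB + fq_codel egress shaper at an assumed 20 Mbit
    tc qdisc replace dev eth0 root handle 1: htb default 10
    tc class add dev eth0 parent 1: classid 1:10 htb rate 20mbit ceil 20mbit
    tc qdisc add dev eth0 parent 1:10 fq_codel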
* Re: [Bloat] Recommendations for fq_codel and tso/gso in 2017
  From: Dave Täht @ 2017-01-27  7:55 UTC
  To: bloat

On 1/26/17 11:21 PM, Hans-Kristian Bakke wrote:
> Hi
>
> After having had some issues with inconsistent tso/gso configuration
> causing performance problems for sch_fq with pacing on one of my systems,
> I wonder if it is still recommended to disable gso/tso for interfaces
> used with fq_codel qdiscs and shaping using HTB etc.

At lower bandwidths gro can do terrible things. Say you have a 1Mbit
uplink, and IW10. (At least one device (mvneta) will synthesise 64k of
gro packets.)

A single IW10 burst from one flow injects roughly 130ms of latency.

> If there is a trade-off, at which bandwidth does it generally make more
> sense to enable tso/gso than to have it disabled when doing HTB-shaped
> fq_codel qdiscs?

I stopped caring about tuning params in the >40Mbit, <10Gbit range - or
rather, about trying to get below 200usec of jitter/latency there.
(Others care.)

And: my expectation was generally that people would ignore our
recommendations on disabling offloads!

Yes, we should revise the sample sqm code and recommendations for a
post-gigabit era to not bother with changing network offloads. Were you
modifying the old debloat script?

TBF & sch_cake peel gro/tso/gso super-packets back into individual
packets and then interleave their scheduling, so GRO is both helpful
(transiting the stack faster) and harmless, at all bandwidths.

HTB doesn't peel. We also just ripped out hfsc from sqm-scripts (too
buggy), leaving the tbf + fq_codel, htb + fq_codel, and cake models there.

...

Cake is coming along nicely. I'd love a test in your 2Gbit bonding
scenario, particularly a per-host fairness test, at line or shaped
rates. We recently got cake working well with nat.

http://blog.cerowrt.org/flent/steam/down_working.svg (ignore the latency
figure, the 6 flows were to spots all over the world)

> Regards,
> Hans-Kristian
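The ~130 ms figure follows from simple serialization arithmetic. A
back-of-envelope version, assuming 10 full-size 1514-byte on-the-wire
segments and the 1 Mbit/s uplink in Dave's example:

    # IW10 burst:  10 x 1514 bytes x 8 bits        = 121,120 bits
    # at 1 Mbit/s: 121,120 / 1,000,000 s          ~= 121 ms
    # plus link-layer framing overhead            ~= 130 ms behind one burst
    # a single 64 KB GRO/GSO super-packet is worse: 65,536 x 8 / 1e6 ~= 524 ms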
* Re: [Bloat] Recommendations for fq_codel and tso/gso in 2017
  From: Eric Dumazet @ 2017-01-27 14:40 UTC
  To: Dave Täht; Cc: bloat

On Thu, 2017-01-26 at 23:55 -0800, Dave Täht wrote:
> On 1/26/17 11:21 PM, Hans-Kristian Bakke wrote:
> > After having had some issues with inconsistent tso/gso configuration
> > causing performance problems for sch_fq with pacing on one of my systems,
> > I wonder if it is still recommended to disable gso/tso for interfaces
> > used with fq_codel qdiscs and shaping using HTB etc.
>
> At lower bandwidths gro can do terrible things. Say you have a 1Mbit
> uplink, and IW10. (At least one device (mvneta) will synthesise 64k of
> gro packets.)
>
> A single IW10 burst from one flow injects roughly 130ms of latency.

That is simply a sign of something bad happening at the source.

The router will spend too much time trying to fix the TCP sender by
smoothing things out.

Let's fix the root cause, instead of making everything slow or burning
megawatts.

GRO aggregates trains of packets for the same flow within a sub-ms window.

Why? Because GRO cannot predict the future: it cannot know when the next
interrupt might come from the device saying "here are some additional
packets". Maybe the next packet is coming in 5 seconds.

Take a look at napi_poll():

1) If the device driver called napi_complete(), all packets are flushed
(given) to the upper stack. No packet will wait in GRO for additional
segments.

2) Under flood (we exhausted the napi budget and did not call
napi_complete()), we make sure no packet can sit in GRO for more than 1 ms.

Only when the device is under flood and the cpu cannot drain the RX queue
fast enough does GRO aggregate packets more aggressively, and the size of
the GRO packets then exactly fits the CPU budget.

In a nutshell, GRO is exactly the mechanism that adapts packet sizes to
the available cpu power.

If your cpu is really fast, it will dequeue one packet at a time and GRO
won't kick in.

So the real problem here is that some device drivers implemented poor
interrupt mitigation logic, inherited from other OSes that had no GRO and
_had_ to implement their own crap, hurting latencies.

Make sure you disable interrupt mitigation, and leave GRO enabled.

e1000e is notoriously bad for interrupt mitigation.

At Google, we let the NIC send its RX interrupt ASAP.

Every usec matters.

So the model for us is very clear: use GRO and TSO as much as we can,
but make sure the producers (TCP senders) are smart and control their
burst sizes.

Think about 50Gbit and 100Gbit, and the question of having TSO and GRO
or not is simply moot.

Even at 1Gbit, GRO helps reduce cpu cycles and thus reduce latencies.

Adding a sysctl to limit GRO max size would be trivial; I already
mentioned that, but nobody cared enough to send a patch.
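A hedged sketch of what "disable interrupt mitigation, leave GRO enabled"
can look like from userspace. The interface name eth0 is a placeholder,
and not every driver honours every coalescing knob, so the -c readback is
the authoritative check:

    # leave the offloads on
    ethtool -K eth0 gro on tso on gso on

    # turn off adaptive coalescing and ask for an interrupt as soon as
    # possible (rx-usecs 0; some drivers expose this mapping differently)
    ethtool -C eth0 adaptive-rx off rx-usecs 0

    # verify what the driver actually accepted
    ethtool -c eth0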
* Re: [Bloat] Recommendations for fq_codel and tso/gso in 2017
  From: Sebastian Moeller @ 2017-01-27 14:49 UTC
  To: Eric Dumazet; Cc: Dave Täht, bloat

Hi Eric,

quick question from the peanut gallery: on a typical home router with a
1Gbps internal and a <<100Mbps external interface, will giant packets be
generated by the 1Gbps interface (with acceptable latency)? I ask because
what makes sense on a 1000Mbps ingress link might still block a 20Mbps
WAN egress link slightly longer than one would like (to the tune of 50ms,
just based on the bandwidth ratio?).

Best Regards
Sebastian
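The 50 ms worry can be made concrete with the same serialization
arithmetic, assuming a maximal 64 KB aggregate and the 20 Mbit/s egress
rate Sebastian uses as an example:

    # 64 KB super-packet: 65,536 x 8        = 524,288 bits
    # at 20 Mbit/s:       524,288 / 2e7 s  ~= 26 ms head-of-line blocking
    # the 1000/20 = 50x bandwidth ratio is the same effect per unit of
    # ingress time: ~1 ms of line-rate aggregation drains in ~50 ms at 20 Mbit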
* Re: [Bloat] Recommendations for fq_codel and tso/gso in 2017
  From: Eric Dumazet @ 2017-01-27 14:59 UTC
  To: Sebastian Moeller; Cc: Dave Täht, bloat

On Fri, 2017-01-27 at 15:49 +0100, Sebastian Moeller wrote:
> quick question from the peanut gallery: on a typical home router with a
> 1Gbps internal and a <<100Mbps external interface, will giant packets be
> generated by the 1Gbps interface (with acceptable latency)? I ask because
> what makes sense on a 1000Mbps ingress link might still block a 20Mbps
> WAN egress link slightly longer than one would like (to the tune of 50ms,
> just based on the bandwidth ratio?).

It depends on whether switching on the 1Gbps side involves the linux cpu
or not.

If switching does not pass packets through the linux stack, or a single
station uses the 1Gbit interface to the home router, then GRO is probably
not needed at all.

Also, we have ways to control packet sizes quite easily, by stacking a
virtual device before the 100Mbit external interface, as shown in this
thread:

    bond0 - external0

If you disable TSO on bond0, GRO packets will automatically be segmented
(Dave calls that peeling) before reaching external0's qdisc.

So using a bond is probably the way, leaving GRO enabled on the internal
1Gbit (or soon 10Gbit?) interface.
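A rough sketch of the layering Eric describes: a bond device stacked in
front of the external interface, with TSO/GSO disabled only on the bond so
the shaper on external0 sees already-segmented packets. All interface names
are placeholders, the single-slave bond is used purely as a stackable
virtual device, and addresses/qdiscs would move to bond0 in a real setup:

    # internal NIC keeps GRO enabled
    ethtool -K internal0 gro on

    # stack a bond device on top of the external interface
    ip link set external0 down
    ip link add bond0 type bond mode active-backup
    ip link set external0 master bond0
    ip link set bond0 up
    ip link set external0 up

    # disabling TSO/GSO on bond0 forces GRO aggregates to be re-segmented
    # before they reach external0's qdisc
    ethtool -K bond0 tso off gso off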
* [Bloat] Fwd: Recommendations for fq_codel and tso/gso in 2017
  From: Hans-Kristian Bakke @ 2017-01-27 19:57 UTC
  To: bloat

On 27 January 2017 at 15:40, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Make sure you disable interrupt mitigation, and leave GRO enabled.
>
> e1000e is notoriously bad for interrupt mitigation.
>
> At Google, we let the NIC send its RX interrupt ASAP.

Interesting. Do I understand you correctly that you basically recommend
loading the e1000e module with InterruptThrottleRate set to 0, or is
interrupt mitigation something else?

    options e1000e InterruptThrottleRate=0(,0,0,0...)

https://www.kernel.org/doc/Documentation/networking/e1000e.txt

I haven't fiddled with InterruptThrottleRate since before I even heard of
bufferbloat.
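For reference, the module-option route Hans-Kristian quotes would look
roughly like the following, with one comma-separated value per e1000e port
as described in the kernel doc he links. The ethtool -C call is a possible
runtime alternative where the driver wires ITR to the coalescing API; treat
that mapping, and the eth0 name, as assumptions to verify locally:

    # /etc/modprobe.d/e1000e.conf  (two-port example; reload the module to apply)
    options e1000e InterruptThrottleRate=0,0

    # possible runtime equivalent, if the driver maps ITR onto rx-usecs
    ethtool -C eth0 rx-usecs 0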
* [Bloat] Fwd: Recommendations for fq_codel and tso/gso in 2017
  From: Hans-Kristian Bakke @ 2017-01-27 19:56 UTC
  To: bloat

Thank you for answering!

On 27 January 2017 at 08:55, Dave Täht <dave@taht.net> wrote:
> Yes, we should revise the sample sqm code and recommendations for a
> post-gigabit era to not bother with changing network offloads. Were you
> modifying the old debloat script?

I just picked it up from just about any bufferbloat script or introduction
I have seen in the last 4 years. In addition it seemed to bring the
bandwidth accuracy of the shaped stream a little bit closer to the
bandwidth I actually configured in HTB in my own testing, which, if I
remember correctly, was done on a symmetrical link shaped to around
25 mbit/s, so I just took it for granted.

However, the fq pacing issue I had with a bond interface (tso and gso
disabled) on top of physical nics with tso and gso enabled made me think
that disabling tso and gso is perhaps not really expected behaviour for
newer implementations in the linux network stack. Perhaps it works nicely
for my shaping needs, but also gives me other, less obvious, issues in
other ways.

> Cake is coming along nicely. I'd love a test in your 2Gbit bonding
> scenario, particularly a per-host fairness test, at line or shaped
> rates. We recently got cake working well with nat.

Is this something I can do for you? This is a system in production:
non-critical enough to play with some qdiscs and generate some bandwidth
usage, but still in production. It is not really possible for me to remove
all other traffic and factors that may interfere with the results (or is a
real-life scenario perhaps the point?). But running a few scripts is no
problem if that is what is required!
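If the per-host fairness run Dave asks for happens, a minimal sketch of a
flent invocation might look like the following. The netserver host, test
length, and titles are all placeholders; the plain rrul test run from two
LAN hosts at once is the simple baseline for looking at per-host sharing:

    # baseline rrul run from one client host through the shaped box
    flent rrul -H netperf.example.com -l 60 -t "cake-2gbit-bond-host1"

    # repeat simultaneously from a second LAN host to compare per-host shares
    flent rrul -H netperf.example.com -l 60 -t "cake-2gbit-bond-host2"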