So, patches attached. They work. A whole bunch of flent data is also attached. A pretty pic showing how much that over-BQL'd burst affected TCP convergence is also attached....

The BQL reductions at line rate were typically a factor of three, which translates into the numbers observed here (a great deal of the remaining buffering is in the switch and in the stack's context switching itself).

(For the record, the lame-arse quad-core Atom on one side of my testbed struggles with a local rrul_be test: it ends up with 300000 bytes in BQL with GSO enabled.)

Toke, would you sign off on this? I'm willing to make the attempt to upstream it and take the flak from the 50gbit folk, but a sign-off would be nice (as would a tactful commit message).

This was as tactful as I could get:

    cake: Make gso-splitting configurable

    This patch restores cake's behavior of always splitting GSO packets at
    line rate, and makes GSO splitting configurable from userspace.

    Running cake at 1GigE, local traffic:

    bql limit: 131966          - no-split-gso
    bql limit: ~42392-45420    - split-gso

    On a 4-stream test, splitting GSO apart halves the observed
    inter-packet latency with no effect on throughput.

    Summary of tcp_nup test run 'gso-split' (at 2018-07-26 16:03:51.824728):

                           avg     median            # data pts
    Ping (ms) ICMP   :    0.83       0.81 ms                341
    TCP upload avg   :  235.43     235.39 Mbits/s           301
    TCP upload sum   :  941.71     941.56 Mbits/s           301
    TCP upload::1    :  235.45     235.43 Mbits/s           271
    TCP upload::2    :  235.45     235.41 Mbits/s           289
    TCP upload::3    :  235.40     235.40 Mbits/s           288
    TCP upload::4    :  235.41     235.40 Mbits/s           291

    vs

    Summary of tcp_nup test run 'no-split-gso' (at 2018-07-26 16:37:23.563960):

                           avg     median            # data pts
    Ping (ms) ICMP   :    1.67       1.73 ms                348
    TCP upload avg   :  234.56     235.37 Mbits/s           301
    TCP upload sum   :  938.24     941.49 Mbits/s           301
    TCP upload::1    :  234.55     235.38 Mbits/s           285
    TCP upload::2    :  234.57     235.37 Mbits/s           286
    TCP upload::3    :  234.58     235.37 Mbits/s           274
    TCP upload::4    :  234.54     235.42 Mbits/s           288

-- 
Dave Täht
CEO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-669-226-2619
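As a rough illustration of why cutting the BQL limit by about a factor of three shows up directly as latency, the sketch below converts the quoted BQL limits into worst-case serialization delay at 1 Gbit/s. It assumes the BQL queue is completely full and ignores Ethernet framing overhead, switch buffering and stack scheduling, so it is a back-of-the-envelope estimate, not a model of the testbed. The helper `drain_time_ms` is hypothetical and exists only for this illustration.

```python
# Back-of-the-envelope: how long does it take to drain a full BQL queue at line rate?
# Assumptions (illustrative only): 1 Gbit/s link, BQL limit fully occupied,
# no framing overhead, no other buffering (switch, stack) counted.

LINK_BPS = 1_000_000_000  # 1GigE, as in the test above


def drain_time_ms(bql_bytes: int, link_bps: int = LINK_BPS) -> float:
    """Hypothetical helper: time in ms to serialize bql_bytes at link_bps."""
    return bql_bytes * 8 / link_bps * 1000


no_split = 131966                     # bql limit reported with no-split-gso
split_low, split_high = 42392, 45420  # bql limit range reported with split-gso

print(f"no-split-gso: {drain_time_ms(no_split):.3f} ms")
print(f"split-gso:    {drain_time_ms(split_low):.3f}-{drain_time_ms(split_high):.3f} ms")
print(f"BQL reduction factor: {no_split / split_high:.2f}-{no_split / split_low:.2f}x")
```

Run as-is, it prints about 1.056 ms for no-split-gso versus 0.339-0.363 ms for split-gso, a 2.91-3.11x reduction. The roughly 0.7 ms difference is in the same ballpark as the drop in average ICMP ping from 1.67 ms to 0.83 ms in the two flent runs, though this simple calculation cannot account for the whole difference on its own.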