If you have instructions for setting up a test, I could try it.
OK, thanks for that. Code and scripts are attached; see README.txt.
I now use plain netns (no lxc containers), which is easier to set up and has lower RTTs and higher throughputs, probably because there is no bridge device.
Per Eric’s tip, the nfq no-op code is run with chrt -rr 99, which reduces RTTs somewhat and increases throughputs ~2-3x. No busy polling yet.
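For anyone who would rather request the scheduling policy from inside a test program than wrap it in chrt, here is a rough sketch in Python. It is not the attached nfq code. The helper name try_set_realtime_rr is made up for illustration, and it assumes Linux and enough privilege (root or CAP_SYS_NICE).

```python
import os


def try_set_realtime_rr(priority=99):
    """Try to move the calling process to SCHED_RR at the given priority.

    Hypothetical helper, not part of the attached scripts.
    Returns True on success, False if the platform lacks the call,
    the priority is out of range, or the process lacks permission.
    """
    if not hasattr(os, "sched_setscheduler"):
        # Platforms without the sched_* API (e.g. macOS) end up here.
        return False
    try:
        # pid 0 means "the calling process".
        os.sched_setscheduler(0, os.SCHED_RR, os.sched_param(priority))
    except OSError:
        # Covers EPERM (not privileged) and EINVAL (bad priority).
        return False
    return True
```

Calling try_set_realtime_rr() once at startup, before the packet loop, should have about the same effect as launching the process under chrt with round-robin priority 99.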
I also tried it on VMware with a 2011 MBP, which looks radically better than the APU2 (for nfq, RTTs ~16% of APU2 and throughput ~4x higher in the non-GSO case, ~11x with GSO). Results attached, and to summarize:
ping mean (min-max) RTTs:
APU2, no nfq: 35 us (23-291)
APU2, nfq without GSO: 80 us (53-288)
APU2, nfq with GSO: 85 us (56-270)
2011 MBP, no nfq: 4 us (4-529) [11% of APU2]
2011 MBP, nfq without GSO: 13 us (11-197) [16% of APU2]
2011 MBP, nfq with GSO: 14 us (11-1568) [16% of APU2]
iperf3 throughputs (units as noted per line):
APU2, no nfq: 5.01 Gbps
APU2, nfq without GSO: 391 Mbps
APU2, nfq with GSO: 3.35 Gbps
2011 MBP, no nfq: 39.8 Gbps [7.9x APU2]
2011 MBP, nfq without GSO: 1.48 Gbps [3.8x APU2]
2011 MBP, nfq with GSO: 38.0 Gbps [11.3x APU2]
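The bracketed comparisons above can be rechecked with a few lines of Python. This is only a sketch: the numbers are copied from the two lists, the 391 Mbps figure is converted to 0.391 Gbps, and the dictionary and function names are made up for illustration.

```python
# Mean RTTs in microseconds, copied from the ping list.
rtt_us = {
    "no nfq": {"APU2": 35, "MBP": 4},
    "nfq without GSO": {"APU2": 80, "MBP": 13},
    "nfq with GSO": {"APU2": 85, "MBP": 14},
}

# iperf3 throughputs in Gbps (391 Mbps written as 0.391 Gbps).
tput_gbps = {
    "no nfq": {"APU2": 5.01, "MBP": 39.8},
    "nfq without GSO": {"APU2": 0.391, "MBP": 1.48},
    "nfq with GSO": {"APU2": 3.35, "MBP": 38.0},
}


def rtt_percent_of_apu2(case):
    """MBP mean RTT as a whole-number percentage of the APU2 mean RTT."""
    row = rtt_us[case]
    return round(100 * row["MBP"] / row["APU2"])


def tput_ratio_vs_apu2(case):
    """MBP throughput divided by APU2 throughput, to one decimal place."""
    row = tput_gbps[case]
    return round(row["MBP"] / row["APU2"], 1)


if __name__ == "__main__":
    for case in rtt_us:
        print(f"{case}: RTT {rtt_percent_of_apu2(case)}% of APU2, "
              f"throughput {tput_ratio_vs_apu2(case)}x APU2")
```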
Results from a decent physical box instead of a VM may be interesting to see.