[Bloat] Router congestion, slow ping/ack times with kernel 5.4.60
Jesper Dangaard Brouer
brouer at redhat.com
Mon Nov 16 07:34:38 EST 2020
On Wed, 04 Nov 2020 16:23:12 +0100
Thomas Rosenstein via Bloat <bloat at lists.bufferbloat.net> wrote:
[...]
> I have multiple routers which connect to multiple upstream providers, I
> have noticed a high latency shift in icmp (and generally all connections)
> if I run b2 upload-file --threads 40 (and I can reproduce this)
>
> What options do I have to analyze why this happens?
>
> General Info:
>
> Routers are connected between each other with 10G Mellanox Connect-X
> cards via 10G SFP+ DAC cables via a 10G Switch from fs.com
> Latency generally is around 0.18 ms between all routers (4).
> Throughput is 9.4 Gbit/s with 0 retransmissions when tested with iperf3.
> 2 of the 4 routers are connected upstream with a 1G connection (separate
> port, same network card)
> All routers have the full internet routing tables, i.e. 80k entries for
> IPv6 and 830k entries for IPv4
> Conntrack is disabled (-j NOTRACK)
> Kernel 5.4.60 (custom)
> 2x Xeon X5670 @ 2.93 GHz
I think I have spotted your problem... This CPU[1], the Xeon X5670, is
more than 10 years old! It basically corresponds to the machines I used
for my presentation at LinuxCon 2009, see slides[2]. Only with large
frames and with massive scaling across all CPUs was I able to get close
to 10Gbit/s through these machines. On top of that, I had to buy
low-latency RAM modules to make it happen.
As you can see in my slides[2], memory bandwidth and PCIe speeds were at
the limit for making it possible on the hardware level. I had to run
DDR3 memory at 1333MHz and tune the QuickPath Interconnect (QPI) to
6.4GT/s (default 4.8GT/s).
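
For a feel of the numbers involved, here is a back-of-the-envelope sketch
in Python. It assumes that "1333MHz" means DDR3-1333 (1333 MT/s on a
64-bit channel) and that QPI carries 2 bytes of payload per transfer in
each direction. The function names are made up for illustration, and the
results are theoretical peaks, not measurements taken from the slides.

```python
# Theoretical peak bandwidth figures; real-world throughput is lower.


def ddr3_channel_gbytes_per_s(mega_transfers_per_s):
    """Peak bandwidth of one 64-bit DDR3 channel, in GB/s."""
    bytes_per_transfer = 8  # assumption: 64-bit data bus per channel
    return mega_transfers_per_s * 1e6 * bytes_per_transfer / 1e9


def qpi_gbytes_per_s_per_direction(giga_transfers_per_s):
    """Peak QPI payload bandwidth in one direction, in GB/s."""
    payload_bytes_per_transfer = 2  # assumption: 16 data bits per transfer
    return giga_transfers_per_s * payload_bytes_per_transfer


def line_rate_gbytes_per_s(gbit_per_s):
    """Convert a network line rate from Gbit/s to GB/s."""
    return gbit_per_s / 8


if __name__ == "__main__":
    print("DDR3-1333, one channel:", ddr3_channel_gbytes_per_s(1333), "GB/s")
    print("QPI at 4.8 GT/s:", qpi_gbytes_per_s_per_direction(4.8), "GB/s")
    print("QPI at 6.4 GT/s:", qpi_gbytes_per_s_per_direction(6.4), "GB/s")
    print("10G line rate:", line_rate_gbytes_per_s(10), "GB/s")
```

Keep in mind that a forwarded packet crosses memory at least twice (one
DMA write on receive, one DMA read on transmit), and on a NUMA machine it
may also cross the QPI link, so the line rate is only a lower bound on
the traffic these buses have to carry.
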
Motherboards of this generation had both PCIe gen-1 and gen-2 slots.
Only the PCIe gen-2 slots had enough bandwidth, and just barely. Maybe
you physically placed the NIC in a PCIe gen-1 slot?
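
One way to check the negotiated PCIe link without opening the chassis is
to read the link attributes the kernel exports in sysfs. The following is
a minimal Python sketch under some assumptions: the interface name `eth0`
is a placeholder, the helper names are made up, and the NIC is assumed to
be a PCI device exposing `current_link_speed` and `current_link_width`.
The speed string format varies between kernels (e.g. "5 GT/s" or
"5.0 GT/s PCIe"), so only the leading number is parsed; an "Unknown"
value is not handled here.

```python
from pathlib import Path

# Line-encoding efficiency per signalling rate (GT/s per lane):
# gen-1 and gen-2 use 8b/10b, gen-3 uses 128b/130b.
ENCODING = {2.5: 8 / 10, 5.0: 8 / 10, 8.0: 128 / 130}


def parse_gts(text):
    """Parse a sysfs link speed such as '5 GT/s' or '5.0 GT/s PCIe'."""
    return float(text.split()[0])


def link_gbit_per_s(gts, width):
    """Data rate per direction after line encoding, before packet overhead."""
    return gts * ENCODING[gts] * width


def nic_link(iface, sysfs="/sys"):
    """Read the negotiated PCIe link of a network interface from sysfs."""
    dev = Path(sysfs) / "class" / "net" / iface / "device"
    gts = parse_gts((dev / "current_link_speed").read_text())
    width = int((dev / "current_link_width").read_text())
    return {"gts": gts, "width": width,
            "gbit_per_s": link_gbit_per_s(gts, width)}


# Example (placeholder interface name):
#   print(nic_link("eth0"))
```

A gen-1 link reports 2.5 GT/s and a gen-2 link 5 GT/s. The computed rate
is an upper bound per direction; PCIe packet and descriptor overhead
reduce what the NIC can actually move.
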
On top of this, you also have a NUMA system, 2x Xeon X5670, which can
result in A LOT of "funny" issues that are really hard to troubleshoot...
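
A first step for narrowing down NUMA effects is to find out which NUMA
node the NIC is attached to and which CPUs are local to it. The sketch
below reads the `numa_node` and `local_cpulist` attributes of the NIC's
PCI device from sysfs. The interface name `eth0` is a placeholder and the
helper names are made up for illustration; a `numa_node` of -1 means the
kernel does not know the node.

```python
from pathlib import Path


def parse_cpulist(text):
    """Expand a kernel cpulist string such as '0-5,12-17' into CPU numbers."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue  # empty list, e.g. no CPUs reported
        lo, _, hi = part.partition("-")
        cpus.extend(range(int(lo), int(hi or lo) + 1))
    return cpus


def nic_numa_info(iface, sysfs="/sys"):
    """Return (numa_node, local_cpus) for a network interface."""
    dev = Path(sysfs) / "class" / "net" / iface / "device"
    node = int((dev / "numa_node").read_text())
    local_cpus = parse_cpulist((dev / "local_cpulist").read_text())
    return node, local_cpus


# Example (placeholder interface name):
#   node, cpus = nic_numa_info("eth0")
#   print("NIC on NUMA node", node, "local CPUs", cpus)
```

The list of local CPUs can then be compared with where the NIC's
interrupts are actually handled (see /proc/interrupts); packets processed
on the remote socket have to cross the QPI link.
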
[1] https://ark.intel.com/content/www/us/en/ark/products/47920/intel-xeon-processor-x5670-12m-cache-2-93-ghz-6-40-gt-s-intel-qpi.html
[2] https://people.netfilter.org/hawk/presentations/LinuxCon2009/LinuxCon2009_JesperDangaardBrouer_final.pdf
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
LinkedIn: http://www.linkedin.com/in/brouer