[Bloat] Router congestion, slow ping/ack times with kernel 5.4.60

Jesper Dangaard Brouer brouer at redhat.com
Mon Nov 16 07:34:38 EST 2020


On Wed, 04 Nov 2020 16:23:12 +0100
Thomas Rosenstein via Bloat <bloat at lists.bufferbloat.net> wrote:

[...] 
> I have multiple routers which connect to multiple upstream providers. I
> have noticed a large latency increase for ICMP (and generally all
> connections) if I run b2 upload-file --threads 40 (and I can reproduce this).
> 
> What options do I have to analyze why this happens?
> 
> General Info:
> 
> Routers are connected to each other with 10G Mellanox Connect-X
> cards via 10G SFP+ DAC cables through a 10G switch from fs.com
> Latency generally is around 0.18 ms between all routers (4).
> Throughput is 9.4 Gbit/s with 0 retransmissions when tested with iperf3.
> 2 of the 4 routers are connected upstream with a 1G connection (separate 
> port, same network card)
> All routers have the full internet routing tables, i.e. 80k entries for 
> IPv6 and 830k entries for IPv4
> Conntrack is disabled (-j NOTRACK)
> Kernel 5.4.60 (custom)
> 2x Xeon X5670 @ 2.93 GHz

I think I have spotted your problem... This CPU[1], the Xeon X5670, is more
than 10 years old!  It basically corresponds to the machines I used for
my presentation at LinuxCon 2009 (see slides[2]).  Only with large frames
and with massive scaling across all CPUs was I able to get close to
10Gbit/s through these machines.  On top of that, I had to buy low-latency
RAM modules to make it happen.

As you can see on my slides[2], memory bandwidth and PCIe speeds were at
the limit of what was possible at the hardware level.  I had to run
DDR3 memory at 1333MHz and tune the QuickPath Interconnect (QPI) to
6.4GT/s (default 4.8GT/s).

Motherboards of this generation had both PCIe gen-1 and gen-2 slots, and
only the PCIe gen-2 slots had (barely) enough bandwidth.  Could you have
physically placed the NIC in a PCIe gen-1 slot?
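
To check the slot question on a running box, one option is to compare the link the NIC actually negotiated with what the card is capable of. The Python sketch below is not from this thread. It reads the standard Linux sysfs attributes `current_link_speed`, `current_link_width`, `max_link_speed` and `max_link_width` through `/sys/class/net/<iface>/device`. The helper names `pcie_bw_gbps`, `parse_link_speed` and `read_link` are hypothetical. The bandwidth figure only accounts for line encoding (8b/10b for gen-1 and gen-2). With an assumed 8 lanes, that gives 16 Gbit/s per direction for gen-1 versus 32 Gbit/s for gen-2, before packet and protocol overhead.

```python
# Hypothetical diagnostic sketch (not from the original thread): report the
# PCIe link a Linux network interface actually negotiated, via sysfs.
from fractions import Fraction
from pathlib import Path

# Line-encoding efficiency per PCIe transfer rate in GT/s:
# gen-1 (2.5) and gen-2 (5.0) use 8b/10b; gen-3 and later use 128b/130b.
ENCODING = {
    2.5: Fraction(8, 10),
    5.0: Fraction(8, 10),
    8.0: Fraction(128, 130),
    16.0: Fraction(128, 130),
    32.0: Fraction(128, 130),
}


def pcie_bw_gbps(gt_per_s: float, lanes: int) -> float:
    """Per-direction bandwidth in Gbit/s after line encoding only.

    TLP/DLLP protocol overhead is NOT modelled, so usable payload
    throughput is noticeably lower than this number.
    """
    return float(Fraction(gt_per_s) * lanes * ENCODING[gt_per_s])


def parse_link_speed(text: str) -> float:
    """Parse a sysfs link speed such as "5.0 GT/s" or "5.0 GT/s PCIe"."""
    # Older kernels omit the "PCIe" suffix; the first token is the rate either way.
    return float(text.split()[0])


def read_link(iface: str, sysfs_root: str = "/sys") -> dict:
    """Return current and maximum (GT/s, lanes) for the device behind iface."""
    dev = Path(sysfs_root) / "class" / "net" / iface / "device"

    def read(name: str) -> str:
        return (dev / name).read_text().strip()

    return {
        "current": (parse_link_speed(read("current_link_speed")),
                    int(read("current_link_width"))),
        "max": (parse_link_speed(read("max_link_speed")),
                int(read("max_link_width"))),
    }
```

For example, if `read_link("eth2")` (the interface name is hypothetical) returns a lower `current` speed than `max`, the card has trained down. A gen-2 capable NIC sitting in a gen-1 slot would show 2.5 GT/s. `lspci -vv` shows the same information in its `LnkCap` and `LnkSta` lines. Devices that report an `Unknown` speed will make `parse_link_speed` raise `ValueError`.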

On top of this, you also have a NUMA system (2x Xeon X5670), which can
result in A LOT of "funny" issues that are really hard to troubleshoot...
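
On a NUMA box, a usual first step is to find out which node the NIC is attached to and which CPUs are local to that node, so the NIC's interrupts and the forwarding work can stay there. The sketch below assumes Linux sysfs. It reads `numa_node` under the NIC's device directory and `cpulist` under `/sys/devices/system/node/node<N>/`, falling back to `/sys/devices/system/cpu/online`. The helper names `parse_cpulist` and `nic_local_cpus` are hypothetical.

```python
# Hypothetical sketch: find which NUMA node a NIC hangs off and which CPUs
# are local to it, using standard Linux sysfs files.
from pathlib import Path


def parse_cpulist(text: str) -> list[int]:
    """Expand a sysfs cpulist such as "0-5,12-17" into CPU numbers."""
    cpus: list[int] = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.extend(range(int(first), int(last) + 1))
        else:
            cpus.append(int(part))
    return cpus


def nic_local_cpus(iface: str, sysfs_root: str = "/sys") -> tuple[int, list[int]]:
    """Return (numa_node, local CPU list) for iface.

    A numa_node of -1 means the firmware reported no locality; in that
    case every online CPU is returned.
    """
    root = Path(sysfs_root)
    node = int((root / "class" / "net" / iface / "device" / "numa_node").read_text())
    if node < 0:
        cpulist = (root / "devices" / "system" / "cpu" / "online").read_text()
    else:
        cpulist = (root / "devices" / "system" / "node" / f"node{node}" / "cpulist").read_text()
    return node, parse_cpulist(cpulist)
```

You could then write the returned CPU list to `/proc/irq/<n>/smp_affinity_list` for each of the NIC's IRQs. Note that irqbalance may move them again. This is one common approach to keeping traffic NUMA-local, not a guaranteed fix for the latency issue.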


[1] https://ark.intel.com/content/www/us/en/ark/products/47920/intel-xeon-processor-x5670-12m-cache-2-93-ghz-6-40-gt-s-intel-qpi.html

[2] https://people.netfilter.org/hawk/presentations/LinuxCon2009/LinuxCon2009_JesperDangaardBrouer_final.pdf

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


