[Bloat] Router congestion, slow ping/ack times with kernel 5.4.60

Thu Nov 5 03:48:33 EST 2020

On 5 Nov 2020, at 1:10, Toke Høiland-Jørgensen wrote:

> "Thomas Rosenstein" <thomas.rosenstein at creamfinance.com> writes:
>
>> On 4 Nov 2020, at 17:10, Toke Høiland-Jørgensen wrote:
>>
>>> Thomas Rosenstein via Bloat <bloat at lists.bufferbloat.net> writes:
>>>
>>>> Hi all,
>>>>
>>>> I'm coming from the lartc mailing list, here's the original text:
>>>>
>>>> =====
>>>>
>>>> I have multiple routers which connect to multiple upstream
>>>> providers. I have noticed a large latency increase in ICMP (and
>>>> generally all connections) if I run b2 upload-file --threads 40
>>>> (and I can reproduce this).
>>>>
>>>> What options do I have to analyze why this happens?
>>>>
>>>> General Info:
>>>>
>>>> Routers are connected to each other with 10G Mellanox Connect-X
>>>> cards via 10G SFP+ DAC cables through a 10G switch from fs.com.
>>>> Latency is generally around 0.18 ms between all routers (4).
>>>> Throughput is 9.4 Gbit/s with 0 retransmissions when tested with
>>>> iperf3.
>>>> 2 of the 4 routers are connected upstream with a 1G connection
>>>> (separate port, same network card).
>>>> All routers have the full internet routing tables, i.e. 80k entries
>>>> for IPv6 and 830k entries for IPv4.
>>>> Conntrack is disabled (-j NOTRACK)
>>>> Kernel 5.4.60 (custom)
>>>> 2x Xeon X5670 @ 2.93 GHz
>>>> 96 GB RAM
>>>> No Swap
>>>> CentOS 7
>>>>
>>>> During high latency:
>>>>
>>>> Latency on the routers that carry the traffic flow increases to
>>>> 12 - 20 ms on all interfaces; moving the stream (by disabling the
>>>> BGP session) also moves the high latency.
>>>> iperf3 performance plummets to 300 - 400 Mbit/s.
>>>> CPU load (user / system) is around 0.1%.
>>>> RAM usage is around 3 - 4 GB.
>>>> if_packets count is stable (around 8000 pkt/s more).
>>>
>>> I'm not sure I get your topology. Packets are going from where to
>>> where, and what link is the bottleneck for the transfer you're doing?
>>> Are you measuring the latency along the same path?
>>>
>>> Have you tried running 'mtr' to figure out which hop the latency is
>>> at?
>>
>> I tried to draw the topology, I hope this is okay and explains better
>> what's happening:
>>
>> https://drive.google.com/file/d/15oAsxiNfsbjB9a855Q_dh6YvFZBDdY5I/view?usp=sharing
>
> Ohh, right, you're pinging between two of the routers across a 10 Gbps
> link with plenty of capacity to spare, and *that* goes up by two orders
> of magnitude when you start the transfer, even though the transfer
> itself is <1Gbps? Am I understanding you correctly now?

Exactly :)
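
As a quick sanity check on the "two orders of magnitude" figure, here is a minimal sketch that just divides the numbers from my original mail (0.18 ms idle, 12 - 20 ms under load). The variable and function names are made up for illustration.

```python
# Latency figures taken from the original report above.
BASELINE_MS = 0.18          # idle router-to-router latency
LOADED_MS = (12.0, 20.0)    # latency range while the upload is running


def inflation(baseline_ms: float, loaded_ms: float) -> float:
    """Return how many times larger the loaded latency is than the baseline."""
    return loaded_ms / baseline_ms


low, high = (inflation(BASELINE_MS, value) for value in LOADED_MS)
print(f"latency inflation: {low:.0f}x to {high:.0f}x")
```

So the increase is somewhere between roughly 67x and 111x, which matches the description.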

>
> If so, this sounds more like a driver issue, or maybe something to do
> with scheduling. Does it only happen with ICMP? You could try this
> tool for a userspace UDP measurement:

It happens with all packets, which is why the transfer to Backblaze with
40 threads drops to ~8 MB/s instead of >60 MB/s.

>
> https://github.com/heistp/irtt/
>

I'll try what that reports!
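
For reference, the idea behind a userspace UDP measurement can be sketched in a few lines of Python. This is not irtt and does not use its protocol or options; it is only a rough illustration, assuming a plain UDP echo responder on the far end. All function names here are hypothetical, and the demo runs over loopback only.

```python
import socket
import threading
import time


def run_echo_server(sock: socket.socket, stop: threading.Event) -> None:
    """Echo every datagram back to its sender until asked to stop."""
    sock.settimeout(0.1)
    while not stop.is_set():
        try:
            data, addr = sock.recvfrom(2048)
        except socket.timeout:
            continue
        sock.sendto(data, addr)


def measure_rtt(host: str, port: int, count: int = 10, timeout: float = 1.0) -> list:
    """Send sequence-numbered datagrams; return RTTs in ms (None means lost)."""
    rtts = []
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as client:
        client.settimeout(timeout)
        for seq in range(count):
            payload = seq.to_bytes(4, "big")
            start = time.perf_counter()
            client.sendto(payload, (host, port))
            try:
                # Skip stale replies that belong to earlier probes.
                while True:
                    data, _ = client.recvfrom(2048)
                    if data == payload:
                        break
                rtts.append((time.perf_counter() - start) * 1000.0)
            except socket.timeout:
                rtts.append(None)
    return rtts


def loopback_demo(count: int = 10) -> list:
    """Start an echo server on 127.0.0.1 and measure RTT against it."""
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    port = server.getsockname()[1]
    stop = threading.Event()
    worker = threading.Thread(target=run_echo_server, args=(server, stop), daemon=True)
    worker.start()
    try:
        return measure_rtt("127.0.0.1", port, count)
    finally:
        stop.set()
        worker.join()
        server.close()


if __name__ == "__main__":
    samples = [r for r in loopback_demo() if r is not None]
    if samples:
        print(f"{len(samples)} replies, min {min(samples):.3f} ms, max {max(samples):.3f} ms")
```

To measure across the routers, the echo loop would run on one router and measure_rtt() on the other. A dedicated tool like irtt will give far better timing accuracy than this.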

> Also, what happens if you ping a host on the internet (*through* the
> router instead of *to* it)?

Same issue, but twice as pronounced, since all interfaces seem to be
affected.
So if I ping on one interface, the second one has the issue too.
All traffic across the host is also affected, on both sides, so the
added latency on a ping to the internet is roughly doubled.

>
> And which version of the Connect-X cards are you using (or rather,
> which driver? mlx4?)
>

They're Connect-X 4 Lx cards, specifically: MCX4121A-ACAT
Driver is mlx5_core

>> So it must be something in the kernel tacking on a delay, I could try
>> to do a bisect and build like 10 kernels :)
>
> That may ultimately end up being necessary. However, when you say
> 'stock kernel' you mean what CentOS ships, right? If so, that's not
> really a 3.10 kernel - the RHEL kernels (that CentOS is based on)
> are... somewhat creative... about their versioning. So if you've
> switched to a vanilla upstream kernel you may find bisecting
> difficult :/

Yep, the default that CentOS ships. I just tested 4.12.5, and the issue
does not happen there either. So I guess I can bisect it then... (really
don't want to 😃)
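
A bisect halves the candidate range at every step, so the number of kernels to build grows with roughly log2 of the number of commits between the good and the bad version. A minimal sketch of that estimate follows. The commit count is a made-up placeholder, not the real number of commits between 4.12.5 and 5.4.60, and the function name is hypothetical.

```python
def bisect_steps(candidate_commits: int) -> int:
    """Approximate number of builds needed to bisect `candidate_commits` commits.

    Each step halves the remaining range, so this is ceil(log2(n)),
    computed with integer arithmetic to avoid floating point rounding.
    """
    if candidate_commits < 1:
        raise ValueError("need at least one candidate commit")
    return (candidate_commits - 1).bit_length()


# Hypothetical commit count, for illustration only.
ASSUMED_COMMITS = 150_000

print(f"about {bisect_steps(ASSUMED_COMMITS)} kernel builds "
      f"for {ASSUMED_COMMITS} candidate commits")
```

With a range that large, the estimate ends up well above the 10 kernels I joked about earlier.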

>
> How did you configure the new kernel? Did you start from scratch, or
> is it based on the old CentOS config?

First oldconfig, and from there I added additional options for IB,
NVMe, etc. (which I don't really need on the routers).

>
> -Toke