[Bloat] Router congestion, slow ping/ack times with kernel 5.4.60

Thomas Rosenstein thomas.rosenstein at creamfinance.com
Wed Nov 4 11:24:55 EST 2020



On 4 Nov 2020, at 17:10, Toke Høiland-Jørgensen wrote:

> Thomas Rosenstein via Bloat <bloat at lists.bufferbloat.net> writes:
>
>> Hi all,
>>
>> I'm coming from the lartc mailing list, here's the original text:
>>
>> =====
>>
>> I have multiple routers which connect to multiple upstream providers.
>> I have noticed a high latency shift in ICMP (and generally in all
>> connections) if I run b2 upload-file --threads 40 (and I can
>> reproduce this).
>>
>> What options do I have to analyze why this happens?
>>
>> General Info:
>>
>> Routers are connected to each other with 10G Mellanox ConnectX cards
>> over 10G SFP+ DAC cables through a 10G switch from fs.com.
>> Latency between all four routers is generally around 0.18 ms.
>> Throughput is 9.4 Gbit/s with 0 retransmissions when tested with iperf3.
>> 2 of the 4 routers are connected upstream with a 1G connection
>> (separate port, same network card).
>> All routers carry the full internet routing tables, i.e. 80k entries
>> for IPv6 and 830k entries for IPv4.
>> Conntrack is disabled (-j NOTRACK).
>> Kernel 5.4.60 (custom)
>> 2x Xeon X5670 @ 2.93 GHz
>> 96 GB RAM
>> No swap
>> CentOS 7
>>
>> During high latency:
>>
>> Latency on the routers carrying the traffic flow increases to 12 - 20 ms
>> on all interfaces; moving the stream (by disabling the BGP session)
>> also moves the high latency.
>> iperf3 performance plummets to 300 - 400 Mbit/s.
>> CPU load (user / system) is around 0.1%.
>> RAM usage is around 3 - 4 GB.
>> if_packets count is stable (around 8000 pkt/s more).
>
> I'm not sure I get your topology. Packets are going from where to
> where, and what link is the bottleneck for the transfer you're doing?
> Are you measuring the latency along the same path?
>
> Have you tried running 'mtr' to figure out which hop the latency is
> at?

I tried to draw the topology; I hope this is okay and explains better
what's happening:

https://drive.google.com/file/d/15oAsxiNfsbjB9a855Q_dh6YvFZBDdY5I/view?usp=sharing

There is definitely no bottleneck in any of the links; the maximum on any
link is 16k packets/sec and around 300 Mbit/s.
In the iperf3 tests I can easily get up to 9.4 Gbit/s.
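
As for mtr: running it across the affected path while the b2 upload is
going should show which hop picks up the extra latency. A minimal example
(the target IP is only a placeholder, and 100 cycles is an arbitrary
choice):

    mtr -n -r -c 100 192.0.2.1

The -n flag skips DNS lookups so resolver delays don't pollute the
per-hop numbers.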

So it must be something in the kernel tacking on a delay; I could try to
do a bisect and build around 10 kernels :)
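
A rough sketch of such a bisect, assuming a clone of the stable tree and
that the previously running kernel is known good (vGOOD below is only a
placeholder for whatever that last good version was):

    git bisect start
    git bisect bad v5.4.60
    git bisect good vGOOD     # last kernel version without the latency issue
    # build and boot the commit git checks out, reproduce with
    # "b2 upload-file --threads 40", then mark the result:
    git bisect good           # or: git bisect bad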

>
>> Here is the tc -s qdisc output:
>
> This indicates ("dropped 0" and "ecn_mark 0") that there's no
> backpressure on the qdisc, so something else is going on.
>
> Also, you said the issue goes away if you downgrade the kernel? That
> does sound odd...

Yes, indeed. I only recently upgraded the kernel to 5.4.60 and didn't
have the issue before.
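
For reference, the qdisc counters could also be watched live during the
next reproduction; a rough sketch (the interface name is just an example):

    watch -n 1 "tc -s qdisc show dev ens1f0"

If dropped / ecn_mark stay at 0 while the latency rises, the queueing
would have to be happening somewhere other than the qdisc.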

>
> -Toke
