[Bloat] Router congestion, slow ping/ack times with kernel 5.4.60

Thomas Rosenstein thomas.rosenstein at creamfinance.com
Mon Nov 9 11:39:48 EST 2020



On 9 Nov 2020, at 12:40, Jesper Dangaard Brouer wrote:

> On Mon, 09 Nov 2020 11:09:33 +0100
> "Thomas Rosenstein" <thomas.rosenstein at creamfinance.com> wrote:
>
> Could you also provide ethtool_stats for the TX interface?
>
> Notice that the tool[1] ethtool_stats.pl support monitoring several
> interfaces at the same time, e.g. run:
>
>  ethtool_stats.pl --sec 3 --dev eth4 --dev ethTX
>
> And provide output as pastebin.

I have now also repeated the same test with 3.10; here are the ethtool
outputs:

https://drive.google.com/file/d/1c98MVV0JYl6Su6xZTpqwS7m-6OlbmAFp/view?usp=sharing

and the ping times:

https://drive.google.com/file/d/1xhbGJHb5jUbPsee4frbx-c-uqh-7orXY/view?usp=sharing

Sadly the parameters we were looking at are not supported below 4.14.

But I immediately saw one thing that is very different:

ethtool --statistics eth4 | grep discards
      rx_discards_phy: 0
      tx_discards_phy: 0

If we check the ethtool output from 5.9.4, we have:

      rx_discards_phy: 151793
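
A minimal sketch, assuming the two `ethtool --statistics` outputs have
been saved as text, of how the discard counters could be compared side
by side. The function names `parse_ethtool_stats` and `compare_counters`
are hypothetical helpers, not part of ethtool or ethtool_stats.pl. The
sample strings contain only the values quoted above; the 5.9.4 excerpt
shows no tx_discards_phy line, so that counter is reported as None
rather than guessed.

```python
import re

# Excerpts quoted in this message; nothing else is filled in.
KERNEL_3_10 = """\
      rx_discards_phy: 0
      tx_discards_phy: 0
"""
KERNEL_5_9_4 = """\
      rx_discards_phy: 151793
"""


def parse_ethtool_stats(text):
    """Parse `ethtool --statistics <dev>` style text into {counter: int}."""
    stats = {}
    for line in text.splitlines():
        # Counter lines look like "     rx_discards_phy: 151793".
        # Header lines such as "NIC statistics:" carry no number and are skipped.
        match = re.match(r"\s*([^:]+?):\s*(-?\d+)\s*$", line)
        if match:
            stats[match.group(1).strip()] = int(match.group(2))
    return stats


def compare_counters(a, b, substring="discards"):
    """Return {name: (value_in_a, value_in_b)} for counters containing substring.

    A counter missing from one side is reported as None on that side.
    """
    names = sorted(n for n in set(a) | set(b) if substring in n)
    return {n: (a.get(n), b.get(n)) for n in names}
```

Calling `compare_counters(parse_ethtool_stats(KERNEL_3_10),
parse_ethtool_stats(KERNEL_5_9_4))` pairs each discard counter from the
3.10 run with its value from the 5.9.4 run.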

Also, the outbound_pci_stalled_wr_events get more frequent the lower
the total bandwidth and the higher the ping is.
Logically there must be something blocking the buffers: either they
are not getting freed, or not rotated correctly, or processing is too
slow.
I would exclude the processing, simply based on the 0% CPU load, and
also because it doesn't happen in 3.10.
It is also suspicious that the issue only appears after a certain time
of activity (maybe total traffic?!)
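
To check when the problem starts and how fast a counter such as
outbound_pci_stalled_wr_events or rx_discards_phy grows, the counters
could be sampled periodically and turned into per-interval rates. The
following is only a sketch under assumptions: `counter_rates` and
`first_increase` are hypothetical helper names, and each sample is
assumed to be a (timestamp in seconds, {counter name: integer value})
pair, for example obtained by parsing `ethtool --statistics` output at
fixed intervals.

```python
def counter_rates(samples, name):
    """Return [(timestamp, events_per_second), ...] for one counter.

    `samples` is a time-ordered list of (timestamp_seconds, stats_dict).
    One rate is produced per pair of consecutive samples.
    """
    rates = []
    for (t0, s0), (t1, s1) in zip(samples, samples[1:]):
        dt = t1 - t0
        # Skip intervals that cannot yield a rate (bad clock, missing counter).
        if dt <= 0 or name not in s0 or name not in s1:
            continue
        rates.append((t1, (s1[name] - s0[name]) / dt))
    return rates


def first_increase(samples, name):
    """Return the timestamp of the first interval in which the counter grew."""
    for timestamp, rate in counter_rates(samples, name):
        if rate > 0:
            return timestamp
    return None
```

Comparing the timestamp returned by `first_increase` with the start of
the test, and with the total traffic sent up to that point, would show
whether the onset depends on elapsed time or on traffic volume.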

