[Bloat] Router congestion, slow ping/ack times with kernel 5.4.60
Thomas Rosenstein
thomas.rosenstein at creamfinance.com
Mon Nov 9 07:25:22 EST 2020
On 9 Nov 2020, at 12:40, Jesper Dangaard Brouer wrote:
> On Mon, 09 Nov 2020 11:09:33 +0100
> "Thomas Rosenstein" <thomas.rosenstein at creamfinance.com> wrote:
>
>> On 9 Nov 2020, at 9:24, Jesper Dangaard Brouer wrote:
>>
>>> On Sat, 07 Nov 2020 14:00:04 +0100
>>> Thomas Rosenstein via Bloat <bloat at lists.bufferbloat.net> wrote:
>>>
>>>> Here's an extract from the ethtool https://pastebin.com/cabpWGFz just
>>>> in case there's something hidden.
>>>
>>> Yes, there is something hiding in the data from ethtool_stats.pl[1]:
>>> (10G Mellanox ConnectX cards via 10G SFP+ DAC)
>>>
>>> stat:         1 (          1) <= outbound_pci_stalled_wr_events /sec
>>> stat: 339731557 (339,731,557) <= rx_buffer_passed_thres_phy /sec
>>>
>>> I've not seen this counter 'rx_buffer_passed_thres_phy' before; looking
>>> in the kernel driver code, it is related to "rx_buffer_almost_full".
>>> The number per second is excessive (but it may be related to a driver
>>> bug, as it ends up reading "high" -> rx_buffer_almost_full_high in the
>>> extended counters).
>
> Note that this is a strong red flag that something is wrong.
>
>
> Okay, but as this is a router you also need to transmit this
> (asymmetric) traffic out another interface, right?
The asymmetric traffic comes back via another router. This one is router-02;
traffic from the internet comes back on router-01.
I also added the interface names.
See the updated diagram:
https://drive.google.com/file/d/15oAsxiNfsbjB9a855Q_dh6YvFZBDdY5I/view?usp=sharing
>
> Could you also provide ethtool_stats for the TX interface?
>
> Note that the tool[1] ethtool_stats.pl supports monitoring several
> interfaces at the same time, e.g. run:
>
> ethtool_stats.pl --sec 3 --dev eth4 --dev ethTX
>
> And provide output as pastebin.
I have disabled pause control, as Toke suggested, via:
ethtool -A eth4 autoneg off rx off tx off
ethtool -A eth5 autoneg off rx off tx off
Afterwards I captured the ethtool output, first "without" traffic for a few
seconds, then with the problematic flow.
Since the output is > 512KB I had to upload it to Google Drive:
https://drive.google.com/file/d/1EVKt1LseaBuD40QE-SqFvqYSeWUEcGA_/view?usp=sharing
>
>
>>> [1]
>>> https://github.com/netoptimizer/network-testing/blob/master/bin/ethtool_stats.pl
>>>
>>> Strange size distribution:
>>> stat:     19922 (     19,922) <= rx_1519_to_2047_bytes_phy /sec
>>> stat:        14 (         14) <= rx_65_to_127_bytes_phy /sec
I assume that's because of the VLAN tagging, which gives 1522 bytes per
packet with an MTU of 1500?
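For reference, a minimal sketch of that arithmetic. It assumes the *_phy size
counters count the whole Ethernet frame including the 4-byte FCS (an
assumption, not checked against the driver), and frame_size is just a made-up
helper name for illustration:

```shell
#!/bin/sh
# frame_size MTU [VLAN_TAGS]
# Prints the Ethernet frame size in bytes for a full-MTU packet.
# Assumption: the frame is counted with its header and FCS.
frame_size() {
    mtu=$1
    vlan_tags=${2:-0}   # number of 802.1Q tags, defaults to 0
    eth_header=14       # dst MAC (6) + src MAC (6) + EtherType (2)
    vlan_tag=4          # one 802.1Q tag
    fcs=4               # frame check sequence
    echo $(( mtu + eth_header + vlan_tags * vlan_tag + fcs ))
}

frame_size 1500 0   # untagged: 1518
frame_size 1500 1   # one VLAN tag: 1522
```

Under that assumption an untagged full-size frame would fall into the
1024-to-1518 bucket, while a tagged one lands in rx_1519_to_2047_bytes_phy.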
>>
>
> --
> Best regards,
> Jesper Dangaard Brouer
> MSc.CS, Principal Kernel Engineer at Red Hat
> LinkedIn: http://www.linkedin.com/in/brouer