[Bloat] Router congestion, slow ping/ack times with kernel 5.4.60
Thomas Rosenstein
thomas.rosenstein at creamfinance.com
Mon Nov 9 05:09:33 EST 2020
On 9 Nov 2020, at 9:24, Jesper Dangaard Brouer wrote:
> On Sat, 07 Nov 2020 14:00:04 +0100
> Thomas Rosenstein via Bloat <bloat at lists.bufferbloat.net> wrote:
>
>> Here's an extract from the ethtool https://pastebin.com/cabpWGFz just in
>> case there's something hidden.
>
> Yes, there is something hiding in the data from ethtool_stats.pl[1]:
> (10G Mellanox ConnectX cards via 10G SFP+ DAC)
>
> stat: 1 ( 1) <= outbound_pci_stalled_wr_events /sec
> stat: 339731557 (339,731,557) <= rx_buffer_passed_thres_phy /sec
>
> I've not seen the counter 'rx_buffer_passed_thres_phy' before. Looking
> in the kernel driver code, it is related to "rx_buffer_almost_full".
> The per-second number is excessive (but it may be related to a driver
> bug, as it ends up reading "high" -> rx_buffer_almost_full_high in the
> extended counters).
>
> stat: 29583661 ( 29,583,661) <= rx_bytes /sec
> stat: 30343677 ( 30,343,677) <= rx_bytes_phy /sec
>
> You are receiving at 236 Mbit/s on a 10 Gbit/s link. There is a
> difference between what the OS sees (rx_bytes) and what the NIC
> hardware sees (rx_bytes_phy) (diff approx. 6 Mbit/s).
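
The 236 Mbit/s and 6 Mbit/s figures can be reproduced from the two byte counters quoted above. This is only a minimal sketch of that arithmetic; the helper name `to_mbit` is made up for illustration, and 1 Mbit is taken as 10^6 bits.

```python
# Per-second counter values copied from the ethtool_stats.pl output above.
rx_bytes = 29_583_661       # bytes/sec as seen by the OS
rx_bytes_phy = 30_343_677   # bytes/sec as seen by the NIC hardware
link_mbit = 10_000          # nominal 10 Gbit/s link, in Mbit/s


def to_mbit(bytes_per_sec):
    """Convert bytes/sec to Mbit/s (hypothetical helper, 1 Mbit = 10**6 bits)."""
    return bytes_per_sec * 8 / 1e6


os_rate = to_mbit(rx_bytes)
phy_rate = to_mbit(rx_bytes_phy)
diff = phy_rate - os_rate
utilization_pct = phy_rate / link_mbit * 100

print(f"OS rate:   {os_rate:.1f} Mbit/s")
print(f"PHY rate:  {phy_rate:.1f} Mbit/s")
print(f"Diff:      {diff:.1f} Mbit/s")
print(f"Link load: {utilization_pct:.1f} %")
```

The link load comes out at only a few percent of the nominal 10 Gbit/s.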
>
> stat: 19552 ( 19,552) <= rx_packets /sec
> stat: 19950 ( 19,950) <= rx_packets_phy /sec
Could these packets be from VLAN interfaces that are not used in the OS?
>
> The RX packet counters above also indicate that the HW is seeing more
> packets than the OS is receiving.
>
> The next counters are likely your problem:
>
> stat: 718 ( 718) <= tx_global_pause /sec
> stat: 954035 ( 954,035) <= tx_global_pause_duration /sec
> stat: 714 ( 714) <= tx_pause_ctrl_phy /sec
As far as I can see that's only the TX, and we are only doing RX on this
interface - so maybe that's irrelevant?
>
> It looks like you have enabled Ethernet Flow-Control, and something is
> causing pause frames to be generated. It seems strange that this
> happens on a 10 Gbit/s link carrying only 236 Mbit/s.
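
To put the pause counters quoted above into proportion, here is a rough sketch. It ASSUMES that tx_global_pause_duration is reported in microseconds; the thread does not state the unit, so check the driver documentation before relying on the result.

```python
# Per-second counter values copied from the ethtool_stats.pl output above.
tx_global_pause = 718               # pause events per second
tx_global_pause_duration = 954_035  # per second; unit ASSUMED to be microseconds

US_PER_SEC = 1_000_000

# Fraction of each second covered by pause time, under the unit assumption.
pause_fraction = tx_global_pause_duration / US_PER_SEC

# Average duration attributed to a single pause event, same assumption.
avg_pause_us = tx_global_pause_duration / tx_global_pause

print(f"Paused fraction of each second: {pause_fraction:.1%}")
print(f"Average per pause event:        {avg_pause_us:.0f} us")
```

If the unit assumption holds, pause time would cover most of every second, which would fit the suspicion that flow control is involved. If the unit is different, the ratio means something else.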
>
> The TX byte counters are also very strange:
>
> stat: 26063 ( 26,063) <= tx_bytes /sec
> stat: 71950 ( 71,950) <= tx_bytes_phy /sec
This is also TX, and we are only doing RX. As I already mentioned
somewhere, the routing is asymmetric, so the TX data comes back via
another router.
>
> --
> Best regards,
> Jesper Dangaard Brouer
> MSc.CS, Principal Kernel Engineer at Red Hat
> LinkedIn: http://www.linkedin.com/in/brouer
>
> [1]
> https://github.com/netoptimizer/network-testing/blob/master/bin/ethtool_stats.pl
>
> Strange size distribution:
> stat: 19922 ( 19,922) <= rx_1519_to_2047_bytes_phy /sec
> stat: 14 ( 14) <= rx_65_to_127_bytes_phy /sec
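
The size distribution can be cross-checked against the byte and packet counters quoted earlier in this mail. A minimal sketch, using only the numbers from the ethtool_stats.pl output; the variable names are illustrative.

```python
# Per-second counter values copied from the ethtool_stats.pl output.
rx_bytes, rx_packets = 29_583_661, 19_552          # as seen by the OS
rx_bytes_phy, rx_packets_phy = 30_343_677, 19_950  # as seen by the NIC hardware
rx_1519_to_2047_bytes_phy = 19_922
rx_65_to_127_bytes_phy = 14

# Average size per packet at each layer.
avg_os = rx_bytes / rx_packets
avg_phy = rx_bytes_phy / rx_packets_phy

# Share of the large-frame bucket among the two buckets listed above.
listed = rx_1519_to_2047_bytes_phy + rx_65_to_127_bytes_phy
large_share = rx_1519_to_2047_bytes_phy / listed

print(f"Average packet size (OS):  {avg_os:.1f} bytes")
print(f"Average packet size (PHY): {avg_phy:.1f} bytes")
print(f"Share in 1519-2047 bucket: {large_share:.2%}")
```

The PHY-level average lands just inside the 1519-2047 byte bucket, so the byte and packet counters agree with the histogram: nearly every frame the hardware sees is slightly larger than 1518 bytes.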