From: Jesper Dangaard Brouer <brouer@redhat.com>
To: "Thomas Rosenstein" <thomas.rosenstein@creamfinance.com>
Cc: "Bufferbloat" <bloat@lists.bufferbloat.net>,
	brouer@redhat.com, Saeed Mahameed <saeed@kernel.org>,
	Tariq Toukan <ttoukan.linux@gmail.com>
Subject: Re: [Bloat] Router congestion, slow ping/ack times with kernel 5.4.60
Date: Thu, 12 Nov 2020 11:05:08 +0100
Message-ID: <20201112110508.59f402e7@carbon>
In-Reply-To: <27110D8E-77DF-4D10-A5EA-6430DBD55BC7@creamfinance.com>

On Mon, 09 Nov 2020 15:33:46 +0100
"Thomas Rosenstein" <thomas.rosenstein@creamfinance.com> wrote:

> On 9 Nov 2020, at 12:40, Jesper Dangaard Brouer wrote:
> 
> > On Mon, 09 Nov 2020 11:09:33 +0100
> > "Thomas Rosenstein" <thomas.rosenstein@creamfinance.com> wrote:
> >  
> >> On 9 Nov 2020, at 9:24, Jesper Dangaard Brouer wrote:
> >>  
> >>> On Sat, 07 Nov 2020 14:00:04 +0100
> >>> Thomas Rosenstein via Bloat <bloat@lists.bufferbloat.net> wrote:
> >>>  
> >>>> Here's an extract from the ethtool https://pastebin.com/cabpWGFz
> >>>> just in case there's something hidden.  
> >>>
> >>> Yes, there is something hiding in the data from ethtool_stats.pl[1]:
> >>> (10G Mellanox Connect-X cards via 10G SPF+ DAC)
> >>>
> >>>  stat:            1 (          1) <= outbound_pci_stalled_wr_events /sec
> >>>  stat:    339731557 (339,731,557) <= rx_buffer_passed_thres_phy /sec
> >>>
> >>> I've not seen this counter 'rx_buffer_passed_thres_phy' before;
> >>> looking in the kernel driver code, it is related to
> >>> "rx_buffer_almost_full".
> >>> The numbers per second are excessive (but it may be related to a
> >>> driver bug, as it ends up reading "high" ->
> >>> rx_buffer_almost_full_high in the extended counters).  
> 
> I have now tested with a new kernel 5.9.4, built from the 3.10 config
> with make oldconfig, and I noticed an interesting effect.
> 
> For the first ca. 2 minutes the router behaves completely normally, as
> with 3.10; after that the ping times go crazy.
> 
> I have recorded this with ethtool, and also the ping times.
> 
> Ethtool: (13 MB)
> https://drive.google.com/file/d/1Ojp64UUw0zKwrgF_CisZb3BCdidAJYZo/view?usp=sharing
> 
> The transfer was first doing around 50 - 70 MB/s; then, once the ping
> times got worse, it dropped to ~12 MB/s.
> At ca. line 74324 the transfer speed drops to 12 MB/s.

Let's add them here so we can see and talk about them:

 74324 Ethtool(eth4 ) stat:    144584002 (    144,584,002)  <= rx_buffer_passed_thres_phy /sec 
 74325 Ethtool(eth4 ) stat:     17243186 (     17,243,186) <= rx_bytes /sec 
 74326 Ethtool(eth4 ) stat:     19032079 (     19,032,079) <= rx_bytes_phy /sec 

There is a significant difference between RX bytes at the PHY level and at the OS level.

17243186*8/10^6 = 138 Mbit/s
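For reference, the PHY level corresponds to 19032079*8/10^6 = 152 Mbit/s,
i.e. a gap of (19032079-17243186)*8/10^6 = ~14 Mbit/s between what the
port receives and what the OS sees.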

 74327 Ethtool(eth4 ) stat:          321 (            321) <= rx_cache_busy /sec 
 74328 Ethtool(eth4 ) stat:          321 (            321) <= rx_cache_full /sec 
 74329 Ethtool(eth4 ) stat:         7323 (          7,323) <= rx_cache_reuse /sec 
 74330 Ethtool(eth4 ) stat:        15188 (         15,188) <= rx_csum_complete /sec 
 74331 Ethtool(eth4 ) stat:            1 (              1) <= rx_csum_none /sec 
 74332 Ethtool(eth4 ) stat:           96 (             96) <= rx_csum_unnecessary /sec 
 74333 Ethtool(eth4 ) stat:         1242 (          1,242) <= rx_discards_phy /sec 

Packets are being discarded at the PHY (rx_discards_phy).

 74334 Ethtool(eth4 ) stat:        15285 (         15,285) <= rx_packets /sec 
 74335 Ethtool(eth4 ) stat:        16655 (         16,655) <= rx_packets_phy /sec 

(16655 - 15285 = 1370)
Again a difference between RX packets at the PHY level and the OS level.
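That is 1370/16655 = ~8% of the packets counted at the PHY never showing
up at the OS level (presumably mostly the rx_discards seen above).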

 74336 Ethtool(eth4 ) stat:     19132497 (     19,132,497) <= rx_prio0_bytes /sec 
 74337 Ethtool(eth4 ) stat:         1264 (          1,264) <= rx_prio0_discards /sec 
 74338 Ethtool(eth4 ) stat:        15474 (         15,474) <= rx_prio0_packets /sec 



> Seems you are right about rx_buffer_passed_thres_phy; if you check
> just those lines, they appear more often once the speed dropped.
> Not sure if that's the cause or an effect of the underlying problem!

Thanks for the link to Mellanox doc:
 https://community.mellanox.com/s/article/understanding-mlx5-ethtool-counters

The counter: 'rx_buffer_passed_thres_phy'
 Description: 
  'The number of events where the port receive buffer was over 85% full.
   Supported from kernel 4.14'

The counter: 'tx_pause_ctrl_phy'
 Description:
 'The number of link layer pause packets transmitted on a physical
  port. If this counter is increasing, it implies that the NIC is
  congested and cannot absorb the traffic coming from the network.'
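
(Whether link-layer flow control is active on the port can be checked
with 'ethtool -a eth4' and, purely as an experiment, switched off with
e.g. 'ethtool -A eth4 rx off tx off' to see if the pause frames are
masking the real bottleneck.)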


So, the NIC is clearly telling us that something is wrong. You are not
near link speed (12 MB/s => 96 Mbit/s), and still the port receive
buffer is over 85% full most of the time. There are indications that
the NIC is congested and cannot absorb the traffic coming from the
network.

Long-shot theory(1): Could this be caused by the CPUs going into
too-deep sleep states, with the NIC not sending the wake-up signal
early enough, e.g. because it holds on to packets to generate a bulk
at IRQ-raise time?
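
A rough way to check how deep the CPUs actually sleep (hardware
dependent, cpu0 used just as an example):

   cpupower idle-info                                        # list C-states and exit latencies
   grep . /sys/devices/system/cpu/cpu0/cpuidle/state*/name   # state names
   grep . /sys/devices/system/cpu/cpu0/cpuidle/state*/time   # residency per state, in usec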

Long-shot theory(2): The NIC driver's adaptive IRQ coalescing
(adjustable via ethtool -C) is getting it wrong. The reason it is wrong
is TSO frames: the packet-to-byte ratio isn't taken into account.

You can test theory(2) in two ways: by disabling TSO, and by adjusting
the coalescing settings via ethtool --show-coalesce|--coalesce (-c|-C).
The --coalesce change also touches part of theory(1), the NIC
wake-up/IRQ interval.  For CPU sleep modes see the software package
'tuned'.
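
Something along these lines as a starting point (the coalesce values
are only guesses to experiment with, not recommendations):

   ethtool -K eth4 tso off                                   # disable TSO (tests theory 2)
   ethtool -c eth4                                           # show current coalescing settings
   ethtool -C eth4 adaptive-rx off rx-usecs 8 rx-frames 32   # fixed, shorter IRQ interval
   tuned-adm profile network-latency                         # limits deep CPU sleep states (theory 1)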


> Pings:
> https://drive.google.com/file/d/16phOxM5IFU6RAl4Ua4pRqMNuLYBc4RK7/view?usp=sharing
> 
> Pause frames were activated again after the restart.
> 
> (Here is a link for reference for the ethtool variables: 
> https://community.mellanox.com/s/article/understanding-mlx5-ethtool-counters)

To make this easier to digest, here are some of the interesting counters.

  74171 Ethtool(eth4    ) stat:           24 (             24) <= outbound_pci_stalled_wr /sec 
  74172 Ethtool(eth4    ) stat:            1 (              1) <= outbound_pci_stalled_wr_events /sec 
  [...]
  74628 Ethtool(eth5    ) stat:           40 (             40) <= outbound_pci_stalled_wr /sec 
  74629 Ethtool(eth5    ) stat:            1 (              1) <= outbound_pci_stalled_wr_events /sec 
 

  74539 Ethtool(eth4    ) stat:          432 (            432) <= tx_global_pause /sec 
  74540 Ethtool(eth4    ) stat:       551264 (        551,264) <= tx_global_pause_duration /sec 
  74541 Ethtool(eth4    ) stat:          434 (            434) <= tx_mac_control_phy /sec 
  74542 Ethtool(eth4    ) stat:           31 (             31) <= tx_nop /sec 
  74543 Ethtool(eth4    ) stat:        24888 (         24,888) <= tx_packets /sec 
  74544 Ethtool(eth4    ) stat:        25187 (         25,187) <= tx_packets_phy /sec 
  74545 Ethtool(eth4    ) stat:          434 (            434) <= tx_pause_ctrl_phy /sec 
  74546 Ethtool(eth4    ) stat:     15190865 (     15,190,865) <= tx_prio0_bytes /sec 
  74547 Ethtool(eth4    ) stat:        24907 (         24,907) <= tx_prio0_packets /sec 
  74548 Ethtool(eth4    ) stat:       647795 (        647,795) <= tx_tso_bytes /sec 
  74549 Ethtool(eth4    ) stat:          196 (            196) <= tx_tso_packets /sec 
  74550 Ethtool(eth4    ) stat:     15005949 (     15,005,949) <= tx_vport_unicast_bytes /sec 
  74551 Ethtool(eth4    ) stat:        24780 (         24,780) <= tx_vport_unicast_packets /sec 
  74552 Ethtool(eth4    ) stat:           57 (             57) <= tx_xmit_more /sec 
  [...]
  75030 Ethtool(eth5    ) stat:           39 (             39) <= tx_global_pause /sec 
  75031 Ethtool(eth5    ) stat:       242374 (        242,374) <= tx_global_pause_duration /sec 
  75032 Ethtool(eth5    ) stat:           39 (             39) <= tx_mac_control_phy /sec 
  75033 Ethtool(eth5    ) stat:          166 (            166) <= tx_nop /sec 
  75034 Ethtool(eth5    ) stat:        14837 (         14,837) <= tx_packets /sec 
  75035 Ethtool(eth5    ) stat:        14816 (         14,816) <= tx_packets_phy /sec 
  75036 Ethtool(eth5    ) stat:           39 (             39) <= tx_pause_ctrl_phy /sec 
  75037 Ethtool(eth5    ) stat:     16748584 (     16,748,584) <= tx_prio0_bytes /sec 
  75038 Ethtool(eth5    ) stat:        14708 (         14,708) <= tx_prio0_packets /sec 
  75039 Ethtool(eth5    ) stat:      2477442 (      2,477,442) <= tx_tso_bytes /sec 
  75040 Ethtool(eth5    ) stat:          733 (            733) <= tx_tso_packets /sec 
  75041 Ethtool(eth5    ) stat:     16765330 (     16,765,330) <= tx_vport_unicast_bytes /sec 
  75042 Ethtool(eth5    ) stat:        14780 (         14,780) <= tx_vport_unicast_packets /sec 
  75043 Ethtool(eth5    ) stat:            2 (              2) <= tx_xmit_more /sec 


-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

