General list for discussing Bufferbloat
From: "Toke Høiland-Jørgensen" <toke@toke.dk>
To: Thomas Rosenstein <thomas.rosenstein@creamfinance.com>
Cc: bloat@lists.bufferbloat.net
Subject: Re: [Bloat] Router congestion, slow ping/ack times with kernel 5.4.60
Date: Thu, 05 Nov 2020 13:47:11 +0100
Message-ID: <875z6kt1gw.fsf@toke.dk>
In-Reply-To: <C5192042-4145-48CE-9A66-B2C1CF24CDD8@creamfinance.com>

"Thomas Rosenstein" <thomas.rosenstein@creamfinance.com> writes:

> On 5 Nov 2020, at 13:38, Toke Høiland-Jørgensen wrote:
>
>> "Thomas Rosenstein" <thomas.rosenstein@creamfinance.com> writes:
>>
>>> On 5 Nov 2020, at 12:21, Toke Høiland-Jørgensen wrote:
>>>
>>>> "Thomas Rosenstein" <thomas.rosenstein@creamfinance.com> writes:
>>>>
>>>>>> If so, this sounds more like a driver issue, or maybe something to do
>>>>>> with scheduling. Does it only happen with ICMP? You could try this tool
>>>>>> for a userspace UDP measurement:
>>>>>
>>>>> It happens with all packets, therefore the transfer to backblaze with
>>>>> 40 threads goes down to ~8MB/s instead of >60MB/s
>>>>
>>>> Huh, right, definitely sounds like a kernel bug; or maybe the new kernel
>>>> is getting the hardware into a state where it bugs out when there are
>>>> lots of flows or something.
>>>>
>>>> You could try looking at the ethtool stats (ethtool -S) while running
>>>> the test and see if any error counters go up. Here's a handy script to
>>>> monitor changes in the counters:
>>>>
>>>> https://github.com/netoptimizer/network-testing/blob/master/bin/ethtool_stats.pl
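
The idea of watching `ethtool -S` for rising error counters can be sketched in a few lines of Python. This is only an illustration and not the linked Perl script: it assumes two snapshots of `ethtool -S <iface>` output have already been captured as text, and the function names are made up for the example. Counter names vary by driver, so none are hard-coded.

```python
def parse_ethtool_stats(text):
    """Parse `ethtool -S` output into a dict of counter name -> int."""
    counters = {}
    for line in text.splitlines():
        # Split on the last colon so counter names containing ':' survive.
        name, sep, value = line.rpartition(":")
        if not sep:
            continue
        try:
            counters[name.strip()] = int(value.strip())
        except ValueError:
            # Skips the "NIC statistics:" header and non-numeric values.
            continue
    return counters


def changed_counters(before, after):
    """Return {name: delta} for every counter whose value changed."""
    return {
        name: after[name] - before.get(name, 0)
        for name in after
        if after[name] != before.get(name, 0)
    }
```

Taking one snapshot before the test and one during it, then passing both through `changed_counters`, leaves only the counters that moved, which makes any error or drop counter that increases easy to spot.
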
>>>>
>>>>> I'll try what that reports!
>>>>>
>>>>>> Also, what happens if you ping a host on the internet (*through* the
>>>>>> router instead of *to* it)?
>>>>>
>>>>> Same issue, but twice as pronounced, as it seems all interfaces are
>>>>> affected. So, ping on one interface and the second has the issue.
>>>>> Also all traffic across the host has the issue, but on both sides, so
>>>>> ping to the internet increased by 2x
>>>>
>>>> Right, so even an unloaded interface suffers? But this is the same NIC,
>>>> right? So it could still be a hardware issue...
>>>>
>>>>> Yep, the default that CentOS ships. I just tested 4.12.5; there the
>>>>> issue also does not happen. So I guess I can bisect it then... (really
>>>>> don't want to 😃)
>>>>
>>>> Well that at least narrows it down :)
>>>
>>> I just tested 5.9.4, which seems to also fix it partly: I have long
>>> stretches where it looks good, and then some increases again. (3.10 stock
>>> has them too, but not so high, rather 1-3 ms)
>>>
>>> for example:
>>>
>>> 64 bytes from x.x.x.x: icmp_seq=10 ttl=64 time=0.169 ms
>>> 64 bytes from x.x.x.x: icmp_seq=11 ttl=64 time=5.53 ms
>>> 64 bytes from x.x.x.x: icmp_seq=12 ttl=64 time=9.44 ms
>>> 64 bytes from x.x.x.x: icmp_seq=13 ttl=64 time=0.167 ms
>>> 64 bytes from x.x.x.x: icmp_seq=14 ttl=64 time=3.88 ms
>>>
>>> and then again:
>>>
>>> 64 bytes from x.x.x.x: icmp_seq=15 ttl=64 time=0.569 ms
>>> 64 bytes from x.x.x.x: icmp_seq=16 ttl=64 time=0.148 ms
>>> 64 bytes from x.x.x.x: icmp_seq=17 ttl=64 time=0.286 ms
>>> 64 bytes from x.x.x.x: icmp_seq=18 ttl=64 time=0.257 ms
>>> 64 bytes from x.x.x.x: icmp_seq=19 ttl=64 time=0.220 ms
>>> 64 bytes from x.x.x.x: icmp_seq=20 ttl=64 time=0.125 ms
>>> 64 bytes from x.x.x.x: icmp_seq=21 ttl=64 time=0.188 ms
>>> 64 bytes from x.x.x.x: icmp_seq=22 ttl=64 time=0.202 ms
>>> 64 bytes from x.x.x.x: icmp_seq=23 ttl=64 time=0.195 ms
>>> 64 bytes from x.x.x.x: icmp_seq=24 ttl=64 time=0.177 ms
>>> 64 bytes from x.x.x.x: icmp_seq=25 ttl=64 time=0.242 ms
>>> 64 bytes from x.x.x.x: icmp_seq=26 ttl=64 time=0.339 ms
>>> 64 bytes from x.x.x.x: icmp_seq=27 ttl=64 time=0.183 ms
>>> 64 bytes from x.x.x.x: icmp_seq=28 ttl=64 time=0.221 ms
>>> 64 bytes from x.x.x.x: icmp_seq=29 ttl=64 time=0.317 ms
>>> 64 bytes from x.x.x.x: icmp_seq=30 ttl=64 time=0.210 ms
>>> 64 bytes from x.x.x.x: icmp_seq=31 ttl=64 time=0.242 ms
>>> 64 bytes from x.x.x.x: icmp_seq=32 ttl=64 time=0.127 ms
>>> 64 bytes from x.x.x.x: icmp_seq=33 ttl=64 time=0.217 ms
>>> 64 bytes from x.x.x.x: icmp_seq=34 ttl=64 time=0.184 ms
>>>
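
To compare runs like the two above with numbers rather than by eye, the round-trip times can be pulled out of the ping output and summarized. The following is a minimal sketch; the 1 ms spike threshold and the function names are assumptions chosen for illustration, not values from this thread.

```python
import re
import statistics

# Matches the "time=0.169 ms" field of a ping reply line.
RTT_RE = re.compile(r"time=([0-9.]+) ms")


def parse_rtts(ping_output):
    """Extract round-trip times in milliseconds from ping output."""
    return [float(m.group(1)) for m in RTT_RE.finditer(ping_output)]


def summarize(rtts, spike_ms=1.0):
    """Return the median, the maximum, and the count of samples above spike_ms."""
    return {
        "median_ms": statistics.median(rtts),
        "max_ms": max(rtts),
        "spikes": sum(1 for rtt in rtts if rtt > spike_ms),
    }
```

Running `summarize(parse_rtts(text))` on the output captured under each kernel gives a comparable spike count per run. The regular expression ignores quote prefixes, so output pasted from a mail works as well.
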
>>>
>>> To me it now looks like there was some fix between 5.4.60 and 5.9.4 ...
>>> can anyone pinpoint it?
>>
>> $ git log --no-merges --oneline v5.4.60..v5.9.4|wc -l
>> 72932
>>
>> Only 73k commits; should be easy, right? :)
>>
>> (In other words no, I have no idea; I'd suggest either (a) asking on
>> netdev, (b) bisecting or (c) using 5.9+ and just making peace with not
>> knowing).
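
For a sense of what option (b) costs: a binary search halves the candidate range on every round, so the number of kernels to build and test grows with the logarithm of the commit count. A rough estimate, assuming every round gives a clean good/bad answer, is sketched below. The count above excludes merge commits, so the figure git bisect itself reports may differ slightly.

```python
import math


def bisect_steps(commit_count):
    """Approximate number of build-and-test rounds for a binary search."""
    return math.ceil(math.log2(commit_count))


print(bisect_steps(72932))  # 17
```
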
>
> Guess I'll go the easy route and let it be ...
>
> I'll update all routers to 5.9.4 and see if it fixes the traffic flow -
> will report back once more after that.

Sounds like a plan :)

>>
>>>>>> How did you configure the new kernel? Did you start from scratch, or
>>>>>> is it based on the old CentOS config?
>>>>>
>>>>> first oldconfig and from there then added additional options for IB,
>>>>> NVMe, etc (which I don't really need on the routers)
>>>>
>>>> OK, so you're probably building with roughly the same options in terms
>>>> of scheduling granularity etc. That's good. Did you enable spectre
>>>> mitigations etc on the new kernel? What's the output of
>>>> `tail /sys/devices/system/cpu/vulnerabilities/*` ?
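
The same check can be scripted when several routers need comparing. The sketch below reads each file in that sysfs directory and lists the entries whose status line begins with "Mitigation"; the function names are hypothetical, and the directory is a parameter so the code can be tried against a copy of the files.

```python
from pathlib import Path

VULN_DIR = Path("/sys/devices/system/cpu/vulnerabilities")


def read_vulnerabilities(directory=VULN_DIR):
    """Return {entry name: status line} for each file in the directory."""
    return {
        entry.name: entry.read_text().strip()
        for entry in sorted(Path(directory).iterdir())
        if entry.is_file()
    }


def active_mitigations(statuses):
    """Names of entries whose status line reports a mitigation in use."""
    return [
        name for name, status in statuses.items()
        if status.startswith("Mitigation")
    ]
```

An empty list from `active_mitigations(read_vulnerabilities())` is consistent with mitigations being switched off or with the CPU not being affected; the individual status lines show which.
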
>>>
>>> mitigations are off
>>
>> Right, I just figured maybe you were hitting some threshold that
>> involved a lot of indirect calls which slowed things down due to
>> mitigations. Guess not, then...
>>
>
> Thanks for the support :)

You're welcome!

-Toke
