From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <1485463281.5145.164.camel@edumazet-glaptop3.roam.corp.google.com>
From: Eric Dumazet
To: Hans-Kristian Bakke
Cc: David Lang, bloat
Date: Thu, 26 Jan 2017 12:41:21 -0800
References: <1485458323.5145.151.camel@edumazet-glaptop3.roam.corp.google.com>
Subject: Re: [Bloat] Excessive throttling with fq

Can you post:

ethtool -i eth0
ethtool -k eth0

grep HZ /boot/config.... (what is the HZ value of your kernel?)

I suspect a possible problem with TSO autodefer when/if HZ < 1000.

Thanks.
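(For reference, a minimal way to gather all three pieces of information in one
go might look like the commands below; the interface names eth0/eth1 and the
Debian-style /boot/config-$(uname -r) path are assumptions about this setup,
so adjust as needed:)

  for i in eth0 eth1; do
      echo "=== $i ==="
      ethtool -i $i        # driver, version and firmware info
      ethtool -k $i        # offload settings, including TSO/GSO/GRO
  done
  grep CONFIG_HZ /boot/config-$(uname -r)   # compiled-in timer frequency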
On Thu, 2017-01-26 at 21:19 +0100, Hans-Kristian Bakke wrote:
> There are two packet captures from fq with and without pacing here:
>
> https://owncloud.proikt.com/index.php/s/KuXIl8h8bSFH1fM
>
> The server (with fq pacing/nopacing) is 10.0.5.10 and is running an
> Apache2 webserver on TCP port 443. The TCP client is an nginx reverse
> proxy at 10.0.5.13 on the same subnet, which in turn proxies the
> connection from the Windows 10 client.
>
> - I did try to connect directly to the server with the client (via a
>   Linux gateway router), avoiding the nginx proxy and just using plain
>   no-SSL HTTP. That did not change anything.
> - I also tried stopping the eth0 interface to force the traffic onto
>   the eth1 interface in the LACP bond, which changed nothing.
> - I also pulled each of the cables on the switch to force the traffic
>   to switch between interfaces in the LACP link between the client
>   switch and the server switch.
>
> The CPU is a 5-6 year old Intel Xeon X3430 CPU @ 4x2.40GHz on a
> SuperMicro platform. It is not very loaded and the results are always
> in the same ballpark with fq pacing on.
>
> top - 21:12:38 up 12 days, 11:08,  4 users,  load average: 0.56, 0.68, 0.77
> Tasks: 1344 total,   1 running, 1343 sleeping,   0 stopped,   0 zombie
> %Cpu0  :  0.0 us,  1.0 sy,  0.0 ni, 99.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
> %Cpu1  :  0.0 us,  0.3 sy,  0.0 ni, 97.4 id,  2.0 wa,  0.0 hi,  0.3 si,  0.0 st
> %Cpu2  :  0.0 us,  2.0 sy,  0.0 ni, 96.4 id,  1.3 wa,  0.0 hi,  0.3 si,  0.0 st
> %Cpu3  :  0.7 us,  2.3 sy,  0.0 ni, 94.1 id,  3.0 wa,  0.0 hi,  0.0 si,  0.0 st
> KiB Mem : 16427572 total,   173712 free,  9739976 used,  6513884 buff/cache
> KiB Swap:  6369276 total,  6126736 free,   242540 used.  6224836 avail Mem
>
> This seems OK to me. It does have 24 drives in 3 ZFS pools at 144 TB
> raw storage in total, with several SAS HBAs that are pretty much
> always poking the system in some way or another.
>
> There are around 32K interrupts when running @ 23 MB/s (as seen in
> Chrome downloads) with pacing on, and about 25K interrupts when
> running @ 105 MB/s with fq nopacing. Is that normal?
>
> Hans-Kristian
>
> On 26 January 2017 at 20:58, David Lang wrote:
> > Is there any CPU bottleneck?
> >
> > Pacing causing this sort of problem makes me think that the CPU
> > either can't keep up or that something (an HZ-setting type of thing)
> > is delaying when the CPU can get used.
> >
> > It's not clear from the posts if the problem is with sending data or
> > receiving data.
> >
> > David Lang
> >
> > On Thu, 26 Jan 2017, Eric Dumazet wrote:
> > > Nothing jumps out at me.
> > >
> > > We use FQ on links varying from 1Gbit to 100Gbit, and we have no
> > > such issues.
> > >
> > > You could probably check on the server the various TCP infos given
> > > by the ss command:
> > >
> > > ss -temoi dst
> > >
> > > The pacing rate is shown. You might have some issues, but it is
> > > hard to say.
> > >
> > > On Thu, 2017-01-26 at 19:55 +0100, Hans-Kristian Bakke wrote:
> > > > After some more testing I see that if I disable fq pacing the
> > > > performance is restored to the expected levels:
> > > >
> > > > # for i in eth0 eth1; do tc qdisc replace dev $i root fq nopacing; done
> > > >
> > > > Is this expected behaviour? There is some background traffic,
> > > > but only in the sub-100 mbit/s range on the switches and gateway
> > > > between the server and client.
> > > >
> > > > The chain:
> > > > Windows 10 client -> 1000 mbit/s -> switch -> 2 x gigabit LACP ->
> > > > switch -> 4 x gigabit LACP -> gw (fq_codel on all nics) ->
> > > > 4 x gigabit LACP (the same as in) -> switch -> 2 x LACP ->
> > > > server (with misbehaving fq pacing)
> > > >
> > > > On 26 January 2017 at 19:38, Hans-Kristian Bakke wrote:
> > > > > I can add that this is without BBR, just plain old kernel 4.8
> > > > > cubic.
> > > > >
> > > > > On 26 January 2017 at 19:36, Hans-Kristian Bakke wrote:
> > > > > > Another day, another fq issue (or user error).
> > > > > >
> > > > > > I am trying to do the seemingly simple task of downloading a
> > > > > > single large file over the local gigabit LAN from a physical
> > > > > > server running kernel 4.8 and sch_fq on Intel server NICs.
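(As an aside, the nopacing workaround and the pacing-rate check suggested
above can be combined into a quick A/B test; eth0/eth1 and the proxy address
10.0.5.13 are taken from the setup described earlier, so adjust to match:)

  # A: default fq behaviour, pacing enabled
  for i in eth0 eth1; do tc qdisc replace dev $i root fq; done
  # B: same qdisc with pacing disabled
  # for i in eth0 eth1; do tc qdisc replace dev $i root fq nopacing; done

  # while the download runs, watch the per-flow pacing rate, cwnd and rtt
  watch -n 1 "ss -temoi dst 10.0.5.13"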
> > > > > > For some reason it wouldn't go past around 25 MB/s. After
> > > > > > having replaced SSL with no SSL, replaced Apache with nginx
> > > > > > and verified that there is plenty of bandwidth available
> > > > > > between my client and the server, I tried to change the
> > > > > > qdisc from fq to pfifo_fast. It instantly shot up to around
> > > > > > the expected 85-90 MB/s. The same happened with fq_codel in
> > > > > > place of fq.
> > > > > >
> > > > > > I then checked the statistics for fq, and the throttled
> > > > > > counter is increasing massively every second (eth0 and eth1
> > > > > > are LACPed using Linux bonding, so both are seen here):
> > > > > >
> > > > > > qdisc fq 8007: root refcnt 2 limit 10000p flow_limit 100p buckets 1024 orphan_mask 1023 quantum 3028 initial_quantum 15140 refill_delay 40.0ms
> > > > > >  Sent 787131797 bytes 520082 pkt (dropped 15, overlimits 0 requeues 0)
> > > > > >  backlog 98410b 65p requeues 0
> > > > > >   15 flows (14 inactive, 1 throttled)
> > > > > >   0 gc, 2 highprio, 259920 throttled, 15 flows_plimit
> > > > > > qdisc fq 8008: root refcnt 2 limit 10000p flow_limit 100p buckets 1024 orphan_mask 1023 quantum 3028 initial_quantum 15140 refill_delay 40.0ms
> > > > > >  Sent 2533167 bytes 6731 pkt (dropped 0, overlimits 0 requeues 0)
> > > > > >  backlog 0b 0p requeues 0
> > > > > >   24 flows (24 inactive, 0 throttled)
> > > > > >   0 gc, 2 highprio, 397 throttled
> > > > > >
> > > > > > Do you have any suggestions?
> > > > > >
> > > > > > Regards,
> > > > > > Hans-Kristian
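(Finally, a quick way to see whether the throttled counter keeps climbing
while a transfer is running; eth0/eth1 are again the assumed bond members:)

  watch -n 1 'tc -s qdisc show dev eth0; tc -s qdisc show dev eth1'

The statistics block quoted above will then refresh every second; a throttled
count that rises steadily only while the transfer is slow would be consistent
with the pacing behaviour discussed in this thread.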