From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <1485463281.5145.164.camel@edumazet-glaptop3.roam.corp.google.com>
From: Eric Dumazet
To: Hans-Kristian Bakke
Cc: David Lang, bloat
Date: Thu, 26 Jan 2017 12:41:21 -0800
References: <1485458323.5145.151.camel@edumazet-glaptop3.roam.corp.google.com>
Subject: Re: [Bloat] Excessive throttling with fq

Can you post:

ethtool -i eth0
ethtool -k eth0

grep HZ /boot/config.... (what is the HZ value of your kernel?)

I suspect a possible problem with TSO autodefer when/if HZ < 1000.

Thanks.
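(For reference, a minimal way to gather all three pieces of information in one
go might look like the commands below; the interface names eth0/eth1 and the
Debian-style /boot/config-$(uname -r) path are assumptions about this setup,
so adjust as needed:)

  for i in eth0 eth1; do
      echo "=== $i ==="
      ethtool -i $i        # driver, version and firmware info
      ethtool -k $i        # offload settings, including TSO/GSO/GRO
  done
  grep CONFIG_HZ /boot/config-$(uname -r)   # compiled-in timer frequency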
On Thu, 2017-01-26 at 21:19 +0100, Hans-Kristian Bakke wrote:
> There are two packet captures from fq with and without pacing here:
>
> https://owncloud.proikt.com/index.php/s/KuXIl8h8bSFH1fM
>
> The server (with fq pacing/nopacing) is 10.0.5.10 and is running an
> Apache2 webserver on TCP port 443. The TCP client is an nginx reverse
> proxy at 10.0.5.13 on the same subnet, which in turn proxies the
> connection from the Windows 10 client.
>
> - I did try to connect directly to the server with the client (via a
>   Linux gateway router), avoiding the nginx proxy and just using plain
>   no-SSL HTTP. That did not change anything.
> - I also tried stopping the eth0 interface to force the traffic onto
>   the eth1 interface in the LACP bond, which changed nothing.
> - I also pulled each of the cables on the switch to force the traffic
>   to switch between interfaces in the LACP link between the client
>   switch and the server switch.
>
> The CPU is a 5-6 year old Intel Xeon X3430 CPU @ 4x2.40GHz on a
> SuperMicro platform. It is not very loaded and the results are always
> in the same ballpark with fq pacing on.
>
> top - 21:12:38 up 12 days, 11:08,  4 users,  load average: 0.56, 0.68, 0.77
> Tasks: 1344 total,   1 running, 1343 sleeping,   0 stopped,   0 zombie
> %Cpu0  :  0.0 us,  1.0 sy,  0.0 ni, 99.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
> %Cpu1  :  0.0 us,  0.3 sy,  0.0 ni, 97.4 id,  2.0 wa,  0.0 hi,  0.3 si,  0.0 st
> %Cpu2  :  0.0 us,  2.0 sy,  0.0 ni, 96.4 id,  1.3 wa,  0.0 hi,  0.3 si,  0.0 st
> %Cpu3  :  0.7 us,  2.3 sy,  0.0 ni, 94.1 id,  3.0 wa,  0.0 hi,  0.0 si,  0.0 st
> KiB Mem : 16427572 total,   173712 free,  9739976 used,  6513884 buff/cache
> KiB Swap:  6369276 total,  6126736 free,   242540 used.  6224836 avail Mem
>
> This seems OK to me. It does have 24 drives in 3 ZFS pools at 144 TB
> raw storage in total, with several SAS HBAs that are pretty much
> always poking the system in some way or another.
>
> There are around 32K interrupts when running @ 23 MB/s (as seen in
> Chrome downloads) with pacing on, and about 25K interrupts when
> running @ 105 MB/s with fq nopacing. Is that normal?
>
> Hans-Kristian
>
> On 26 January 2017 at 20:58, David Lang wrote:
> > Is there any CPU bottleneck?
> >
> > Pacing causing this sort of problem makes me think that the CPU
> > either can't keep up or that something (an HZ-setting type of thing)
> > is delaying when the CPU can get used.
> >
> > It's not clear from the posts if the problem is with sending data or
> > receiving data.
> >
> > David Lang
> >
> > On Thu, 26 Jan 2017, Eric Dumazet wrote:
> > > Nothing jumps out at me.
> > >
> > > We use FQ on links varying from 1Gbit to 100Gbit, and we have no
> > > such issues.
> > >
> > > You could probably check on the server the various TCP infos given
> > > by the ss command:
> > >
> > > ss -temoi dst
> > >
> > > The pacing rate is shown. You might have some issues, but it is
> > > hard to say.
> > >
> > > On Thu, 2017-01-26 at 19:55 +0100, Hans-Kristian Bakke wrote:
> > > > After some more testing I see that if I disable fq pacing the
> > > > performance is restored to the expected levels:
> > > >
> > > > # for i in eth0 eth1; do tc qdisc replace dev $i root fq nopacing; done
> > > >
> > > > Is this expected behaviour? There is some background traffic,
> > > > but only in the sub-100 mbit/s range on the switches and gateway
> > > > between the server and client.
> > > >
> > > > The chain:
> > > > Windows 10 client -> 1000 mbit/s -> switch -> 2 x gigabit LACP ->
> > > > switch -> 4 x gigabit LACP -> gw (fq_codel on all nics) ->
> > > > 4 x gigabit LACP (the same as in) -> switch -> 2 x LACP ->
> > > > server (with misbehaving fq pacing)
> > > >
> > > > On 26 January 2017 at 19:38, Hans-Kristian Bakke wrote:
> > > > > I can add that this is without BBR, just plain old kernel 4.8
> > > > > cubic.
> > > > >
> > > > > On 26 January 2017 at 19:36, Hans-Kristian Bakke wrote:
> > > > > > Another day, another fq issue (or user error).
> > > > > >
> > > > > > I am trying to do the seemingly simple task of downloading a
> > > > > > single large file over the local gigabit LAN from a physical
> > > > > > server running kernel 4.8 and sch_fq on Intel server NICs.
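(As an aside, the nopacing workaround and the pacing-rate check suggested
above can be combined into a quick A/B test; eth0/eth1 and the proxy address
10.0.5.13 are taken from the setup described earlier, so adjust to match:)

  # A: default fq behaviour, pacing enabled
  for i in eth0 eth1; do tc qdisc replace dev $i root fq; done
  # B: same qdisc with pacing disabled
  # for i in eth0 eth1; do tc qdisc replace dev $i root fq nopacing; done

  # while the download runs, watch the per-flow pacing rate, cwnd and rtt
  watch -n 1 "ss -temoi dst 10.0.5.13"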
> > > > > > For some reason it wouldn't go past around 25 MB/s. After
> > > > > > having replaced SSL with no SSL, replaced Apache with nginx
> > > > > > and verified that there is plenty of bandwidth available
> > > > > > between my client and the server, I tried to change the
> > > > > > qdisc from fq to pfifo_fast. It instantly shot up to around
> > > > > > the expected 85-90 MB/s. The same happened with fq_codel in
> > > > > > place of fq.
> > > > > >
> > > > > > I then checked the statistics for fq, and the throttled
> > > > > > counter is increasing massively every second (eth0 and eth1
> > > > > > are LACPed using Linux bonding, so both are seen here):
> > > > > >
> > > > > > qdisc fq 8007: root refcnt 2 limit 10000p flow_limit 100p buckets 1024 orphan_mask 1023 quantum 3028 initial_quantum 15140 refill_delay 40.0ms
> > > > > >  Sent 787131797 bytes 520082 pkt (dropped 15, overlimits 0 requeues 0)
> > > > > >  backlog 98410b 65p requeues 0
> > > > > >   15 flows (14 inactive, 1 throttled)
> > > > > >   0 gc, 2 highprio, 259920 throttled, 15 flows_plimit
> > > > > > qdisc fq 8008: root refcnt 2 limit 10000p flow_limit 100p buckets 1024 orphan_mask 1023 quantum 3028 initial_quantum 15140 refill_delay 40.0ms
> > > > > >  Sent 2533167 bytes 6731 pkt (dropped 0, overlimits 0 requeues 0)
> > > > > >  backlog 0b 0p requeues 0
> > > > > >   24 flows (24 inactive, 0 throttled)
> > > > > >   0 gc, 2 highprio, 397 throttled
> > > > > >
> > > > > > Do you have any suggestions?
> > > > > >
> > > > > > Regards,
> > > > > > Hans-Kristian
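(Finally, a quick way to see whether the throttled counter keeps climbing
while a transfer is running; eth0/eth1 are again the assumed bond members:)

  watch -n 1 'tc -s qdisc show dev eth0; tc -s qdisc show dev eth1'

The statistics block quoted above will then refresh every second; a throttled
count that rises steadily only while the transfer is slow would be consistent
with the pacing behaviour discussed in this thread.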