From: Eric Dumazet
To: Hans-Kristian Bakke
Cc: David Lang, bloat
Date: Thu, 26 Jan 2017 13:33:39 -0800
Subject: Re: [Bloat] Excessive throttling with fq
Message-ID: <1485466419.5145.177.camel@edumazet-glaptop3.roam.corp.google.com>

On Thu, 2017-01-26 at 22:20 +0100, Hans-Kristian Bakke wrote:
> Wow, that was it. After seeing your previous mail I disabled and
> re-enabled tso and gso on eth0, eth1 AND bond0 to reset them all to
> the same state, and that cleared up all the issues.
>
> In other words, my issue was that my physical NICs eth0 and eth1 had
> gso/tso enabled while my bond0 interface had gso/tso disabled, which
> everything except fq with pacing did not seem to care about.
>
> The reason is probably my traffic shaper script from experiments over
> the last couple of days. I actually think my gateway may have the same
> latent issue with fq pacing, as my HTB + fq_codel WAN traffic shaper
> script automatically disables tso and gso on the shaped interface,
> which in my case is bond0.12 (bonding AND VLANs), while the underlying
> physical interfaces still have tso and gso enabled; the script does
> not know that the interface happens to be bound to one or more layers
> of interfaces below it.
>
> This is the difference in the ethtool -k output between the
> non-working fq pacing settings and the working version:
>
> diff ethtool_k_bond0.txt ethtool_k_bond0-2.txt
> 13c13
> < tx-tcp-segmentation: off
> ---
> > tx-tcp-segmentation: on
> 15c15
> < tx-tcp-mangleid-segmentation: off [requested on]
> ---
> > tx-tcp-mangleid-segmentation: on
>
> Thank you for pointing me in the right direction! I don't know if this
> is a "won't fix" issue because of an illogical user configuration, or
> if it should be looked into and handled better in the future.

Non-TSO devices are supported, but we would generally install FQ on the
bonding device and leave TSO enabled on the bonding. This is because
setting timers is expensive, and our design choices for pacing tried
hard to avoid setting a timer for every two packets sent (as would
happen with 1-MSS packets) ;)

( https://lwn.net/Articles/564978/ )

Of course this does not really matter for slow links (like 10 Mbit or
100 Mbit NICs).
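As a rough sketch of that arrangement (reusing the bond0/eth0/eth1 names
from this thread; exact feature support varies by driver and kernel, so
treat it as illustrative rather than a prescribed recipe):

  # Keep segmentation offload enabled on the slaves and on the bond
  ethtool -K eth0 tso on gso on
  ethtool -K eth1 tso on gso on
  ethtool -K bond0 tso on gso on

  # Install fq (pacing is its default) on the bonding device only
  tc qdisc replace dev bond0 root fq

  # Verify that the bond really ended up with TSO active
  ethtool -k bond0 | grep tcp-segmentation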
> On 26 January 2017 at 22:07, Eric Dumazet wrote:
> > On Thu, 2017-01-26 at 22:02 +0100, Hans-Kristian Bakke wrote:
> > > It seems like it is not:
> >
> > It really should ;) This is normally the default.
> > Do you know why it is off ?
> >
> > ethtool -K bond0 tso on
> >
> > > Features for bond0:
> > > rx-checksumming: off [fixed]
> > > tx-checksumming: on
> > > tx-checksum-ipv4: off [fixed]
> > > tx-checksum-ip-generic: on
> > > tx-checksum-ipv6: off [fixed]
> > > tx-checksum-fcoe-crc: off [fixed]
> > > tx-checksum-sctp: off [fixed]
> > > scatter-gather: on
> > > tx-scatter-gather: on
> > > tx-scatter-gather-fraglist: off [requested on]
> > > tcp-segmentation-offload: on
> > > tx-tcp-segmentation: off
> > > tx-tcp-ecn-segmentation: on
> > > tx-tcp-mangleid-segmentation: off [requested on]
> > > tx-tcp6-segmentation: on
> > > udp-fragmentation-offload: off [fixed]
> > > generic-segmentation-offload: on
> > > generic-receive-offload: on
> > > large-receive-offload: off
> > > rx-vlan-offload: on
> > > tx-vlan-offload: on
> > > ntuple-filters: off [fixed]
> > > receive-hashing: off [fixed]
> > > highdma: on
> > > rx-vlan-filter: on
> > > vlan-challenged: off [fixed]
> > > tx-lockless: on [fixed]
> > > netns-local: on [fixed]
> > > tx-gso-robust: off [fixed]
> > > tx-fcoe-segmentation: off [fixed]
> > > tx-gre-segmentation: on
> > > tx-gre-csum-segmentation: on
> > > tx-ipxip4-segmentation: on
> > > tx-ipxip6-segmentation: on
> > > tx-udp_tnl-segmentation: on
> > > tx-udp_tnl-csum-segmentation: on
> > > tx-gso-partial: off [fixed]
> > > tx-sctp-segmentation: off [fixed]
> > > fcoe-mtu: off [fixed]
> > > tx-nocache-copy: off
> > > loopback: off [fixed]
> > > rx-fcs: off [fixed]
> > > rx-all: off [fixed]
> > > tx-vlan-stag-hw-insert: off [fixed]
> > > rx-vlan-stag-hw-parse: off [fixed]
> > > rx-vlan-stag-filter: off [fixed]
> > > l2-fwd-offload: off [fixed]
> > > busy-poll: off [fixed]
> > > hw-tc-offload: off [fixed]
> > >
> > > On 26 January 2017 at 22:00, Eric Dumazet wrote:
> > > > For some reason, even though this NIC advertises TSO support,
> > > > tcpdump clearly shows TSO is not used at all.
> > > >
> > > > Oh wait, maybe TSO is not enabled on the bonding device ?
> > > >
> > > > On Thu, 2017-01-26 at 21:46 +0100, Hans-Kristian Bakke wrote:
> > > > > # ethtool -i eth0
> > > > > driver: e1000e
> > > > > version: 3.2.6-k
> > > > > firmware-version: 1.9-0
> > > > > expansion-rom-version:
> > > > > bus-info: 0000:04:00.0
> > > > > supports-statistics: yes
> > > > > supports-test: yes
> > > > > supports-eeprom-access: yes
> > > > > supports-register-dump: yes
> > > > > supports-priv-flags: no
> > > > >
> > > > > # ethtool -k eth0
> > > > > Features for eth0:
> > > > > rx-checksumming: on
> > > > > tx-checksumming: on
> > > > > tx-checksum-ipv4: off [fixed]
> > > > > tx-checksum-ip-generic: on
> > > > > tx-checksum-ipv6: off [fixed]
> > > > > tx-checksum-fcoe-crc: off [fixed]
> > > > > tx-checksum-sctp: off [fixed]
> > > > > scatter-gather: on
> > > > > tx-scatter-gather: on
> > > > > tx-scatter-gather-fraglist: off [fixed]
> > > > > tcp-segmentation-offload: on
> > > > > tx-tcp-segmentation: on
> > > > > tx-tcp-ecn-segmentation: off [fixed]
> > > > > tx-tcp-mangleid-segmentation: on
> > > > > tx-tcp6-segmentation: on
> > > > > udp-fragmentation-offload: off [fixed]
> > > > > generic-segmentation-offload: on
> > > > > generic-receive-offload: on
> > > > > large-receive-offload: off [fixed]
> > > > > rx-vlan-offload: on
> > > > > tx-vlan-offload: on
> > > > > ntuple-filters: off [fixed]
> > > > > receive-hashing: on
> > > > > highdma: on [fixed]
> > > > > rx-vlan-filter: on [fixed]
> > > > > vlan-challenged: off [fixed]
> > > > > tx-lockless: off [fixed]
> > > > > netns-local: off [fixed]
> > > > > tx-gso-robust: off [fixed]
> > > > > tx-fcoe-segmentation: off [fixed]
> > > > > tx-gre-segmentation: off [fixed]
> > > > > tx-gre-csum-segmentation: off [fixed]
> > > > > tx-ipxip4-segmentation: off [fixed]
> > > > > tx-ipxip6-segmentation: off [fixed]
> > > > > tx-udp_tnl-segmentation: off [fixed]
> > > > > tx-udp_tnl-csum-segmentation: off [fixed]
> > > > > tx-gso-partial: off [fixed]
> > > > > tx-sctp-segmentation: off [fixed]
> > > > > fcoe-mtu: off [fixed]
> > > > > tx-nocache-copy: off
> > > > > loopback: off [fixed]
> > > > > rx-fcs: off
> > > > > rx-all: off
> > > > > tx-vlan-stag-hw-insert: off [fixed]
> > > > > rx-vlan-stag-hw-parse: off [fixed]
> > > > > rx-vlan-stag-filter: off [fixed]
> > > > > l2-fwd-offload: off [fixed]
> > > > > busy-poll: off [fixed]
> > > > > hw-tc-offload: off [fixed]
> > > > >
> > > > > # grep HZ /boot/config-4.8.0-2-amd64
> > > > > CONFIG_NO_HZ_COMMON=y
> > > > > # CONFIG_HZ_PERIODIC is not set
> > > > > CONFIG_NO_HZ_IDLE=y
> > > > > # CONFIG_NO_HZ_FULL is not set
> > > > > # CONFIG_NO_HZ is not set
> > > > > # CONFIG_HZ_100 is not set
> > > > > CONFIG_HZ_250=y
> > > > > # CONFIG_HZ_300 is not set
> > > > > # CONFIG_HZ_1000 is not set
> > > > > CONFIG_HZ=250
> > > > > CONFIG_MACHZ_WDT=m
> > > > >
> > > > > On 26 January 2017 at 21:41, Eric Dumazet wrote:
> > > > > > Can you post :
> > > > > >
> > > > > > ethtool -i eth0
> > > > > > ethtool -k eth0
> > > > > >
> > > > > > grep HZ /boot/config.... (what is the HZ value of your kernel)
> > > > > >
> > > > > > I suspect a possible problem with TSO autodefer when/if HZ < 1000.
> > > > > >
> > > > > > Thanks.
> > > > > >
> > > > > > On Thu, 2017-01-26 at 21:19 +0100, Hans-Kristian Bakke wrote:
> > > > > > > There are two packet captures from fq with and without
> > > > > > > pacing here:
> > > > > > >
> > > > > > > https://owncloud.proikt.com/index.php/s/KuXIl8h8bSFH1fM
> > > > > > >
> > > > > > > The server (with fq pacing/nopacing) is 10.0.5.10 and is
> > > > > > > running an Apache2 webserver on TCP port 443. The TCP
> > > > > > > client is an nginx reverse proxy at 10.0.5.13 on the same
> > > > > > > subnet, which in turn is proxying the connection from the
> > > > > > > Windows 10 client.
> > > > > > >
> > > > > > > - I did try to connect directly to the server with the
> > > > > > >   client (via a Linux gateway router), avoiding the nginx
> > > > > > >   proxy and just using plain no-ssl http. That did not
> > > > > > >   change anything.
> > > > > > > - I also tried stopping the eth0 interface to force the
> > > > > > >   traffic onto the eth1 interface in the LACP, which
> > > > > > >   changed nothing.
> > > > > > > - I also pulled each of the cables on the switch to force
> > > > > > >   the traffic to switch between interfaces in the LACP
> > > > > > >   link between the client switch and the server switch.
> > > > > > >
> > > > > > > The CPU is a 5-6 year old Intel Xeon X3430 CPU @ 4x2.40GHz
> > > > > > > on a SuperMicro platform. It is not very loaded, and the
> > > > > > > results are always in the same ballpark with fq pacing on.
> > > > > > >
> > > > > > > top - 21:12:38 up 12 days, 11:08, 4 users, load average: 0.56, 0.68, 0.77
> > > > > > > Tasks: 1344 total, 1 running, 1343 sleeping, 0 stopped, 0 zombie
> > > > > > > %Cpu0 : 0.0 us, 1.0 sy, 0.0 ni, 99.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
> > > > > > > %Cpu1 : 0.0 us, 0.3 sy, 0.0 ni, 97.4 id, 2.0 wa, 0.0 hi, 0.3 si, 0.0 st
> > > > > > > %Cpu2 : 0.0 us, 2.0 sy, 0.0 ni, 96.4 id, 1.3 wa, 0.0 hi, 0.3 si, 0.0 st
> > > > > > > %Cpu3 : 0.7 us, 2.3 sy, 0.0 ni, 94.1 id, 3.0 wa, 0.0 hi, 0.0 si, 0.0 st
> > > > > > > KiB Mem : 16427572 total, 173712 free, 9739976 used, 6513884 buff/cache
> > > > > > > KiB Swap: 6369276 total, 6126736 free, 242540 used. 6224836 avail Mem
> > > > > > >
> > > > > > > This seems OK to me. The server does have 24 drives in 3
> > > > > > > ZFS pools, 144 TB raw storage in total, with several SAS
> > > > > > > HBAs that are pretty much always poking the system in some
> > > > > > > way or other.
> > > > > > >
> > > > > > > There are around 32K interrupts when running @23 MB/s (as
> > > > > > > seen in Chrome downloads) with pacing on, and about 25K
> > > > > > > interrupts when running @105 MB/s with fq nopacing. Is
> > > > > > > that normal?
> > > > > > >
> > > > > > > Hans-Kristian
> > > > > > >
> > > > > > > On 26 January 2017 at 20:58, David Lang wrote:
> > > > > > > > Is there any CPU bottleneck?
> > > > > > > >
> > > > > > > > Pacing causing this sort of problem makes me think that
> > > > > > > > the CPU either can't keep up or that something (HZ
> > > > > > > > setting type of thing) is delaying when the CPU can get
> > > > > > > > used.
> > > > > > > >
> > > > > > > > It's not clear from the posts if the problem is with
> > > > > > > > sending data or receiving data.
> > > > > > > >
> > > > > > > > David Lang
> > > > > > > >
> > > > > > > > On Thu, 26 Jan 2017, Eric Dumazet wrote:
> > > > > > > > > Nothing jumps out at me.
> > > > > > > > >
> > > > > > > > > We use FQ on links varying from 1Gbit to 100Gbit, and
> > > > > > > > > we have no such issues.
> > > > > > > > >
> > > > > > > > > You could probably check on the server the various TCP
> > > > > > > > > infos given by the ss command:
> > > > > > > > >
> > > > > > > > > ss -temoi dst
> > > > > > > > >
> > > > > > > > > The pacing rate is shown. You might have some issues,
> > > > > > > > > but it is hard to say.
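For reference, a concrete form of that check (using the 10.0.5.13 client
address mentioned earlier in this thread purely as an illustration):

  # Show extended TCP info for connections to the client
  ss -temoi dst 10.0.5.13

In the per-connection details, look for the pacing_rate field (alongside
cwnd and rtt) to see what rate the sender is actually being paced at.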
> > > > > > > > > On Thu, 2017-01-26 at 19:55 +0100, Hans-Kristian Bakke wrote:
> > > > > > > > > > After some more testing I see that if I disable fq
> > > > > > > > > > pacing the performance is restored to the expected
> > > > > > > > > > levels:
> > > > > > > > > >
> > > > > > > > > > # for i in eth0 eth1; do tc qdisc replace dev $i root fq nopacing; done
> > > > > > > > > >
> > > > > > > > > > Is this expected behaviour? There is some background
> > > > > > > > > > traffic, but only in the sub-100 mbit/s range on the
> > > > > > > > > > switches and the gateway between the server and the
> > > > > > > > > > client.
> > > > > > > > > >
> > > > > > > > > > The chain:
> > > > > > > > > > Windows 10 client -> 1000 mbit/s -> switch ->
> > > > > > > > > > 2 x gigabit LACP -> switch -> 4 x gigabit LACP ->
> > > > > > > > > > gw (fq_codel on all nics) -> 4 x gigabit LACP (the
> > > > > > > > > > same as in) -> switch -> 2 x LACP -> server (with
> > > > > > > > > > misbehaving fq pacing)
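For reference, a minimal way to repeat that comparison on one interface
and watch the fq counters quoted further down (assuming the eth0 name
from this thread; adjust to the interface actually carrying the traffic):

  tc qdisc replace dev eth0 root fq            # pacing on (the fq default)
  tc qdisc replace dev eth0 root fq nopacing   # pacing off
  watch -n 1 'tc -s qdisc show dev eth0'       # "throttled" and "flows_plimit" appear in the -s output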
> > > > > > > > > > On 26 January 2017 at 19:38, Hans-Kristian Bakke wrote:
> > > > > > > > > > > I can add that this is without BBR, just plain old
> > > > > > > > > > > kernel 4.8 cubic.
> > > > > > > > > > >
> > > > > > > > > > > On 26 January 2017 at 19:36, Hans-Kristian Bakke wrote:
> > > > > > > > > > > > Another day, another fq issue (or user error).
> > > > > > > > > > > >
> > > > > > > > > > > > I am trying to do the seemingly simple task of
> > > > > > > > > > > > downloading a single large file over the local
> > > > > > > > > > > > gigabit LAN from a physical server running
> > > > > > > > > > > > kernel 4.8 and sch_fq on Intel server NICs.
> > > > > > > > > > > >
> > > > > > > > > > > > For some reason it wouldn't go past around
> > > > > > > > > > > > 25 MB/s. After having replaced SSL with no SSL,
> > > > > > > > > > > > replaced apache with nginx and verified that
> > > > > > > > > > > > there is plenty of bandwidth available between
> > > > > > > > > > > > my client and the server, I tried to change the
> > > > > > > > > > > > qdisc from fq to pfifo_fast. It instantly shot
> > > > > > > > > > > > up to around the expected 85-90 MB/s. The same
> > > > > > > > > > > > happened with fq_codel in place of fq.
> > > > > > > > > > > >
> > > > > > > > > > > > I then checked the statistics for fq, and the
> > > > > > > > > > > > throttled counter is increasing massively every
> > > > > > > > > > > > second (eth0 and eth1 are LACPed using Linux
> > > > > > > > > > > > bonding, so both are seen here):
> > > > > > > > > > > >
> > > > > > > > > > > > qdisc fq 8007: root refcnt 2 limit 10000p flow_limit 100p buckets 1024 orphan_mask 1023 quantum 3028 initial_quantum 15140 refill_delay 40.0ms
> > > > > > > > > > > >  Sent 787131797 bytes 520082 pkt (dropped 15, overlimits 0 requeues 0)
> > > > > > > > > > > >  backlog 98410b 65p requeues 0
> > > > > > > > > > > >   15 flows (14 inactive, 1 throttled)
> > > > > > > > > > > >   0 gc, 2 highprio, 259920 throttled, 15 flows_plimit
> > > > > > > > > > > > qdisc fq 8008: root refcnt 2 limit 10000p flow_limit 100p buckets 1024 orphan_mask 1023 quantum 3028 initial_quantum 15140 refill_delay 40.0ms
> > > > > > > > > > > >  Sent 2533167 bytes 6731 pkt (dropped 0, overlimits 0 requeues 0)
> > > > > > > > > > > >  backlog 0b 0p requeues 0
> > > > > > > > > > > >   24 flows (24 inactive, 0 throttled)
> > > > > > > > > > > >   0 gc, 2 highprio, 397 throttled
> > > > > > > > > > > >
> > > > > > > > > > > > Do you have any suggestions?
> > > > > > > > > > > >
> > > > > > > > > > > > Regards,
> > > > > > > > > > > > Hans-Kristian

_______________________________________________
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat