From: Eric Dumazet
To: Hans-Kristian Bakke
Cc: David Lang, bloat
Date: Thu, 26 Jan 2017 13:07:14 -0800
Subject: Re: [Bloat] Excessive throttling with fq

On Thu, 2017-01-26 at 22:02 +0100, Hans-Kristian Bakke wrote:
> It seems like it is not:

It really should ;) This is normally the default.

Do you know why it is off ?

ethtool -K bond0 tso on
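For reference, a minimal sketch of how the offload state could be checked and TSO turned back on across the bond and its slaves. The device names bond0/eth0/eth1 are the ones used in this thread; the loop itself is only illustrative:

    # sketch: inspect and re-enable TSO on the bond and its slave NICs (run as root)
    for dev in bond0 eth0 eth1; do
        echo "== $dev =="
        ethtool -k $dev | grep -E 'tcp-segmentation-offload|tx-tcp-segmentation'
        ethtool -K $dev tso on
    done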
>
> Features for bond0:
> rx-checksumming: off [fixed]
> tx-checksumming: on
> tx-checksum-ipv4: off [fixed]
> tx-checksum-ip-generic: on
> tx-checksum-ipv6: off [fixed]
> tx-checksum-fcoe-crc: off [fixed]
> tx-checksum-sctp: off [fixed]
> scatter-gather: on
> tx-scatter-gather: on
> tx-scatter-gather-fraglist: off [requested on]
> tcp-segmentation-offload: on
> tx-tcp-segmentation: off
> tx-tcp-ecn-segmentation: on
> tx-tcp-mangleid-segmentation: off [requested on]
> tx-tcp6-segmentation: on
> udp-fragmentation-offload: off [fixed]
> generic-segmentation-offload: on
> generic-receive-offload: on
> large-receive-offload: off
> rx-vlan-offload: on
> tx-vlan-offload: on
> ntuple-filters: off [fixed]
> receive-hashing: off [fixed]
> highdma: on
> rx-vlan-filter: on
> vlan-challenged: off [fixed]
> tx-lockless: on [fixed]
> netns-local: on [fixed]
> tx-gso-robust: off [fixed]
> tx-fcoe-segmentation: off [fixed]
> tx-gre-segmentation: on
> tx-gre-csum-segmentation: on
> tx-ipxip4-segmentation: on
> tx-ipxip6-segmentation: on
> tx-udp_tnl-segmentation: on
> tx-udp_tnl-csum-segmentation: on
> tx-gso-partial: off [fixed]
> tx-sctp-segmentation: off [fixed]
> fcoe-mtu: off [fixed]
> tx-nocache-copy: off
> loopback: off [fixed]
> rx-fcs: off [fixed]
> rx-all: off [fixed]
> tx-vlan-stag-hw-insert: off [fixed]
> rx-vlan-stag-hw-parse: off [fixed]
> rx-vlan-stag-filter: off [fixed]
> l2-fwd-offload: off [fixed]
> busy-poll: off [fixed]
> hw-tc-offload: off [fixed]
>
> On 26 January 2017 at 22:00, Eric Dumazet wrote:
> > For some reason, even though this NIC advertises TSO support,
> > tcpdump clearly shows TSO is not used at all.
> >
> > Oh wait, maybe TSO is not enabled on the bonding device ?
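One way such a tcpdump check can be done on the sending host, assuming the server address 10.0.5.10, TCP port 443 and the bond0 device mentioned in this thread: when TSO/GSO is in effect the stack hands super-MTU segments to the NIC, so locally captured outgoing segments show lengths well above the ~1448 byte MSS, while with TSO off every segment is capped at the MSS.

    # sketch: capture a few outgoing segments on the server and look at the reported lengths
    tcpdump -ni bond0 -c 50 'tcp and src host 10.0.5.10 and src port 443'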
> > On Thu, 2017-01-26 at 21:46 +0100, Hans-Kristian Bakke wrote:
> > > # ethtool -i eth0
> > > driver: e1000e
> > > version: 3.2.6-k
> > > firmware-version: 1.9-0
> > > expansion-rom-version:
> > > bus-info: 0000:04:00.0
> > > supports-statistics: yes
> > > supports-test: yes
> > > supports-eeprom-access: yes
> > > supports-register-dump: yes
> > > supports-priv-flags: no
> > >
> > > # ethtool -k eth0
> > > Features for eth0:
> > > rx-checksumming: on
> > > tx-checksumming: on
> > > tx-checksum-ipv4: off [fixed]
> > > tx-checksum-ip-generic: on
> > > tx-checksum-ipv6: off [fixed]
> > > tx-checksum-fcoe-crc: off [fixed]
> > > tx-checksum-sctp: off [fixed]
> > > scatter-gather: on
> > > tx-scatter-gather: on
> > > tx-scatter-gather-fraglist: off [fixed]
> > > tcp-segmentation-offload: on
> > > tx-tcp-segmentation: on
> > > tx-tcp-ecn-segmentation: off [fixed]
> > > tx-tcp-mangleid-segmentation: on
> > > tx-tcp6-segmentation: on
> > > udp-fragmentation-offload: off [fixed]
> > > generic-segmentation-offload: on
> > > generic-receive-offload: on
> > > large-receive-offload: off [fixed]
> > > rx-vlan-offload: on
> > > tx-vlan-offload: on
> > > ntuple-filters: off [fixed]
> > > receive-hashing: on
> > > highdma: on [fixed]
> > > rx-vlan-filter: on [fixed]
> > > vlan-challenged: off [fixed]
> > > tx-lockless: off [fixed]
> > > netns-local: off [fixed]
> > > tx-gso-robust: off [fixed]
> > > tx-fcoe-segmentation: off [fixed]
> > > tx-gre-segmentation: off [fixed]
> > > tx-gre-csum-segmentation: off [fixed]
> > > tx-ipxip4-segmentation: off [fixed]
> > > tx-ipxip6-segmentation: off [fixed]
> > > tx-udp_tnl-segmentation: off [fixed]
> > > tx-udp_tnl-csum-segmentation: off [fixed]
> > > tx-gso-partial: off [fixed]
> > > tx-sctp-segmentation: off [fixed]
> > > fcoe-mtu: off [fixed]
> > > tx-nocache-copy: off
> > > loopback: off [fixed]
> > > rx-fcs: off
> > > rx-all: off
> > > tx-vlan-stag-hw-insert: off [fixed]
> > > rx-vlan-stag-hw-parse: off [fixed]
> > > rx-vlan-stag-filter: off [fixed]
> > > l2-fwd-offload: off [fixed]
> > > busy-poll: off [fixed]
> > > hw-tc-offload: off [fixed]
> > >
> > > # grep HZ /boot/config-4.8.0-2-amd64
> > > CONFIG_NO_HZ_COMMON=y
> > > # CONFIG_HZ_PERIODIC is not set
> > > CONFIG_NO_HZ_IDLE=y
> > > # CONFIG_NO_HZ_FULL is not set
> > > # CONFIG_NO_HZ is not set
> > > # CONFIG_HZ_100 is not set
> > > CONFIG_HZ_250=y
> > > # CONFIG_HZ_300 is not set
> > > # CONFIG_HZ_1000 is not set
> > > CONFIG_HZ=250
> > > CONFIG_MACHZ_WDT=m
> > >
> > > On 26 January 2017 at 21:41, Eric Dumazet wrote:
> > > > Can you post :
> > > >
> > > > ethtool -i eth0
> > > > ethtool -k eth0
> > > >
> > > > grep HZ /boot/config.... (what is the HZ value of your kernel)
> > > >
> > > > I suspect a possible problem with TSO autodefer when/if HZ < 1000
> > > >
> > > > Thanks.
> > > >
> > > > On Thu, 2017-01-26 at 21:19 +0100, Hans-Kristian Bakke wrote:
> > > > > There are two packet captures from fq with and without pacing here:
> > > > >
> > > > > https://owncloud.proikt.com/index.php/s/KuXIl8h8bSFH1fM
> > > > >
> > > > > The server (with fq pacing/nopacing) is 10.0.5.10 and is running an
> > > > > Apache2 webserver on TCP port 443. The TCP client is an nginx reverse
> > > > > proxy at 10.0.5.13 on the same subnet, which in turn proxies the
> > > > > connection from the Windows 10 client.
> > > > > - I did try to connect directly to the server with the client (via a
> > > > >   Linux gateway router), avoiding the nginx proxy and just using plain
> > > > >   non-SSL HTTP. That did not change anything.
> > > > > - I also tried stopping the eth0 interface to force the traffic onto
> > > > >   the eth1 interface in the LACP bond, which changed nothing.
> > > > > - I also pulled each of the cables on the switch to force the traffic
> > > > >   to switch between interfaces in the LACP link between the client
> > > > >   switch and the server switch.
> > > > >
> > > > > The CPU is a 5-6 year old Intel Xeon X3430 @ 4x2.40GHz on a SuperMicro
> > > > > platform. It is not very loaded and the results are always in the same
> > > > > ballpark with fq pacing on.
> > > > >
> > > > > top - 21:12:38 up 12 days, 11:08,  4 users,  load average: 0.56, 0.68, 0.77
> > > > > Tasks: 1344 total,   1 running, 1343 sleeping,   0 stopped,   0 zombie
> > > > > %Cpu0  :  0.0 us,  1.0 sy,  0.0 ni, 99.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
> > > > > %Cpu1  :  0.0 us,  0.3 sy,  0.0 ni, 97.4 id,  2.0 wa,  0.0 hi,  0.3 si,  0.0 st
> > > > > %Cpu2  :  0.0 us,  2.0 sy,  0.0 ni, 96.4 id,  1.3 wa,  0.0 hi,  0.3 si,  0.0 st
> > > > > %Cpu3  :  0.7 us,  2.3 sy,  0.0 ni, 94.1 id,  3.0 wa,  0.0 hi,  0.0 si,  0.0 st
> > > > > KiB Mem : 16427572 total,   173712 free,  9739976 used,  6513884 buff/cache
> > > > > KiB Swap:  6369276 total,  6126736 free,   242540 used.  6224836 avail Mem
> > > > >
> > > > > This seems OK to me. It does have 24 drives in 3 ZFS pools at 144TB raw
> > > > > storage in total, with several SAS HBAs that are pretty much always
> > > > > poking the system in some way or the other.
> > > > >
> > > > > There are around 32K interrupts when running at 23 MB/s (as seen in
> > > > > Chrome downloads) with pacing on, and about 25K interrupts when running
> > > > > at 105 MB/s with fq nopacing. Is that normal?
> > > > >
> > > > > Hans-Kristian
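For reference, a sketch of how those two data points can be read on a running system; the interface names eth0/eth1 and the Debian-style /boot/config-$(uname -r) path follow what is shown in this thread:

    # kernel timer frequency (the TSO autodefer concern above is about HZ < 1000)
    grep CONFIG_HZ= /boot/config-$(uname -r)
    # rough view of the NIC interrupt rate: per-queue counters, refreshed every second
    watch -n 1 "grep -E 'eth0|eth1' /proc/interrupts"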
> > > > > On 26 January 2017 at 20:58, David Lang wrote:
> > > > > > Is there any CPU bottleneck?
> > > > > >
> > > > > > pacing causing this sort of problem makes me think that the CPU
> > > > > > either can't keep up or that something (Hz setting type of thing)
> > > > > > is delaying when the CPU can get used.
> > > > > >
> > > > > > It's not clear from the posts if the problem is with sending data
> > > > > > or receiving data.
> > > > > >
> > > > > > David Lang
> > > > > >
> > > > > > On Thu, 26 Jan 2017, Eric Dumazet wrote:
> > > > > > > Nothing jumps on my head.
> > > > > > >
> > > > > > > We use FQ on links varying from 1Gbit to 100Gbit, and we have no
> > > > > > > such issues.
> > > > > > >
> > > > > > > You could probably check on the server the various TCP infos
> > > > > > > given by the ss command:
> > > > > > >
> > > > > > > ss -temoi dst
> > > > > > >
> > > > > > > pacing rate is shown. You might have some issues, but it is hard
> > > > > > > to say.
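A possible concrete form of that check on the server, assuming the connection of interest is the one towards the nginx proxy at 10.0.5.13 mentioned earlier in the thread:

    # sketch: per-connection TCP state for traffic towards the proxy
    ss -temoi dst 10.0.5.13
    # the -i output includes cwnd, rtt and pacing_rate; pacing_rate is the
    # per-flow rate that sch_fq enforces when pacing is enabled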
> > > > > > > On Thu, 2017-01-26 at 19:55 +0100, Hans-Kristian Bakke wrote:
> > > > > > > > After some more testing I see that if I disable fq pacing the
> > > > > > > > performance is restored to the expected levels:
> > > > > > > >
> > > > > > > > # for i in eth0 eth1; do tc qdisc replace dev $i root fq nopacing; done
> > > > > > > >
> > > > > > > > Is this expected behaviour? There is some background traffic,
> > > > > > > > but only in the sub-100 mbit/s range on the switches and gateway
> > > > > > > > between the server and client.
> > > > > > > >
> > > > > > > > The chain:
> > > > > > > > Windows 10 client -> 1000 mbit/s -> switch -> 2 x gigabit LACP ->
> > > > > > > > switch -> 4 x gigabit LACP -> gw (fq_codel on all nics) -> 4 x
> > > > > > > > gigabit LACP (the same as in) -> switch -> 2 x LACP -> server
> > > > > > > > (with misbehaving fq pacing)
> > > > > > > >
> > > > > > > > On 26 January 2017 at 19:38, Hans-Kristian Bakke wrote:
> > > > > > > > > I can add that this is without BBR, just plain old kernel 4.8
> > > > > > > > > cubic.
> > > > > > > > >
> > > > > > > > > On 26 January 2017 at 19:36, Hans-Kristian Bakke wrote:
> > > > > > > > > > Another day, another fq issue (or user error).
> > > > > > > > > >
> > > > > > > > > > I try to do the seemingly simple task of downloading a
> > > > > > > > > > single large file over local gigabit LAN from a physical
> > > > > > > > > > server running kernel 4.8 and sch_fq on Intel server NICs.
> > > > > > > > > >
> > > > > > > > > > For some reason it wouldn't go past around 25 MB/s. After
> > > > > > > > > > having replaced SSL with plain HTTP, replaced Apache with
> > > > > > > > > > nginx and verified that there is plenty of bandwidth
> > > > > > > > > > available between my client and the server, I tried to
> > > > > > > > > > change the qdisc from fq to pfifo_fast. It instantly shot up
> > > > > > > > > > to around the expected 85-90 MB/s. The same happened with
> > > > > > > > > > fq_codel in place of fq.
> > > > > > > > > >
> > > > > > > > > > I then checked the statistics for fq and the throttled
> > > > > > > > > > counter is increasing massively every second (eth0 and eth1
> > > > > > > > > > are LACPed using Linux bonding, so both are seen here):
> > > > > > > > > >
> > > > > > > > > > qdisc fq 8007: root refcnt 2 limit 10000p flow_limit 100p buckets 1024 orphan_mask 1023 quantum 3028 initial_quantum 15140 refill_delay 40.0ms
> > > > > > > > > >  Sent 787131797 bytes 520082 pkt (dropped 15, overlimits 0 requeues 0)
> > > > > > > > > >  backlog 98410b 65p requeues 0
> > > > > > > > > >   15 flows (14 inactive, 1 throttled)
> > > > > > > > > >   0 gc, 2 highprio, 259920 throttled, 15 flows_plimit
> > > > > > > > > > qdisc fq 8008: root refcnt 2 limit 10000p flow_limit 100p buckets 1024 orphan_mask 1023 quantum 3028 initial_quantum 15140 refill_delay 40.0ms
> > > > > > > > > >  Sent 2533167 bytes 6731 pkt (dropped 0, overlimits 0 requeues 0)
> > > > > > > > > >  backlog 0b 0p requeues 0
> > > > > > > > > >   24 flows (24 inactive, 0 throttled)
> > > > > > > > > >   0 gc, 2 highprio, 397 throttled
> > > > > > > > > >
> > > > > > > > > > Do you have any suggestions?
> > > > > > > > > >
> > > > > > > > > > Regards,
> > > > > > > > > > Hans-Kristian

_______________________________________________
Bloat mailing list
Bloat@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/bloat
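For anyone retracing the thread, a short sketch of the checks discussed above, again using the eth0/eth1 names from the thread: watch whether the fq "throttled" counter keeps climbing while a transfer is slow, and compare behaviour with pacing disabled and then re-enabled (pacing is the sch_fq default):

    # sketch: inspect fq counters and toggle pacing, as done earlier in the thread
    tc -s qdisc show dev eth0
    tc -s qdisc show dev eth1
    for i in eth0 eth1; do tc qdisc replace dev $i root fq nopacing; done   # pacing off
    for i in eth0 eth1; do tc qdisc replace dev $i root fq; done            # pacing back on (default)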