[Bloat] Excessive throttling with fq
Eric Dumazet
eric.dumazet at gmail.com
Thu Jan 26 16:07:14 EST 2017
On Thu, 2017-01-26 at 22:02 +0100, Hans-Kristian Bakke wrote:
> It seems like it is not:
>
It really should ;)
This is normally the default. Do you know why it is off ?
ethtool -K bond0 tso on
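(A minimal sketch of how one might apply and then verify this on both the bond and its slaves; eth0 and eth1 are the slave names reported elsewhere in this thread:

ethtool -K bond0 tso on
ethtool -k bond0 | grep tx-tcp-segmentation
ethtool -k eth0 | grep tx-tcp-segmentation
ethtool -k eth1 | grep tx-tcp-segmentation

If the bond still reports "tx-tcp-segmentation: off" afterwards, the feature is probably being masked somewhere in the bonding driver's feature negotiation with its slaves.)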
>
> Features for bond0:
> rx-checksumming: off [fixed]
> tx-checksumming: on
> tx-checksum-ipv4: off [fixed]
> tx-checksum-ip-generic: on
> tx-checksum-ipv6: off [fixed]
> tx-checksum-fcoe-crc: off [fixed]
> tx-checksum-sctp: off [fixed]
> scatter-gather: on
> tx-scatter-gather: on
> tx-scatter-gather-fraglist: off [requested on]
> tcp-segmentation-offload: on
> tx-tcp-segmentation: off
> tx-tcp-ecn-segmentation: on
> tx-tcp-mangleid-segmentation: off [requested on]
> tx-tcp6-segmentation: on
> udp-fragmentation-offload: off [fixed]
> generic-segmentation-offload: on
> generic-receive-offload: on
> large-receive-offload: off
> rx-vlan-offload: on
> tx-vlan-offload: on
> ntuple-filters: off [fixed]
> receive-hashing: off [fixed]
> highdma: on
> rx-vlan-filter: on
> vlan-challenged: off [fixed]
> tx-lockless: on [fixed]
> netns-local: on [fixed]
> tx-gso-robust: off [fixed]
> tx-fcoe-segmentation: off [fixed]
> tx-gre-segmentation: on
> tx-gre-csum-segmentation: on
> tx-ipxip4-segmentation: on
> tx-ipxip6-segmentation: on
> tx-udp_tnl-segmentation: on
> tx-udp_tnl-csum-segmentation: on
> tx-gso-partial: off [fixed]
> tx-sctp-segmentation: off [fixed]
> fcoe-mtu: off [fixed]
> tx-nocache-copy: off
> loopback: off [fixed]
> rx-fcs: off [fixed]
> rx-all: off [fixed]
> tx-vlan-stag-hw-insert: off [fixed]
> rx-vlan-stag-hw-parse: off [fixed]
> rx-vlan-stag-filter: off [fixed]
> l2-fwd-offload: off [fixed]
> busy-poll: off [fixed]
> hw-tc-offload: off [fixed]
>
>
>
> On 26 January 2017 at 22:00, Eric Dumazet <eric.dumazet at gmail.com>
> wrote:
> For some reason, even though this NIC advertises TSO support,
> tcpdump clearly shows TSO is not used at all.
>
> Oh wait, maybe TSO is not enabled on the bonding device ?
>
> On Thu, 2017-01-26 at 21:46 +0100, Hans-Kristian Bakke wrote:
> > # ethtool -i eth0
> > driver: e1000e
> > version: 3.2.6-k
> > firmware-version: 1.9-0
> > expansion-rom-version:
> > bus-info: 0000:04:00.0
> > supports-statistics: yes
> > supports-test: yes
> > supports-eeprom-access: yes
> > supports-register-dump: yes
> > supports-priv-flags: no
> >
> >
> > # ethtool -k eth0
> > Features for eth0:
> > rx-checksumming: on
> > tx-checksumming: on
> > tx-checksum-ipv4: off [fixed]
> > tx-checksum-ip-generic: on
> > tx-checksum-ipv6: off [fixed]
> > tx-checksum-fcoe-crc: off [fixed]
> > tx-checksum-sctp: off [fixed]
> > scatter-gather: on
> > tx-scatter-gather: on
> > tx-scatter-gather-fraglist: off [fixed]
> > tcp-segmentation-offload: on
> > tx-tcp-segmentation: on
> > tx-tcp-ecn-segmentation: off [fixed]
> > tx-tcp-mangleid-segmentation: on
> > tx-tcp6-segmentation: on
> > udp-fragmentation-offload: off [fixed]
> > generic-segmentation-offload: on
> > generic-receive-offload: on
> > large-receive-offload: off [fixed]
> > rx-vlan-offload: on
> > tx-vlan-offload: on
> > ntuple-filters: off [fixed]
> > receive-hashing: on
> > highdma: on [fixed]
> > rx-vlan-filter: on [fixed]
> > vlan-challenged: off [fixed]
> > tx-lockless: off [fixed]
> > netns-local: off [fixed]
> > tx-gso-robust: off [fixed]
> > tx-fcoe-segmentation: off [fixed]
> > tx-gre-segmentation: off [fixed]
> > tx-gre-csum-segmentation: off [fixed]
> > tx-ipxip4-segmentation: off [fixed]
> > tx-ipxip6-segmentation: off [fixed]
> > tx-udp_tnl-segmentation: off [fixed]
> > tx-udp_tnl-csum-segmentation: off [fixed]
> > tx-gso-partial: off [fixed]
> > tx-sctp-segmentation: off [fixed]
> > fcoe-mtu: off [fixed]
> > tx-nocache-copy: off
> > loopback: off [fixed]
> > rx-fcs: off
> > rx-all: off
> > tx-vlan-stag-hw-insert: off [fixed]
> > rx-vlan-stag-hw-parse: off [fixed]
> > rx-vlan-stag-filter: off [fixed]
> > l2-fwd-offload: off [fixed]
> > busy-poll: off [fixed]
> > hw-tc-offload: off [fixed]
> >
> >
> > # grep HZ /boot/config-4.8.0-2-amd64
> > CONFIG_NO_HZ_COMMON=y
> > # CONFIG_HZ_PERIODIC is not set
> > CONFIG_NO_HZ_IDLE=y
> > # CONFIG_NO_HZ_FULL is not set
> > # CONFIG_NO_HZ is not set
> > # CONFIG_HZ_100 is not set
> > CONFIG_HZ_250=y
> > # CONFIG_HZ_300 is not set
> > # CONFIG_HZ_1000 is not set
> > CONFIG_HZ=250
> > CONFIG_MACHZ_WDT=m
> >
> >
> >
> > On 26 January 2017 at 21:41, Eric Dumazet <eric.dumazet at gmail.com> wrote:
> >
> > Can you post :
> >
> > ethtool -i eth0
> > ethtool -k eth0
> >
> > grep HZ /boot/config.... (what is the HZ value of your kernel)
> >
> > I suspect a possible problem with TSO autodefer when/if HZ < 1000
> >
> > Thanks.
> >
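(For reference, assuming the TSO autodefer logic really is tied to jiffy granularity as suggested above: with the CONFIG_HZ=250 shown earlier, one timer tick is 1000/250 = 4 ms, four times coarser than at HZ=1000, so any jiffy-based defer or pacing decision is correspondingly coarser.)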
> > On Thu, 2017-01-26 at 21:19 +0100, Hans-Kristian Bakke wrote:
> > > There are two packet captures from fq with and without pacing here:
> > >
> > > https://owncloud.proikt.com/index.php/s/KuXIl8h8bSFH1fM
> > >
> > > The server (with fq pacing/nopacing) is 10.0.5.10 and is running an
> > > Apache2 webserver on TCP port 443. The TCP client is an nginx reverse
> > > proxy at 10.0.5.13 on the same subnet, which in turn proxies the
> > > connection from the Windows 10 client.
> > > - I did try to connect directly to the server with the client (via a
> > > Linux gateway router), avoiding the nginx proxy and just using plain
> > > no-SSL HTTP. That did not change anything.
> > > - I also tried stopping the eth0 interface to force the traffic onto
> > > the eth1 interface in the LACP, which changed nothing.
> > > - I also pulled each of the cables on the switch to force the traffic
> > > to switch between interfaces in the LACP link between the client
> > > switch and the server switch.
> > >
> > > The CPU is a 5-6 year old Intel Xeon X3430 CPU @ 4x2.40GHz on a
> > > SuperMicro platform. It is not very loaded and the results are always
> > > in the same ballpark with fq pacing on.
> > >
> > > top - 21:12:38 up 12 days, 11:08, 4 users, load average: 0.56, 0.68, 0.77
> > > Tasks: 1344 total, 1 running, 1343 sleeping, 0 stopped, 0 zombie
> > > %Cpu0 : 0.0 us, 1.0 sy, 0.0 ni, 99.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
> > > %Cpu1 : 0.0 us, 0.3 sy, 0.0 ni, 97.4 id, 2.0 wa, 0.0 hi, 0.3 si, 0.0 st
> > > %Cpu2 : 0.0 us, 2.0 sy, 0.0 ni, 96.4 id, 1.3 wa, 0.0 hi, 0.3 si, 0.0 st
> > > %Cpu3 : 0.7 us, 2.3 sy, 0.0 ni, 94.1 id, 3.0 wa, 0.0 hi, 0.0 si, 0.0 st
> > > KiB Mem : 16427572 total, 173712 free, 9739976 used, 6513884 buff/cache
> > > KiB Swap: 6369276 total, 6126736 free, 242540 used. 6224836 avail Mem
> > >
> > > This seems OK to me. It does have 24 drives in 3 ZFS pools at 144TB raw
> > > storage in total, with several SAS HBAs that are pretty much always
> > > poking the system in some way or another.
> > >
> > > There are around 32K interrupts when running @23 MB/s (as seen in
> > > Chrome downloads) with pacing on, and about 25K interrupts when running
> > > @105 MB/s with fq nopacing. Is that normal?
> > >
> > > Hans-Kristian
> > >
> > > On 26 January 2017 at 20:58, David Lang <david at lang.hm> wrote:
> > > Is there any CPU bottleneck?
> > >
> > > Pacing causing this sort of problem makes me think that the CPU either
> > > can't keep up or that something (HZ setting type of thing) is delaying
> > > when the CPU can get used.
> > >
> > > It's not clear from the posts if the problem is with sending data or
> > > receiving data.
> > >
> > > David Lang
> > >
> > > On Thu, 26 Jan 2017, Eric Dumazet wrote:
> > >
> > > Nothing jumps to mind.
> > >
> > > We use FQ on links varying from 1Gbit to 100Gbit, and we have no such
> > > issues.
> > >
> > > You could probably check on the server the various TCP infos given by
> > > the ss command:
> > >
> > > ss -temoi dst <remoteip>
> > >
> > > The pacing rate is shown. You might have some issues, but it is hard
> > > to say.
> > >
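(As a concrete example of the above: on this setup the relevant peer is the nginx proxy at 10.0.5.13, so one could sample the per-connection state during a transfer with, for example:

ss -temoi dst 10.0.5.13
watch -n 1 'ss -temoi dst 10.0.5.13'

The pacing_rate field reported by ss is the rate sch_fq tries to enforce when pacing is enabled, so comparing it against the achieved ~25 MB/s would show whether the sender itself is the limiter.)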
> > > On Thu, 2017-01-26 at 19:55 +0100, Hans-Kristian Bakke wrote:
> > > After some more testing I see that if I disable fq pacing the
> > > performance is restored to the expected levels:
> > >
> > > # for i in eth0 eth1; do tc qdisc replace dev $i root fq nopacing; done
> > >
> > > Is this expected behaviour? There is some background traffic, but only
> > > in the sub 100 mbit/s range on the switches and gateway between the
> > > server and client.
> > >
> > > The chain:
> > > Windows 10 client -> 1000 mbit/s -> switch -> 2 x gigabit LACP -> switch
> > > -> 4 x gigabit LACP -> gw (fq_codel on all nics) -> 4 x gigabit LACP
> > > (the same as in) -> switch -> 2 x LACP -> server (with misbehaving fq
> > > pacing)
> > >
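(A sketch of the two obvious follow-up experiments here: restoring fq's default behaviour, where pacing is on, or keeping pacing but capping the per-flow rate with fq's maxrate parameter; the 950mbit figure is arbitrary, chosen only as an illustration for a gigabit link:

for i in eth0 eth1; do tc qdisc replace dev $i root fq; done
for i in eth0 eth1; do tc qdisc replace dev $i root fq maxrate 950mbit; done

If the slowdown only appears with pacing enabled and no maxrate cap, that would point at the computed pacing rate itself rather than at fq's flow scheduling.)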
> > > On 26 January 2017 at 19:38, Hans-Kristian Bakke <hkbakke at gmail.com> wrote:
> > > I can add that this is without BBR, just plain old kernel 4.8 cubic.
> > >
> > > On 26 January 2017 at 19:36, Hans-Kristian Bakke <hkbakke at gmail.com> wrote:
> > > Another day, another fq issue (or user error).
> > >
> > > I try to do the seemingly simple task of downloading a single large
> > > file over local gigabit LAN from a physical server running kernel 4.8
> > > and sch_fq on Intel server NICs.
> > >
> > > For some reason it wouldn't go past around 25 MB/s. After having
> > > replaced SSL with no SSL, replaced apache with nginx and verified that
> > > there is plenty of bandwidth available between my client and the
> > > server, I tried to change the qdisc from fq to pfifo_fast. It
> > > instantly shot up to around the expected 85-90 MB/s. The same happened
> > > with fq_codel in place of fq.
> > >
> > > I then checked the statistics for fq, and the throttled counter is
> > > increasing massively every second (eth0 and eth1 are LACPed using
> > > Linux bonding, so both are seen here):
> > >
> > > qdisc fq 8007: root refcnt 2 limit 10000p flow_limit 100p buckets 1024
> > > orphan_mask 1023 quantum 3028 initial_quantum 15140 refill_delay 40.0ms
> > >  Sent 787131797 bytes 520082 pkt (dropped 15, overlimits 0 requeues 0)
> > >  backlog 98410b 65p requeues 0
> > >   15 flows (14 inactive, 1 throttled)
> > >   0 gc, 2 highprio, 259920 throttled, 15 flows_plimit
> > > qdisc fq 8008: root refcnt 2 limit 10000p flow_limit 100p buckets 1024
> > > orphan_mask 1023 quantum 3028 initial_quantum 15140 refill_delay 40.0ms
> > >  Sent 2533167 bytes 6731 pkt (dropped 0, overlimits 0 requeues 0)
> > >  backlog 0b 0p requeues 0
> > >   24 flows (24 inactive, 0 throttled)
> > >   0 gc, 2 highprio, 397 throttled
> > >
> > > Do you have any suggestions?
> > >
> > > Regards,
> > > Hans-Kristian
> > >
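(A note on reading those numbers: qdisc 8007 reports 259920 throttled events against 520082 packets sent, i.e. roughly one throttling decision for every two packets, which is consistent with pacing, rather than flow scheduling, being the limiter. One way to watch the counter move during a transfer, using plain tc statistics:

watch -n 1 "tc -s qdisc show dev eth0"

Comparing how fast "throttled" grows with pacing on versus with the nopacing variant above should make the correlation with the 25 MB/s ceiling obvious.)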