[Bloat] Excessive throttling with fq

Eric Dumazet eric.dumazet at gmail.com
Thu Jan 26 16:33:39 EST 2017


On Thu, 2017-01-26 at 22:20 +0100, Hans-Kristian Bakke wrote:
> Wow, that was it! After seeing your previous mail I disabled and
> re-enabled tso and gso on eth0, eth1 AND bond0 to reset them all to the
> same state, and it cleared up all the issues.
> 
> 
> In other words, my issue was that my physical NICs eth0 and eth1 had
> gso/tso enabled but my bond0 interface had gso/tso disabled, which
> everything else but fq with pacing did not seem to care about.
> 
> 
> The mismatch probably comes from my traffic shaper script used in
> experiments over the last couple of days.
> I actually think my gateway may have the same latent issue for fq with
> pacing, as my HTB + fq_codel WAN traffic shaper script automatically
> disables tso and gso on the shaped interface, which in my case is
> bond0.12 (bonding AND VLANs), while the underlying physical interfaces
> still have tso and gso enabled, as the script does not know that the
> interface happens to be bound to one or more layers of interfaces
> below it.
> 
> 
> 
> 
> This is the difference in the ethtool -k output between the
> non-working fq pacing settings and the working version.
> 
> 
> diff ethtool_k_bond0.txt ethtool_k_bond0-2.txt
> 13c13
> <       tx-tcp-segmentation: off
> ---
> >       tx-tcp-segmentation: on
> 15c15
> <       tx-tcp-mangleid-segmentation: off [requested on]
> ---
> >       tx-tcp-mangleid-segmentation: on
> 
> 
> Thank you for pointing me in the right direction! I don't know if this
> is a "won't-fix" issue caused by an illogical user configuration or if
> it is something that should be handled better in the future.
> 
> 

Non-TSO devices are supported, but we would generally install FQ on the
bonding device and leave TSO enabled on the bond.

This is because setting timers is expensive, and our design choices for
pacing tried hard to avoid arming a timer for every two packets sent (as
in 1-MSS packets) ;)

( https://lwn.net/Articles/564978/ )

Of course this does not really matter for slow links (like a 10 Mbit or
100 Mbit NIC).
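
In your case that would be something like this (just a sketch, assuming
the interface names from your mails; adjust as needed) :

  ethtool -K bond0 tso on gso on      # let the bond emit large TSO/GSO packets again
  tc qdisc replace dev bond0 root fq  # pace at the bonding layer, where segments are still big
  ethtool -k bond0 | grep -E 'tcp-segmentation|generic-segmentation'   # verify the offloads really took effect

That way FQ paces large TSO packets and has to arm far fewer timers.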
> 
> 
> On 26 January 2017 at 22:07, Eric Dumazet <eric.dumazet at gmail.com> wrote:
>         On Thu, 2017-01-26 at 22:02 +0100, Hans-Kristian Bakke wrote:
>         > It seems like it is not:
>         >
>         
>         It really should ;)
>         
>         This is normally the default. Do you know why it is off ?
>         
>         ethtool -K bond0 tso on
>         
>         
>         >
>         > Features for bond0:
>         > rx-checksumming: off [fixed]
>         > tx-checksumming: on
>         > tx-checksum-ipv4: off [fixed]
>         > tx-checksum-ip-generic: on
>         > tx-checksum-ipv6: off [fixed]
>         > tx-checksum-fcoe-crc: off [fixed]
>         > tx-checksum-sctp: off [fixed]
>         > scatter-gather: on
>         > tx-scatter-gather: on
>         > tx-scatter-gather-fraglist: off [requested on]
>         > tcp-segmentation-offload: on
>         > tx-tcp-segmentation: off
>         > tx-tcp-ecn-segmentation: on
>         > tx-tcp-mangleid-segmentation: off [requested on]
>         > tx-tcp6-segmentation: on
>         > udp-fragmentation-offload: off [fixed]
>         > generic-segmentation-offload: on
>         > generic-receive-offload: on
>         > large-receive-offload: off
>         > rx-vlan-offload: on
>         > tx-vlan-offload: on
>         > ntuple-filters: off [fixed]
>         > receive-hashing: off [fixed]
>         > highdma: on
>         > rx-vlan-filter: on
>         > vlan-challenged: off [fixed]
>         > tx-lockless: on [fixed]
>         > netns-local: on [fixed]
>         > tx-gso-robust: off [fixed]
>         > tx-fcoe-segmentation: off [fixed]
>         > tx-gre-segmentation: on
>         > tx-gre-csum-segmentation: on
>         > tx-ipxip4-segmentation: on
>         > tx-ipxip6-segmentation: on
>         > tx-udp_tnl-segmentation: on
>         > tx-udp_tnl-csum-segmentation: on
>         > tx-gso-partial: off [fixed]
>         > tx-sctp-segmentation: off [fixed]
>         > fcoe-mtu: off [fixed]
>         > tx-nocache-copy: off
>         > loopback: off [fixed]
>         > rx-fcs: off [fixed]
>         > rx-all: off [fixed]
>         > tx-vlan-stag-hw-insert: off [fixed]
>         > rx-vlan-stag-hw-parse: off [fixed]
>         > rx-vlan-stag-filter: off [fixed]
>         > l2-fwd-offload: off [fixed]
>         > busy-poll: off [fixed]
>         > hw-tc-offload: off [fixed]
>         >
>         >
>         >
>         > On 26 January 2017 at 22:00, Eric Dumazet <eric.dumazet at gmail.com> wrote:
>         >         For some reason, even though this NIC advertises TSO support,
>         >         tcpdump clearly shows TSO is not used at all.
>         >
>         >         Oh wait, maybe TSO is not enabled on the bonding device ?
>         >
>         >         On Thu, 2017-01-26 at 21:46 +0100, Hans-Kristian Bakke wrote:
>         >         > # ethtool -i eth0
>         >         > driver: e1000e
>         >         > version: 3.2.6-k
>         >         > firmware-version: 1.9-0
>         >         > expansion-rom-version:
>         >         > bus-info: 0000:04:00.0
>         >         > supports-statistics: yes
>         >         > supports-test: yes
>         >         > supports-eeprom-access: yes
>         >         > supports-register-dump: yes
>         >         > supports-priv-flags: no
>         >         >
>         >         >
>         >         > # ethtool -k eth0
>         >         > Features for eth0:
>         >         > rx-checksumming: on
>         >         > tx-checksumming: on
>         >         > tx-checksum-ipv4: off [fixed]
>         >         > tx-checksum-ip-generic: on
>         >         > tx-checksum-ipv6: off [fixed]
>         >         > tx-checksum-fcoe-crc: off [fixed]
>         >         > tx-checksum-sctp: off [fixed]
>         >         > scatter-gather: on
>         >         > tx-scatter-gather: on
>         >         > tx-scatter-gather-fraglist: off [fixed]
>         >         > tcp-segmentation-offload: on
>         >         > tx-tcp-segmentation: on
>         >         > tx-tcp-ecn-segmentation: off [fixed]
>         >         > tx-tcp-mangleid-segmentation: on
>         >         > tx-tcp6-segmentation: on
>         >         > udp-fragmentation-offload: off [fixed]
>         >         > generic-segmentation-offload: on
>         >         > generic-receive-offload: on
>         >         > large-receive-offload: off [fixed]
>         >         > rx-vlan-offload: on
>         >         > tx-vlan-offload: on
>         >         > ntuple-filters: off [fixed]
>         >         > receive-hashing: on
>         >         > highdma: on [fixed]
>         >         > rx-vlan-filter: on [fixed]
>         >         > vlan-challenged: off [fixed]
>         >         > tx-lockless: off [fixed]
>         >         > netns-local: off [fixed]
>         >         > tx-gso-robust: off [fixed]
>         >         > tx-fcoe-segmentation: off [fixed]
>         >         > tx-gre-segmentation: off [fixed]
>         >         > tx-gre-csum-segmentation: off [fixed]
>         >         > tx-ipxip4-segmentation: off [fixed]
>         >         > tx-ipxip6-segmentation: off [fixed]
>         >         > tx-udp_tnl-segmentation: off [fixed]
>         >         > tx-udp_tnl-csum-segmentation: off [fixed]
>         >         > tx-gso-partial: off [fixed]
>         >         > tx-sctp-segmentation: off [fixed]
>         >         > fcoe-mtu: off [fixed]
>         >         > tx-nocache-copy: off
>         >         > loopback: off [fixed]
>         >         > rx-fcs: off
>         >         > rx-all: off
>         >         > tx-vlan-stag-hw-insert: off [fixed]
>         >         > rx-vlan-stag-hw-parse: off [fixed]
>         >         > rx-vlan-stag-filter: off [fixed]
>         >         > l2-fwd-offload: off [fixed]
>         >         > busy-poll: off [fixed]
>         >         > hw-tc-offload: off [fixed]
>         >         >
>         >         >
>         >         > # grep HZ /boot/config-4.8.0-2-amd64
>         >         > CONFIG_NO_HZ_COMMON=y
>         >         > # CONFIG_HZ_PERIODIC is not set
>         >         > CONFIG_NO_HZ_IDLE=y
>         >         > # CONFIG_NO_HZ_FULL is not set
>         >         > # CONFIG_NO_HZ is not set
>         >         > # CONFIG_HZ_100 is not set
>         >         > CONFIG_HZ_250=y
>         >         > # CONFIG_HZ_300 is not set
>         >         > # CONFIG_HZ_1000 is not set
>         >         > CONFIG_HZ=250
>         >         > CONFIG_MACHZ_WDT=m
>         >         >
>         >         >
>         >         >
>         >         > On 26 January 2017 at 21:41, Eric Dumazet <eric.dumazet at gmail.com> wrote:
>         >         >
>         >         >         Can you post :
>         >         >
>         >         >         ethtool -i eth0
>         >         >         ethtool -k eth0
>         >         >
>         >         >         grep HZ /boot/config.... (what is the HZ value of your kernel)
>         >         >
>         >         >         I suspect a possible problem with TSO autodefer when/if HZ < 1000
>         >         >
>         >         >         Thanks.
>         >         >
>         >         >         On Thu, 2017-01-26 at 21:19 +0100, Hans-Kristian Bakke wrote:
>         >         >         > There are two packet captures from fq with and without pacing here:
>         >         >         >
>         >         >         > https://owncloud.proikt.com/index.php/s/KuXIl8h8bSFH1fM
>         >         >         >
>         >         >         > The server (with fq pacing/nopacing) is 10.0.5.10 and is running an Apache2 webserver at tcp port 443. The tcp client is an nginx reverse proxy at 10.0.5.13 on the same subnet, which in turn is proxying the connection from the Windows 10 client.
>         >         >         > - I did try to connect directly to the server with the client (via a linux gateway router), avoiding the nginx proxy and just using plain no-ssl http. That did not change anything.
>         >         >         > - I also tried stopping the eth0 interface to force the traffic to the eth1 interface in the LACP, which changed nothing.
>         >         >         > - I also pulled each of the cables on the switch to force the traffic to switch between interfaces in the LACP link between the client switch and the server switch.
>         >         >         >
>         >         >         > The CPU is a 5-6 year old Intel Xeon X3430 CPU @ 4x2.40GHz on a SuperMicro platform. It is not very loaded and the results are always in the same ballpark with fq pacing on.
>         >         >         >
>         >         >         > top - 21:12:38 up 12 days, 11:08,  4 users,  load average: 0.56, 0.68, 0.77
>         >         >         > Tasks: 1344 total,   1 running, 1343 sleeping,   0 stopped,   0 zombie
>         >         >         > %Cpu0  :  0.0 us,  1.0 sy,  0.0 ni, 99.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
>         >         >         > %Cpu1  :  0.0 us,  0.3 sy,  0.0 ni, 97.4 id,  2.0 wa,  0.0 hi,  0.3 si,  0.0 st
>         >         >         > %Cpu2  :  0.0 us,  2.0 sy,  0.0 ni, 96.4 id,  1.3 wa,  0.0 hi,  0.3 si,  0.0 st
>         >         >         > %Cpu3  :  0.7 us,  2.3 sy,  0.0 ni, 94.1 id,  3.0 wa,  0.0 hi,  0.0 si,  0.0 st
>         >         >         > KiB Mem : 16427572 total,   173712 free,  9739976 used,  6513884 buff/cache
>         >         >         > KiB Swap:  6369276 total,  6126736 free,   242540 used.  6224836 avail Mem
>         >         >         >
>         >         >         > This seems OK to me. It does have 24 drives in 3 ZFS pools at 144TB raw storage in total, with several SAS HBAs that are pretty much always poking the system in some way or the other.
>         >         >         >
>         >         >         > There are around 32K interrupts when running @23 MB/s (as seen in chrome downloads) with pacing on and about 25K interrupts when running @105 MB/s with fq nopacing. Is that normal?
>         >         >         >
>         >         >         > Hans-Kristian
>         >         >         >
>         >         >         > On 26 January 2017 at 20:58, David Lang <david at lang.hm> wrote:
>         >         >         >         Is there any CPU bottleneck?
>         >         >         >
>         >         >         >         Pacing causing this sort of problem makes me think that the CPU either can't keep up or that something (Hz setting type of thing) is delaying when the CPU can get used.
>         >         >         >
>         >         >         >         It's not clear from the posts if the problem is with sending data or receiving data.
>         >         >         >
>         >         >         >         David Lang
>         >         >         >
>         >         >         >         On Thu, 26 Jan 2017, Eric Dumazet wrote:
>         >         >         >
>         >         >         >                 Nothing jumps on my head.
>         >         >         >
>         >         >         >                 We use FQ on links varying from 1Gbit to 100Gbit, and we have no such issues.
>         >         >         >
>         >         >         >                 You could probably check on the server the various TCP infos given by the ss command :
>         >         >         >
>         >         >         >                 ss -temoi dst <remoteip>
>         >         >         >
>         >         >         >                 pacing rate is shown. You might have some issues, but it is hard to say.
>         >         >         >
>         >         >         >                 On Thu, 2017-01-26 at 19:55 +0100, Hans-Kristian Bakke wrote:
>         >         >         >                         After some more testing I see that if I disable fq pacing the performance is restored to the expected levels:
>         >         >         >                         # for i in eth0 eth1; do tc qdisc replace dev $i root fq nopacing; done
>         >         >         >
>         >         >         >                         Is this expected behaviour? There is some background traffic, but only in the sub-100 mbit/s range on the switches and gateway between the server and client.
>         >         >         >
>         >         >         >                         The chain:
>         >         >         >                         Windows 10 client -> 1000 mbit/s -> switch -> 2 x gigabit LACP -> switch -> 4 x gigabit LACP -> gw (fq_codel on all nics) -> 4 x gigabit LACP (the same as in) -> switch -> 2 x LACP -> server (with misbehaving fq pacing)
>         >         >         >
>         >         >         >                         On 26 January 2017 at 19:38, Hans-Kristian Bakke <hkbakke at gmail.com> wrote:
>         >         >         >                                 I can add that this is without BBR, just plain old kernel 4.8 cubic.
>         >         >         >
>         >         >         >                                 On 26 January 2017 at 19:36, Hans-Kristian Bakke <hkbakke at gmail.com> wrote:
>         >         >         >                                         Another day, another fq issue (or user error).
>         >         >         >
>         >         >         >                                         I try to do the seemingly simple task of downloading a single large file over local gigabit LAN from a physical server running kernel 4.8 and sch_fq on intel server NICs.
>         >         >         >
>         >         >         >                                         For some reason it wouldn't go past around 25 MB/s. After having replaced SSL with no SSL, replaced apache with nginx and verified that there is plenty of bandwidth available between my client and the server, I tried to change qdisc from fq to pfifo_fast. It instantly shot up to around the expected 85-90 MB/s. The same happened with fq_codel in place of fq.
>         >         >         >
>         >         >         >                                         I then checked the statistics for fq, and the throttled counter is increasing massively every second (eth0 and eth1 are LACPed using Linux bonding so both are seen here):
>         >         >         >
>         >         >         >                                         qdisc fq 8007: root refcnt 2 limit 10000p flow_limit 100p buckets 1024 orphan_mask 1023 quantum 3028 initial_quantum 15140 refill_delay 40.0ms
>         >         >         >                                          Sent 787131797 bytes 520082 pkt (dropped 15, overlimits 0 requeues 0)
>         >         >         >                                          backlog 98410b 65p requeues 0
>         >         >         >                                           15 flows (14 inactive, 1 throttled)
>         >         >         >                                           0 gc, 2 highprio, 259920 throttled, 15 flows_plimit
>         >         >         >                                         qdisc fq 8008: root refcnt 2 limit 10000p flow_limit 100p buckets 1024 orphan_mask 1023 quantum 3028 initial_quantum 15140 refill_delay 40.0ms
>         >         >         >                                          Sent 2533167 bytes 6731 pkt (dropped 0, overlimits 0 requeues 0)
>         >         >         >                                          backlog 0b 0p requeues 0
>         >         >         >                                           24 flows (24 inactive, 0 throttled)
>         >         >         >                                           0 gc, 2 highprio, 397 throttled
>         >         >         >
>         >         >         >                                         Do you have any suggestions?
>         >         >         >
>         >         >         >                                         Regards,
>         >         >         >                                         Hans-Kristian
>         >         >         >
>         >         >
>         >         >
>         >         >
>         >         >
>         >         >
>         >
>         >
>         >
>         >
>         >
>         
>         
>         
> 
> 
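
PS : for anyone else hitting this, a quick way to spot the combination described
above (only a sketch, using the interface names from this thread) is to watch the
fq "throttled" counter and compare the offload state of the bond and its slaves :

  tc -s qdisc show dev eth0     # a "throttled" count growing by thousands per second is the symptom
  for i in bond0 eth0 eth1; do echo "== $i"; ethtool -k $i | grep tcp-segmentation; done
  ethtool -K bond0 tso on gso on   # re-enable the offloads on the bond if they differ from the slaves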




