[Bloat] Excessive throttling with fq
Eric Dumazet
eric.dumazet at gmail.com
Thu Jan 26 16:07:14 EST 2017
On Thu, 2017-01-26 at 22:02 +0100, Hans-Kristian Bakke wrote:
> It seems like it is not:
>
It really should ;)
This is normally the default. Do you know why it is off ?
ethtool -K bond0 tso on
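(A minimal sketch of how one might apply and then verify this on both the bond and its slaves; eth0 and eth1 are the slave names reported elsewhere in this thread:

ethtool -K bond0 tso on
ethtool -k bond0 | grep tx-tcp-segmentation
ethtool -k eth0 | grep tx-tcp-segmentation
ethtool -k eth1 | grep tx-tcp-segmentation

If the bond still reports "tx-tcp-segmentation: off" afterwards, the feature is probably being masked somewhere in the bonding driver's feature negotiation with its slaves.)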
>
> Features for bond0:
> rx-checksumming: off [fixed]
> tx-checksumming: on
> tx-checksum-ipv4: off [fixed]
> tx-checksum-ip-generic: on
> tx-checksum-ipv6: off [fixed]
> tx-checksum-fcoe-crc: off [fixed]
> tx-checksum-sctp: off [fixed]
> scatter-gather: on
> tx-scatter-gather: on
> tx-scatter-gather-fraglist: off [requested on]
> tcp-segmentation-offload: on
> tx-tcp-segmentation: off
> tx-tcp-ecn-segmentation: on
> tx-tcp-mangleid-segmentation: off [requested on]
> tx-tcp6-segmentation: on
> udp-fragmentation-offload: off [fixed]
> generic-segmentation-offload: on
> generic-receive-offload: on
> large-receive-offload: off
> rx-vlan-offload: on
> tx-vlan-offload: on
> ntuple-filters: off [fixed]
> receive-hashing: off [fixed]
> highdma: on
> rx-vlan-filter: on
> vlan-challenged: off [fixed]
> tx-lockless: on [fixed]
> netns-local: on [fixed]
> tx-gso-robust: off [fixed]
> tx-fcoe-segmentation: off [fixed]
> tx-gre-segmentation: on
> tx-gre-csum-segmentation: on
> tx-ipxip4-segmentation: on
> tx-ipxip6-segmentation: on
> tx-udp_tnl-segmentation: on
> tx-udp_tnl-csum-segmentation: on
> tx-gso-partial: off [fixed]
> tx-sctp-segmentation: off [fixed]
> fcoe-mtu: off [fixed]
> tx-nocache-copy: off
> loopback: off [fixed]
> rx-fcs: off [fixed]
> rx-all: off [fixed]
> tx-vlan-stag-hw-insert: off [fixed]
> rx-vlan-stag-hw-parse: off [fixed]
> rx-vlan-stag-filter: off [fixed]
> l2-fwd-offload: off [fixed]
> busy-poll: off [fixed]
> hw-tc-offload: off [fixed]
>
>
>
> On 26 January 2017 at 22:00, Eric Dumazet <eric.dumazet at gmail.com>
> wrote:
> For some reason, even though this NIC advertises TSO support,
> tcpdump clearly shows TSO is not used at all.
>
> Oh wait, maybe TSO is not enabled on the bonding device ?
>
> On Thu, 2017-01-26 at 21:46 +0100, Hans-Kristian Bakke wrote:
> > # ethtool -i eth0
> > driver: e1000e
> > version: 3.2.6-k
> > firmware-version: 1.9-0
> > expansion-rom-version:
> > bus-info: 0000:04:00.0
> > supports-statistics: yes
> > supports-test: yes
> > supports-eeprom-access: yes
> > supports-register-dump: yes
> > supports-priv-flags: no
> >
> >
> > # ethtool -k eth0
> > Features for eth0:
> > rx-checksumming: on
> > tx-checksumming: on
> > tx-checksum-ipv4: off [fixed]
> > tx-checksum-ip-generic: on
> > tx-checksum-ipv6: off [fixed]
> > tx-checksum-fcoe-crc: off [fixed]
> > tx-checksum-sctp: off [fixed]
> > scatter-gather: on
> > tx-scatter-gather: on
> > tx-scatter-gather-fraglist: off [fixed]
> > tcp-segmentation-offload: on
> > tx-tcp-segmentation: on
> > tx-tcp-ecn-segmentation: off [fixed]
> > tx-tcp-mangleid-segmentation: on
> > tx-tcp6-segmentation: on
> > udp-fragmentation-offload: off [fixed]
> > generic-segmentation-offload: on
> > generic-receive-offload: on
> > large-receive-offload: off [fixed]
> > rx-vlan-offload: on
> > tx-vlan-offload: on
> > ntuple-filters: off [fixed]
> > receive-hashing: on
> > highdma: on [fixed]
> > rx-vlan-filter: on [fixed]
> > vlan-challenged: off [fixed]
> > tx-lockless: off [fixed]
> > netns-local: off [fixed]
> > tx-gso-robust: off [fixed]
> > tx-fcoe-segmentation: off [fixed]
> > tx-gre-segmentation: off [fixed]
> > tx-gre-csum-segmentation: off [fixed]
> > tx-ipxip4-segmentation: off [fixed]
> > tx-ipxip6-segmentation: off [fixed]
> > tx-udp_tnl-segmentation: off [fixed]
> > tx-udp_tnl-csum-segmentation: off [fixed]
> > tx-gso-partial: off [fixed]
> > tx-sctp-segmentation: off [fixed]
> > fcoe-mtu: off [fixed]
> > tx-nocache-copy: off
> > loopback: off [fixed]
> > rx-fcs: off
> > rx-all: off
> > tx-vlan-stag-hw-insert: off [fixed]
> > rx-vlan-stag-hw-parse: off [fixed]
> > rx-vlan-stag-filter: off [fixed]
> > l2-fwd-offload: off [fixed]
> > busy-poll: off [fixed]
> > hw-tc-offload: off [fixed]
> >
> >
> > # grep HZ /boot/config-4.8.0-2-amd64
> > CONFIG_NO_HZ_COMMON=y
> > # CONFIG_HZ_PERIODIC is not set
> > CONFIG_NO_HZ_IDLE=y
> > # CONFIG_NO_HZ_FULL is not set
> > # CONFIG_NO_HZ is not set
> > # CONFIG_HZ_100 is not set
> > CONFIG_HZ_250=y
> > # CONFIG_HZ_300 is not set
> > # CONFIG_HZ_1000 is not set
> > CONFIG_HZ=250
> > CONFIG_MACHZ_WDT=m
> >
> >
> >
> > On 26 January 2017 at 21:41, Eric Dumazet <eric.dumazet at gmail.com> wrote:
> >
> > Can you post :
> >
> > ethtool -i eth0
> > ethtool -k eth0
> >
> > grep HZ /boot/config.... (what is the HZ value of your kernel)
> >
> > I suspect a possible problem with TSO autodefer when/if HZ < 1000
> >
> > Thanks.
> >
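(For reference, assuming the TSO autodefer logic really is tied to jiffy granularity as suggested above: with the CONFIG_HZ=250 shown earlier, one timer tick is 1000/250 = 4 ms, four times coarser than at HZ=1000, so any jiffy-based defer or pacing decision is correspondingly coarser.)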
> > On Thu, 2017-01-26 at 21:19 +0100, Hans-Kristian Bakke wrote:
> > > There are two packet captures from fq with and without pacing here:
> > >
> > > https://owncloud.proikt.com/index.php/s/KuXIl8h8bSFH1fM
> > >
> > > The server (with fq pacing/nopacing) is 10.0.5.10 and is running an
> > > Apache2 webserver on TCP port 443. The TCP client is an nginx reverse
> > > proxy at 10.0.5.13 on the same subnet, which in turn proxies the
> > > connection from the Windows 10 client.
> > > - I did try to connect directly to the server with the client (via a
> > > Linux gateway router), avoiding the nginx proxy and just using plain
> > > no-SSL HTTP. That did not change anything.
> > > - I also tried stopping the eth0 interface to force the traffic onto
> > > the eth1 interface in the LACP, which changed nothing.
> > > - I also pulled each of the cables on the switch to force the traffic
> > > to switch between interfaces in the LACP link between the client
> > > switch and the server switch.
> > >
> > > The CPU is a 5-6 year old Intel Xeon X3430 CPU @ 4x2.40GHz on a
> > > SuperMicro platform. It is not very loaded and the results are always
> > > in the same ballpark with fq pacing on.
> > >
> > > top - 21:12:38 up 12 days, 11:08, 4 users, load average: 0.56, 0.68, 0.77
> > > Tasks: 1344 total, 1 running, 1343 sleeping, 0 stopped, 0 zombie
> > > %Cpu0 : 0.0 us, 1.0 sy, 0.0 ni, 99.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
> > > %Cpu1 : 0.0 us, 0.3 sy, 0.0 ni, 97.4 id, 2.0 wa, 0.0 hi, 0.3 si, 0.0 st
> > > %Cpu2 : 0.0 us, 2.0 sy, 0.0 ni, 96.4 id, 1.3 wa, 0.0 hi, 0.3 si, 0.0 st
> > > %Cpu3 : 0.7 us, 2.3 sy, 0.0 ni, 94.1 id, 3.0 wa, 0.0 hi, 0.0 si, 0.0 st
> > > KiB Mem : 16427572 total, 173712 free, 9739976 used, 6513884 buff/cache
> > > KiB Swap: 6369276 total, 6126736 free, 242540 used. 6224836 avail Mem
> > >
> > > This seems OK to me. It does have 24 drives in 3 ZFS pools at 144TB raw
> > > storage in total, with several SAS HBAs that are pretty much always
> > > poking the system in some way or another.
> > >
> > > There are around 32K interrupts when running @23 MB/s (as seen in
> > > Chrome downloads) with pacing on, and about 25K interrupts when running
> > > @105 MB/s with fq nopacing. Is that normal?
> > >
> > > Hans-Kristian
> > >
> > > On 26 January 2017 at 20:58, David Lang <david at lang.hm> wrote:
> > > Is there any CPU bottleneck?
> > >
> > > Pacing causing this sort of problem makes me think that the CPU either
> > > can't keep up or that something (HZ setting type of thing) is delaying
> > > when the CPU can get used.
> > >
> > > It's not clear from the posts if the problem is with sending data or
> > > receiving data.
> > >
> > > David Lang
> > >
> > > On Thu, 26 Jan 2017, Eric Dumazet wrote:
> > >
> > > Nothing jumps to mind.
> > >
> > > We use FQ on links varying from 1Gbit to 100Gbit, and we have no such
> > > issues.
> > >
> > > You could probably check on the server the various TCP infos given by
> > > the ss command:
> > >
> > > ss -temoi dst <remoteip>
> > >
> > > The pacing rate is shown. You might have some issues, but it is hard
> > > to say.
> > >
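(As a concrete example of the above: on this setup the relevant peer is the nginx proxy at 10.0.5.13, so one could sample the per-connection state during a transfer with, for example:

ss -temoi dst 10.0.5.13
watch -n 1 'ss -temoi dst 10.0.5.13'

The pacing_rate field reported by ss is the rate sch_fq tries to enforce when pacing is enabled, so comparing it against the achieved ~25 MB/s would show whether the sender itself is the limiter.)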
> > > On Thu, 2017-01-26 at 19:55 +0100, Hans-Kristian Bakke wrote:
> > > After some more testing I see that if I disable fq pacing the
> > > performance is restored to the expected levels:
> > >
> > > # for i in eth0 eth1; do tc qdisc replace dev $i root fq nopacing; done
> > >
> > > Is this expected behaviour? There is some background traffic, but only
> > > in the sub 100 mbit/s range on the switches and gateway between the
> > > server and client.
> > >
> > > The chain:
> > > Windows 10 client -> 1000 mbit/s -> switch -> 2 x gigabit LACP -> switch
> > > -> 4 x gigabit LACP -> gw (fq_codel on all nics) -> 4 x gigabit LACP
> > > (the same as in) -> switch -> 2 x LACP -> server (with misbehaving fq
> > > pacing)
> > >
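(A sketch of the two obvious follow-up experiments here: restoring fq's default behaviour, where pacing is on, or keeping pacing but capping the per-flow rate with fq's maxrate parameter; the 950mbit figure is arbitrary, chosen only as an illustration for a gigabit link:

for i in eth0 eth1; do tc qdisc replace dev $i root fq; done
for i in eth0 eth1; do tc qdisc replace dev $i root fq maxrate 950mbit; done

If the slowdown only appears with pacing enabled and no maxrate cap, that would point at the computed pacing rate itself rather than at fq's flow scheduling.)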
> > > On 26 January 2017 at 19:38, Hans-Kristian Bakke <hkbakke at gmail.com> wrote:
> > > I can add that this is without BBR, just plain old kernel 4.8 cubic.
> > >
> > > On 26 January 2017 at 19:36, Hans-Kristian Bakke <hkbakke at gmail.com> wrote:
> > > Another day, another fq issue (or user error).
> > >
> > > I try to do the seemingly simple task of downloading a single large
> > > file over local gigabit LAN from a physical server running kernel 4.8
> > > and sch_fq on Intel server NICs.
> > >
> > > For some reason it wouldn't go past around 25 MB/s. After having
> > > replaced SSL with no SSL, replaced apache with nginx and verified that
> > > there is plenty of bandwidth available between my client and the
> > > server, I tried to change the qdisc from fq to pfifo_fast. It
> > > instantly shot up to around the expected 85-90 MB/s. The same happened
> > > with fq_codel in place of fq.
> > >
> > > I then checked the statistics for fq, and the throttled counter is
> > > increasing massively every second (eth0 and eth1 are LACPed using
> > > Linux bonding, so both are seen here):
> > >
> > > qdisc fq 8007: root refcnt 2 limit 10000p flow_limit 100p buckets 1024
> > > orphan_mask 1023 quantum 3028 initial_quantum 15140 refill_delay 40.0ms
> > >  Sent 787131797 bytes 520082 pkt (dropped 15, overlimits 0 requeues 0)
> > >  backlog 98410b 65p requeues 0
> > >   15 flows (14 inactive, 1 throttled)
> > >   0 gc, 2 highprio, 259920 throttled, 15 flows_plimit
> > > qdisc fq 8008: root refcnt 2 limit 10000p flow_limit 100p buckets 1024
> > > orphan_mask 1023 quantum 3028 initial_quantum 15140 refill_delay 40.0ms
> > >  Sent 2533167 bytes 6731 pkt (dropped 0, overlimits 0 requeues 0)
> > >  backlog 0b 0p requeues 0
> > >   24 flows (24 inactive, 0 throttled)
> > >   0 gc, 2 highprio, 397 throttled
> > >
> > > Do you have any suggestions?
> > >
> > > Regards,
> > > Hans-Kristian
> > >
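(A note on reading those numbers: qdisc 8007 reports 259920 throttled events against 520082 packets sent, i.e. roughly one throttling decision for every two packets, which is consistent with pacing, rather than flow scheduling, being the limiter. One way to watch the counter move during a transfer, using plain tc statistics:

watch -n 1 "tc -s qdisc show dev eth0"

Comparing how fast "throttled" grows with pacing on versus with the nopacing variant above should make the correlation with the 25 MB/s ceiling obvious.)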