After some more testing I see that if I disable fq pacing, performance is restored to the expected levels:
# for i in eth0 eth1; do tc qdisc replace dev $i root fq nopacing; done
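To confirm the change took effect, the active root qdisc can be inspected afterwards (a sketch; interface name is an assumption, run as root):

```shell
# Show the root qdisc; with pacing disabled the fq line
# should include the "nopacing" flag.
tc qdisc show dev eth0
```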

Is this expected behaviour? There is some background traffic, but only in the sub-100 Mbit/s range on the switches and gateway between the server and the client.

The chain:
Windows 10 client -> 1 Gbit/s -> switch -> 2 x gigabit LACP -> switch -> 4 x gigabit LACP -> gw (fq_codel on all NICs) -> 4 x gigabit LACP (the same as in) -> switch -> 2 x gigabit LACP -> server (with misbehaving fq pacing)


On 26 January 2017 at 19:38, Hans-Kristian Bakke <hkbakke@gmail.com> wrote:
I can add that this is without BBR, just plain old kernel 4.8 cubic.

On 26 January 2017 at 19:36, Hans-Kristian Bakke <hkbakke@gmail.com> wrote:
Another day, another fq issue (or user error).

I am trying to do the seemingly simple task of downloading a single large file over the local gigabit LAN from a physical server running kernel 4.8 and sch_fq on Intel server NICs.

For some reason it wouldn't go past around 25 MB/s. After switching from SSL to plain HTTP, replacing Apache with nginx, and verifying that there is plenty of bandwidth available between my client and the server, I tried changing the qdisc from fq to pfifo_fast. Throughput instantly shot up to the expected 85-90 MB/s. The same happened with fq_codel in place of fq.
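For reference, the qdisc swap used for the A/B test looks like this (a sketch; interface name is an assumption, run as root):

```shell
# Replace the root qdisc to compare throughput under each discipline.
tc qdisc replace dev eth0 root pfifo_fast   # ~85-90 MB/s
tc qdisc replace dev eth0 root fq_codel     # also fast
tc qdisc replace dev eth0 root fq           # ~25 MB/s
```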

I then checked the statistics for fq, and the throttled counter is increasing massively every second (eth0 and eth1 are LACPed using Linux bonding, so both are shown here):

qdisc fq 8007: root refcnt 2 limit 10000p flow_limit 100p buckets 1024 orphan_mask 1023 quantum 3028 initial_quantum 15140 refill_delay 40.0ms
 Sent 787131797 bytes 520082 pkt (dropped 15, overlimits 0 requeues 0)
 backlog 98410b 65p requeues 0
  15 flows (14 inactive, 1 throttled)
  0 gc, 2 highprio, 259920 throttled, 15 flows_plimit
qdisc fq 8008: root refcnt 2 limit 10000p flow_limit 100p buckets 1024 orphan_mask 1023 quantum 3028 initial_quantum 15140 refill_delay 40.0ms
 Sent 2533167 bytes 6731 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
  24 flows (24 inactive, 0 throttled)
  0 gc, 2 highprio, 397 throttled
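To watch how fast the throttled counter grows, one can pull it out of the stats output; a minimal sketch, with the awk field position matching the "gc, highprio, throttled" line above (the inline sample is just the pasted stats line, in practice the input would come from `tc -s qdisc show dev eth0`):

```shell
# Extract the cumulative "throttled" event count from a tc -s stats line.
stats='  0 gc, 2 highprio, 259920 throttled, 15 flows_plimit'
throttled=$(printf '%s\n' "$stats" | awk '/highprio,/ {print $5}')
echo "$throttled"
```

Sampling this once a second and diffing the values gives the throttle events per second.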

Do you have any suggestions?

Regards,
Hans-Kristian