Thanks for the replies, guys. I'm using a desktop-class machine with a Ryzen Threadripper PRO 3945WX (12 cores/24 threads). It's really not fast enough for what I was attempting; when you're used to working on more powerful machines at work, it's easy to forget how much more powerful server-class machines are. I tried creating many flows using a combination of tools, but this just saturates all the cores, causing RTTs to spike due to CPU contention. The idea was to simulate lots of flows like you might have at a conference, but I'm going to need more, and more powerful, machines.

This is the mq-cake config:

Flows: fping=1, iperf2=300, wrk=100, dnsperf=20, flent=1, crusader=1
Qdisc: mq-cake

ixgbe0:
qdisc mq 1: root
qdisc cake 8005: parent 1:3 bandwidth 10Gbit diffserv4 triple-isolate nat wash no-ack-filter split-gso rtt 100ms raw overhead 0
qdisc cake 8003: parent 1:1 bandwidth 10Gbit diffserv4 triple-isolate nat wash no-ack-filter split-gso rtt 100ms raw overhead 0
qdisc cake 8007: parent 1:5 bandwidth 10Gbit diffserv4 triple-isolate nat wash no-ack-filter split-gso rtt 100ms raw overhead 0
qdisc cake 8009: parent 1:7 bandwidth 10Gbit diffserv4 triple-isolate nat wash no-ack-filter split-gso rtt 100ms raw overhead 0
qdisc cake 8004: parent 1:2 bandwidth 10Gbit diffserv4 triple-isolate nat wash no-ack-filter split-gso rtt 100ms raw overhead 0
qdisc cake 8006: parent 1:4 bandwidth 10Gbit diffserv4 triple-isolate nat wash no-ack-filter split-gso rtt 100ms raw overhead 0
qdisc cake 800a: parent 1:8 bandwidth 10Gbit diffserv4 triple-isolate nat wash no-ack-filter split-gso rtt 100ms raw overhead 0
qdisc cake 8008: parent 1:6 bandwidth 10Gbit diffserv4 triple-isolate nat wash no-ack-filter split-gso rtt 100ms raw overhead 0

ixgbe1:
qdisc mq 1: root
qdisc cake 800b: parent 1:1 bandwidth 10Gbit diffserv4 triple-isolate nat wash no-ack-filter split-gso rtt 100ms raw overhead 0
qdisc cake 800f: parent 1:5 bandwidth 10Gbit diffserv4 triple-isolate nat wash no-ack-filter split-gso rtt 100ms raw overhead 0
qdisc cake 800d: parent 1:3 bandwidth 10Gbit diffserv4 triple-isolate nat wash no-ack-filter split-gso rtt 100ms raw overhead 0
qdisc cake 8011: parent 1:7 bandwidth 10Gbit diffserv4 triple-isolate nat wash no-ack-filter split-gso rtt 100ms raw overhead 0
qdisc cake 800c: parent 1:2 bandwidth 10Gbit diffserv4 triple-isolate nat wash no-ack-filter split-gso rtt 100ms raw overhead 0
qdisc cake 800e: parent 1:4 bandwidth 10Gbit diffserv4 triple-isolate nat wash no-ack-filter split-gso rtt 100ms raw overhead 0
qdisc cake 8010: parent 1:6 bandwidth 10Gbit diffserv4 triple-isolate nat wash no-ack-filter split-gso rtt 100ms raw overhead 0
qdisc cake 8012: parent 1:8 bandwidth 10Gbit diffserv4 triple-isolate nat wash no-ack-filter split-gso rtt 100ms raw overhead 0

[das@l2:~/nixos/desktop/l2]$ uname -a
Linux l2 6.12.68 #1-NixOS SMP PREEMPT_DYNAMIC Fri Jan 30 09:28:49 UTC 2026 x86_64 GNU/Linux

Thanks for the suggestion of using mq and fq_codel; that's a very interesting idea. Not sure if this diagram will come through in email, as it doesn't look good in this interface:

┌─────────────────────────┐       ┌─────────────────────────┐       ┌─────────────────────────┐
│        ns-gen-a         │       │         ns-dut          │       │        ns-gen-b         │
│    (Load Generator)     │       │   (Device Under Test)   │       │        (Server)         │
│                         │       │                         │       │                         │
│  ┌───────────────────┐  │       │  ┌───────────────────┐  │       │  ┌───────────────────┐  │
│  │ enp35s0f0np0      │  │       │  │ ixgbe0            │  │       │  │ enp35s0f1np1      │  │
│  │ Intel X710 p0     │──╋───────╋──│ Intel 82599 p0    │  │       │  │ Intel X710 p1     │  │
│  │ 10.1.0.2/24       │  │  SFP+ │  │ 10.1.0.1/24       │  │       │  │ 10.2.0.2/24       │  │
│  └───────────────────┘  │ Cable │  └─────────┬─────────┘  │       │  └─┬─────────────────┘  │
│                         │       │            │            │       │    │                    │
│  netem: 30ms ±3ms       │       │     ┌──────┴──────┐     │       │    │ netem: 30ms ±3ms   │
│                         │       │     │ Forwarding  │     │       │    │                    │
│  Tools:                 │       │     │ (ip_forward)│     │       │    │ Services:          │
│   - iperf2 client       │       │     └──────┬──────┘     │       │    │  - iperf2          │
│   - iperf3 client       │       │            │            │       │    │  - iperf3          │
│   - wrk (HTTP)          │       │  ┌─────────┴─────────┐  │       │    │  - flent           │
│   - dnsperf             │       │  │ ixgbe1            │  │       │    │  - crusader        │
│   - flent               │       │  │ Intel 82599 p1    │──╋───────╋────┘  - nginx           │
│   - crusader client     │       │  │ 10.2.0.1/24       │  │  SFP+ │       - PowerDNS        │
│   - fping               │       │  └───────────────────┘  │ Cable └─────────────────────────┘
│                         │       │                         │
└─────────────────────────┘       │  Qdisc under test:      │
                                  │   - fq_codel            │
                                  │   - cake                │
                                  │   - mq-cake (mq+cake)   │
                                  │                         │
                                  └─────────────────────────┘

On Tue, Feb 17, 2026 at 8:32 AM Toke Høiland-Jørgensen wrote:
>
> Stephen Hemminger writes:
>
> > On Tue, 17 Feb 2026 14:23:52 +0100
> > Toke Høiland-Jørgensen via Cake wrote:
> >
> >> dave seddon writes:
> >>
> >> > === Pre-flight Complete ===
> >> > Running 6 test points
> >> >
> >> > [1/6] qdisc=fq_codel flows=1 tool=iperf2
> >> > Switching qdisc to fq_codel...
> >> > Throughput: 9.41 Gbps
> >> > [2/6] qdisc=fq_codel flows=10 tool=iperf2
> >> > Throughput: 9.43 Gbps
> >> > [3/6] qdisc=cake flows=1 tool=iperf2
> >> > Switching qdisc to cake...
> >> > Throughput: 6.93 Gbps
> >> > [4/6] qdisc=cake flows=10 tool=iperf2
> >> > Throughput: 4.37 Gbps <---- cake
> >> > [5/6] qdisc=mq-cake flows=1 tool=iperf2
> >> > Switching qdisc to mq-cake...
> >> > Throughput: 7.17 Gbps
> >> > [6/6] qdisc=mq-cake flows=10 tool=iperf2
> >> > Throughput: 9.44 Gbps <----- mq-cake
> >> >
> >> > ... Actually, that's interesting. Higher than fq_codel.
> >>
> >> Are you running fq_codel as the root qdisc? Because in that case you're
> >> running through the single qdisc lock, which could explain the
> >> difference. Try running separate fq_codel instances beneath an 'mq'
> >> qdisc as the root.
> >>
> >> Also, if you're not setting a shaping rate, cake_mq is basically the
> >> same as just installing an mq qdisc at the root and having separate cake
> >> instances beneath that. So to test the multi-core shaper algorithm
> >> you'll need to set a rate ('bandwidth' parameter).
> >>
> >
> > This is what the OpenWrt SQM scripts seem to favor.
> > I probably need to tune the adaption overhead
> >
> > # tc qdisc show dev wan
> > qdisc htb 1: root refcnt 2 r2q 10 default 0x12 direct_packets_stat 0 direct_qlen 1000
> > qdisc fq_codel 120: parent 1:12 limit 1001p flows 1024 quantum 300 target 5ms interval 100ms memory_limit 4Mb drop_batch 64
> > qdisc fq_codel 130: parent 1:13 limit 1001p flows 1024 quantum 300 target 5ms interval 100ms memory_limit 4Mb drop_batch 64
> > qdisc fq_codel 110: parent 1:11 limit 1001p flows 1024 quantum 300 target 5ms interval 100ms memory_limit 4Mb drop_batch 64
>
> If you set 'script' to 'layer_cake.qos' or 'piece_of_cake.qos' in the
> config, you'll get a CAKE setup instead. On OpenWrt 25.12 (still in -rc)
> this will automatically select cake_mq where the system supports it.
>
> > It is dumb that in US it is standard to offer 1G down and 10M up!
>
> Yes, very. Don't think it's actually possible to use that 1G link with
> TCP without ACK thinning of some kind. The full-MTU-to-ACK ratio is ~47
> (1500/32; this is assuming one 64-byte ACK for every two 1500-byte
> packets), so you can only use ~470 Mbps downstream with 10M up. CAKE's
> ACK thinning should help here, but you need to turn that on explicitly
> by adding 'ack-filter' to eqdisc_opts in the config.
>
> -Toke

--
Regards,
Dave Seddon
+1 415 857 5102
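
P.S. Here's a rough sketch of what I understand Toke to be suggesting, written against my ixgbe0 NIC (untested; the 8-queue count matches the mq classes 1:1..1:8 in my dump above, and the 9Gbit figure is just an illustrative value, not a recommendation):

```shell
# (1) fq_codel without the single root-qdisc lock: an mq root with one
#     independent fq_codel instance per hardware TX queue. mq creates
#     one class per TX queue (1:1 .. 1:8 on this 82599).
tc qdisc replace dev ixgbe0 root handle 1: mq
for i in $(seq 1 8); do
    tc qdisc replace dev ixgbe0 parent "1:$i" fq_codel
done

# (2) cake with an actual shaping rate set, since without 'bandwidth'
#     the setup degenerates to plain mq + unshaped cake instances and
#     the multi-core shaper logic is never exercised.
tc qdisc replace dev ixgbe0 root handle 1: mq
for i in $(seq 1 8); do
    tc qdisc replace dev ixgbe0 parent "1:$i" cake bandwidth 9Gbit \
        diffserv4 triple-isolate nat
done
```

One thing I'm not sure about in (2): whether each per-queue instance should get the full link rate or a divided share — coordinating the aggregate rate across queues is presumably exactly what the cake_mq shaper algorithm handles, so corrections welcome.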