Cake - FQ_codel the next generation
* [Cake] Multiple Hardware Queues
@ 2018-07-15  7:41 dag dg
  2018-07-15  8:10 ` Jonathan Morton
  0 siblings, 1 reply; 7+ messages in thread
From: dag dg @ 2018-07-15  7:41 UTC
  To: Cake

Firstly, let me congratulate the contributors to the Cake project on
Cake being accepted upstream. I've been following the project for a
while and greatly appreciate the effort that has been put into it.

Ironically I just wrapped up throwing some unofficial packages
together for Fedora 28 to enable cake support; having it upstream will
make updates a lot easier.

Now that I have cake available and running I just wanted to do one
final check on a technical consideration I had brought up on the
bufferbloat list a few months back, before I lay this notion to rest.

Toke gave me some guidance at that time which helped point me towards
cake. Now that I have it running I wanted to check in one last time to
see if there's any beneficial way I can use cake with multiple
hardware queues or if I need to just give up the chase.

The box I use as a router has an Intel i350-T2V2 NIC with two gigabit
ports (uplink/local). This card and its corresponding driver support
multiple hardware-based transmit and receive queues; the number
depends on how many cores the system has, up to 8.

without cake:
qdisc mq 0: dev enp2s0f0 root
qdisc fq_codel 0: dev enp2s0f0 parent :8 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
qdisc fq_codel 0: dev enp2s0f0 parent :7 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
qdisc fq_codel 0: dev enp2s0f0 parent :6 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
qdisc fq_codel 0: dev enp2s0f0 parent :5 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
qdisc fq_codel 0: dev enp2s0f0 parent :4 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
qdisc fq_codel 0: dev enp2s0f0 parent :3 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
qdisc fq_codel 0: dev enp2s0f0 parent :2 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
qdisc fq_codel 0: dev enp2s0f0 parent :1 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
qdisc mq 0: dev enp2s0f1 root
qdisc fq_codel 0: dev enp2s0f1 parent :8 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
qdisc fq_codel 0: dev enp2s0f1 parent :7 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
qdisc fq_codel 0: dev enp2s0f1 parent :6 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
qdisc fq_codel 0: dev enp2s0f1 parent :5 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
qdisc fq_codel 0: dev enp2s0f1 parent :4 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
qdisc fq_codel 0: dev enp2s0f1 parent :3 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
qdisc fq_codel 0: dev enp2s0f1 parent :2 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
qdisc fq_codel 0: dev enp2s0f1 parent :1 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 32Mb ecn

with cake via sqm:
qdisc cake 802c: dev enp2s0f0 root refcnt 9 bandwidth 23Mbit diffserv3 triple-isolate split-gso rtt 100.0ms raw overhead 0
qdisc ingress ffff: dev enp2s0f0 parent ffff:fff1 ----------------
qdisc mq 0: dev enp2s0f1 root
qdisc fq_codel 0: dev enp2s0f1 parent :8 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
qdisc fq_codel 0: dev enp2s0f1 parent :7 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
qdisc fq_codel 0: dev enp2s0f1 parent :6 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
qdisc fq_codel 0: dev enp2s0f1 parent :5 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
qdisc fq_codel 0: dev enp2s0f1 parent :4 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
qdisc fq_codel 0: dev enp2s0f1 parent :3 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
qdisc fq_codel 0: dev enp2s0f1 parent :2 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
qdisc fq_codel 0: dev enp2s0f1 parent :1 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
qdisc fq_codel 0: dev tun0 root refcnt 2 limit 10240p flows 1024 quantum 1500 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
qdisc cake 802d: dev ifb4enp2s0f0 root refcnt 2 bandwidth 330Mbit besteffort triple-isolate wash split-gso rtt 100.0ms raw overhead 0

Let me be clear that with cake and sqm I am seeing great results on
the dslreports speed test (A+), so this inquiry is less about solving
a problem and more about trying to take full advantage of my available
hardware. Any insight would be appreciated, and thanks again for your
contributions.

* Re: [Cake] Multiple Hardware Queues
  2018-07-15  7:41 [Cake] Multiple Hardware Queues dag dg
@ 2018-07-15  8:10 ` Jonathan Morton
  2018-07-15 10:09   ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 7+ messages in thread
From: Jonathan Morton @ 2018-07-15  8:10 UTC
  To: dag dg; +Cc: Cake

> On 15 Jul, 2018, at 10:41 am, dag dg <dagofthedofg@gmail.com> wrote:
> 
> The box I use as a router has an Intel i350-T2V2 NIC with two gigabit
> ports (uplink/local). This card and its corresponding driver support
> multiple hardware-based transmit and receive queues; the number
> depends on how many cores the system has, up to 8.

> qdisc cake 802c: dev enp2s0f0 root refcnt 9 bandwidth 23Mbit diffserv3 triple-isolate split-gso rtt 100.0ms raw overhead 0

> qdisc cake 802d: dev ifb4enp2s0f0 root refcnt 2 bandwidth 330Mbit besteffort triple-isolate wash split-gso rtt 100.0ms raw overhead 0

> Let me be clear that with cake and sqm I am seeing great results on
> the dslreports speed test (A+), so this inquiry is less about solving
> a problem and more about trying to take full advantage of my available
> hardware. Any insight would be appreciated, and thanks again for your
> contributions.

At these bandwidths, you are not stressing your hardware at all - and I don't even have to ask what CPU you have to know this.  The NIC's multiple queues would give you no benefit whatsoever.  An Intel Atom or an AMD E-450 can easily handle gigabit traffic through Cake, as long as the NIC is attached via a bus capable of carrying that much bandwidth (e.g. PCIe).  These are some of the least powerful 64-bit x86 CPUs that ever reached the market.

In any case, the MQ qdisc simply sorts packets into hardware queues according to the CPU they were submitted from.  This is useful for something like a heavily loaded webserver, which has many worker processes distributed evenly across all available CPUs, since it avoids either passing data to a NIC-worker process on a fixed CPU, or contending for a single NIC lock.  But it's basically useless on a desktop PC, or on a machine acting primarily as a router, since the traffic is submitted from just one or two CPUs at a time, and usually most of the CPUs are idle anyway.  I have no idea what the hardware does to coalesce those packets into a single stream to be sent over the wire.

You can verify this for yourself by looking at your CPU load while running a full bidirectional speed test.  On any recent CPU, expect to see even the most loaded core at least 90% idle - unless your web browser happens to be occupying it.
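
One way to make that check on Linux is to sample the per-core counters in /proc/stat before and after the speed test. The following is only a minimal sketch: the helper names (parse_proc_stat, idle_percent, sample) are made up for illustration, and it counts iowait as idle time, which is a simplifying assumption.

```python
import time


def parse_proc_stat(text):
    """Return {cpu_name: (idle_ticks, total_ticks)} for each per-core 'cpuN' line."""
    cores = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip the aggregate "cpu" line and non-CPU lines such as "intr".
        if not fields or not fields[0].startswith("cpu") or fields[0] == "cpu":
            continue
        ticks = [int(v) for v in fields[1:]]
        # Columns: user nice system idle iowait irq softirq steal guest guest_nice
        idle = ticks[3] + (ticks[4] if len(ticks) > 4 else 0)
        # Guest time is already included in user time, so only sum the first 8.
        total = sum(ticks[:8])
        cores[fields[0]] = (idle, total)
    return cores


def idle_percent(before, after):
    """Percentage of the sampling interval each core spent idle."""
    result = {}
    for name, (idle1, total1) in after.items():
        idle0, total0 = before[name]
        elapsed = total1 - total0
        result[name] = 100.0 * (idle1 - idle0) / elapsed if elapsed else 100.0
    return result


def sample(interval=5.0, path="/proc/stat"):
    """Take two snapshots 'interval' seconds apart and return idle % per core."""
    with open(path) as f:
        before = parse_proc_stat(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_proc_stat(f.read())
    return idle_percent(before, after)
```

Calling sample(30) while the speed test is running and printing the lowest value gives the idle figure for the most loaded core.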

 - Jonathan Morton


* Re: [Cake] Multiple Hardware Queues
  2018-07-15  8:10 ` Jonathan Morton
@ 2018-07-15 10:09   ` Toke Høiland-Jørgensen
  2018-07-15 15:57     ` Dave Taht
  0 siblings, 1 reply; 7+ messages in thread
From: Toke Høiland-Jørgensen @ 2018-07-15 10:09 UTC
  To: Jonathan Morton, dag dg; +Cc: Cake

Yeah, I agree that at 1 Gbit you don't need multiple receive queues to
get to line rate. In my 100Gbit tests, I got to 50 Gbps with CAKE (I
should really post some graphs of that), so at really high speeds we
would benefit from being able to run simultaneously on multiple CPUs.
But let's just say that turning CAKE into something that can run on
multiple CPUs simultaneously is non-trivial... :)

> In any case, the MQ qdisc simply sorts packets into hardware queues
> according to the CPU they were submitted from. [...] But it's
> basically useless on [...] a machine acting primarily as a router,
> since the traffic is submitted from just one or two CPUs at a time,
> and usually most of the CPUs are idle anyway.

Not quite. On a router, the distribution of packets over CPUs will
depend on what happens on the receive side. Usually, the hardware will
have the same number of receive queues as transmit queues, and it will
use Receive Side Scaling (RSS) which hashes packets into the queues
based on the packet header. Often, the hardware queues are not assigned
properly to different CPUs, which is why the first thing 10Gbit+
performance tuning guides tell you to do is to adjust the CPU mapping
of the hardware queue IRQs...
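
To illustrate the hashing step, here is a rough sketch of how a flow's header fields can be mapped to one of N receive queues. Real NICs typically use a Toeplitz hash with an indirection table; zlib.crc32 is only a stand-in here, and rx_queue and spread are hypothetical names chosen for the example.

```python
import zlib
from collections import Counter


def rx_queue(src_ip, dst_ip, src_port, dst_port, num_queues=8):
    """Pick a receive queue from the flow's header fields.

    CRC32 is a stand-in for the hardware hash (an assumption of this
    sketch); the point is only that the same flow always maps to the
    same queue.
    """
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}".encode()
    return zlib.crc32(key) % num_queues


def spread(flows, num_queues=8):
    """Count how many flows land on each queue."""
    return Counter(rx_queue(*flow, num_queues=num_queues) for flow in flows)


# 200 made-up flows from documentation address ranges.
example_flows = [("192.0.2.10", "198.51.100.7", 40000 + i, 443) for i in range(200)]
queue_load = spread(example_flows)
```

Every packet of a given flow ends up on the same queue, so which CPUs do the forwarding work depends on how the flows hash and on which CPU services each queue's IRQ.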

> I have no idea what the hardware does to coalesce those packets into a
> single stream to be sent over the wire.

That's hardware specific, but I think most devices do something that
more or less corresponds to round-robin scheduling of the hardware
queues.
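
As a toy model of that round-robin behaviour (not a description of any particular device), the following sketch takes one packet from each non-empty queue in turn until all queues are drained. The function name round_robin_drain is made up for the example.

```python
from collections import deque


def round_robin_drain(queues):
    """Interleave packets from several transmit queues, one per queue per pass."""
    pending = [deque(q) for q in queues]
    wire = []
    # An empty deque is falsy, so this loops until every queue is drained.
    while any(pending):
        for q in pending:
            if q:
                wire.append(q.popleft())
    return wire


order = round_robin_drain([["a1", "a2", "a3"], ["b1"], [], ["d1", "d2"]])
# order == ["a1", "b1", "d1", "a2", "d2", "a3"]
```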

-Toke

* Re: [Cake] Multiple Hardware Queues
  2018-07-15 10:09   ` Toke Høiland-Jørgensen
@ 2018-07-15 15:57     ` Dave Taht
  2018-07-15 20:28       ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 7+ messages in thread
From: Dave Taht @ 2018-07-15 15:57 UTC
  To: Toke Høiland-Jørgensen; +Cc: Jonathan Morton, dagofthedofg, Cake List

I note that I like the idea of cake-mq (or rather, cake-smp). If we
can currently achieve 50gbit bottlenecking on one cpu, what can be
done to get past 100gbit?

tc qdisc add dev eth root cake-smp bandwidth 100gbit

has a nice ring to it don't you think? :)

* BQL's estimator is essentially additive. If you have 64 hw queues
(common in 10gige hw), you've got a ton of inessential latency that
builds up there due to bulking.

* I have grave doubts about 64k aqm'd queues at any speed. Better to
have fewer queues.

* In a naive parallel implementation (and excluding some painful
implementation details) - only the global shaped bandwidth limit has a
need for atomic, cross cpu access, and even that can essentially be
rcu'd.
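
A back-of-the-envelope sketch of the first point: if each hardware queue is allowed its own BQL byte limit, the worst-case data sitting below the qdisc is roughly the sum over all queues. The 30000-byte per-queue limit below is an assumed figure for illustration, not a measurement, and the function name is hypothetical.

```python
def bql_worst_case_delay_ms(num_queues, limit_bytes_per_queue, link_bits_per_sec):
    """Time to drain every hardware queue if each one is filled to its BQL limit."""
    total_bits = num_queues * limit_bytes_per_queue * 8
    return 1000.0 * total_bits / link_bits_per_sec


# Assumed numbers: 30000 bytes per queue on a 10 Gbit/s link.
one_queue_ms = bql_worst_case_delay_ms(1, 30_000, 10_000_000_000)
many_queues_ms = bql_worst_case_delay_ms(64, 30_000, 10_000_000_000)
```

Under these assumptions a single queue adds about 0.024 ms, while 64 queues add about 1.5 ms, simply because the per-queue limits add up.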

* Re: [Cake] Multiple Hardware Queues
  2018-07-15 15:57     ` Dave Taht
@ 2018-07-15 20:28       ` Toke Høiland-Jørgensen
  2018-07-16 19:03         ` dag dg
  0 siblings, 1 reply; 7+ messages in thread
From: Toke Høiland-Jørgensen @ 2018-07-15 20:28 UTC
  To: Dave Taht; +Cc: Jonathan Morton, dagofthedofg, Cake List

Dave Taht <dave.taht@gmail.com> writes:

> excluding some painful implementation details

Heh...

-Toke

* Re: [Cake] Multiple Hardware Queues
  2018-07-15 20:28       ` Toke Høiland-Jørgensen
@ 2018-07-16 19:03         ` dag dg
  2018-07-16 19:27           ` Georgios Amanakis
  0 siblings, 1 reply; 7+ messages in thread
From: dag dg @ 2018-07-16 19:03 UTC
  To: Toke Høiland-Jørgensen; +Cc: Dave Taht, Jonathan Morton, Cake List

(sorry for the spam Toke, still getting used to a new email client)

Thanks for the input, this is pretty much the info I was hoping for.
At this point I'll probably swap out the dual port NIC I have with
something more reasonable and put the i350 to work where its
performance will actually be used.

Really excited to have cake support in upstream.

Looking at usage, I noticed there were a few cake configuration
parameters I may have missed when setting up my sqm-scripts config.
Right now I'm just using the "piece_of_cake.qos" example, with my
interface config file as:

# Uplink and Downlink values are in kbps
UPLINK=23000
DOWNLINK=330000

# SQM recipe to use. For more information, see /usr/lib/sqm/*.help
SCRIPT=piece_of_cake.qos

# Optional/advanced config

#ENABLED=1
#QDISC=cake

#LLAM=tc_stab
#LINKLAYER=none
#OVERHEAD=0
#STAB_MTU=2047
#STAB_TSIZE=512
#STAB_MPU=0

#ILIMIT=
#ELIMIT=
#ITARGET=
#ETARGET=

# ECN ingress resp. egress. Values are ECN or NOECN.
#IECN=ECN
#EECN=ECN

# Extra qdisc options ingress resp. egress
IQDISC_OPTS="nat docsis ingress"
EQDISC_OPTS="nat docsis ack-filter"

# CoDel target
#TARGET=5ms

#ZERO_DSCP_INGRESS=1
#IGNORE_DSCP_INGRESS=1

With the result being:

qdisc noqueue 0: dev lo root refcnt 2
qdisc fq_codel 0: dev enp6s0 root refcnt 2 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
qdisc cake 803b: dev enp2s0f0 root refcnt 9 bandwidth 23Mbit besteffort triple-isolate nat ack-filter split-gso rtt 100.0ms noatm overhead 18 mpu 64
qdisc ingress ffff: dev enp2s0f0 parent ffff:fff1 ----------------
qdisc mq 0: dev enp2s0f1 root
qdisc fq_codel 0: dev enp2s0f1 parent :8 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
qdisc fq_codel 0: dev enp2s0f1 parent :7 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
qdisc fq_codel 0: dev enp2s0f1 parent :6 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
qdisc fq_codel 0: dev enp2s0f1 parent :5 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
qdisc fq_codel 0: dev enp2s0f1 parent :4 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
qdisc fq_codel 0: dev enp2s0f1 parent :3 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
qdisc fq_codel 0: dev enp2s0f1 parent :2 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
qdisc fq_codel 0: dev enp2s0f1 parent :1 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
qdisc fq_codel 0: dev tun0 root refcnt 2 limit 10240p flows 1024 quantum 1500 target 5.0ms interval 100.0ms memory_limit 32Mb ecn
qdisc cake 803c: dev ifb4enp2s0f0 root refcnt 2 bandwidth 330Mbit besteffort triple-isolate nat wash ingress split-gso rtt 100.0ms noatm overhead 18 mpu 64

I don't fully understand the docsis option. Is it supposed to show up
in the output of tc qdisc show, or is it all in the background? I'm
just worried I'm not setting this up properly.

* Re: [Cake] Multiple Hardware Queues
  2018-07-16 19:03         ` dag dg
@ 2018-07-16 19:27           ` Georgios Amanakis
  0 siblings, 0 replies; 7+ messages in thread
From: Georgios Amanakis @ 2018-07-16 19:27 UTC
  To: dag dg; +Cc: Toke Høiland-Jørgensen, Cake List

The docsis option shows up in the output of tc qdisc show as:
noatm overhead 18 mpu 64. So I think you are OK.
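
If you want to check this mechanically, here is a small sketch that looks for those three settings in saved tc qdisc show output. It assumes the output may have been line-wrapped (as in this thread) and that each entry starts with "qdisc "; the helper names cake_entries and has_docsis_settings are made up for the example.

```python
# Settings that the docsis keyword expands to, as they appear in tc output.
DOCSIS_EXPANSION = ["noatm", "overhead 18", "mpu 64"]


def cake_entries(tc_output):
    """Re-join wrapped lines and return only the cake qdisc entries."""
    entries = []
    for line in tc_output.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("qdisc ") or not entries:
            entries.append(line)
        else:
            # Continuation of a wrapped entry.
            entries[-1] += " " + line
    return [e for e in entries if e.startswith("qdisc cake ")]


def has_docsis_settings(entry):
    """True if the entry shows all settings implied by the docsis keyword."""
    padded = f" {entry} "
    return all(f" {token} " in padded for token in DOCSIS_EXPANSION)


sample_output = """\
qdisc cake 803b: dev enp2s0f0 root refcnt 9 bandwidth 23Mbit
besteffort triple-isolate nat ack-filter split-gso rtt 100.0ms noatm
overhead 18 mpu 64
qdisc ingress ffff: dev enp2s0f0 parent ffff:fff1 ----------------
"""
docsis_ok = [has_docsis_settings(e) for e in cake_entries(sample_output)]
```

For the egress entry quoted from your output, docsis_ok comes out as [True]; the earlier entries with "raw overhead 0" would report False.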
