Re: [Cake] [LEDE-DEV] Cake SQM killing my DIR-860L

* Re: [Cake] [LEDE-DEV] Cake SQM killing my DIR-860L - was: [17.01] Kernel: bump to 4.4.51
       [not found]       ` <1488484262.16753.0@smtp.autistici.org>
@ 2017-03-02 21:10         ` Dave Täht
  2017-03-02 23:16           ` John Yates
  2017-03-02 23:55           ` John Yates
  0 siblings, 2 replies; 14+ messages in thread
From: Dave Täht @ 2017-03-02 21:10 UTC (permalink / raw)
  To: lede-dev, cake

On 3/2/17 11:51 AM, Stijn Segers wrote:
> Thanks Sebastian, turned out to be a silly syntax error, I have it all
> disabled now. Ethtool -k and ethtool -K printing/requiring different
> stuff doesn't help of course :-)
> 
> I re-enabled SQM, will see how that works out with the offloading disabled.

Would be good to know. I lost a bit of sleep lately (given how badly we
got bit by RCU on the ATF front, I worry about cake... but I can't see
how that would break, there.)

In terms of general "why does shaping use so much cpu"...

I am keen to stress that the core fq_codel algorithm is very lightweight
and barely shows up on traces when used without software rate limiting
and with BQL.

You CAN see a difference in forwarding performance at really high native
rates if you use pfifo and compare it to fq_codel on some platforms -
pfifo-fast is simpler overall. To experiment, you can re-enable
pfifo-fast in scenarios if you want - (tc qdisc add dev whatever pfifo
limit somethingsane, or bfifo something sane)

... however things like nat and firewall rules tend to dominate the
forwarding costs, and fq_codel reduces latency muchly over pfifo), and
the principal use of fq_codel is for sqm (and now wifi).

As for software rate shaping - this is very cpu intensive no matter how
you do it. I wish we didn't have to do it - and with certain (mostly old
DSL) modems that do flow control you don't.

The only one I know that gets this right is the transverse geode that
david woodhouse has. One of my disappointments across the industry is
not seeing BQL roll out universally on any dsl firmwares, starting, oh,
5 years ago.

If we had ethernet devices with a programmable timer (only interrupt me
on 40mbit rate) we could also completely eliminate software rate shaping....

anyway my benchmarks are showing that:

cake in it's "besteffort" mode smokes HTB + fq_codel, affording
over 40% more headroom in terms of cpu with bandwidth. (Independent
confirmation across more cpu types is need)

In the default mode, with the new 3 tier classification, wash, nat and
triple-isolate/dual-host/dual-src features - which we hope are going to
help folk deal with torrent better in particular - it's a wash.

cake is a LOT more cpu intense than fq_codel is, especially in its
default modes, which it makes up for by being more unified. Mostly.

If you are running low on cpu and are trying to shape inbound on most of
these low-end mips devices to speeds > 60Mbits, I'd highly recommend
switching to using "besteffort" on that rather than the 3 QoS queue
default. Most ISPs are not classifying traffic well, anyway, and FQ
solves nearly everything, especially per host fq....

But none of what I just said applies if there's a bug somewhere else!
GRO has given me fits for years now, and I'm scarred by that.

In terms of cpu costs in cake/fq_codel - dequeue, hashing, and
timestamping show up most on a trace. The rate limiting effort where all
that is happening shows up in softirq dominating the platform.

I have *always* worried that there exists devices (particularly
multi-cores) without a first class high speed internal clock facility,
but thus far haven't had an issue with it (unlike on BSD, which has
internal timings good to only a ms).

As for speeding up hashing, I've been looking over various algorithms to
do that for years now, I'm open to suggestions. The fastest new ones
tend to depend on co-processor support. The fastest I've seen relies on
the CRC32 instruction which is only in some intel platforms.

Cake could certainly use a big round of profiling but it is generally my
hope that we won big with it, in its present form.

I welcome (especially flent) benchmarks of sqm on various architectures
we've not explored fully - notably arm ones -

My hat is off to all that have worked so hard to make this subsystem -
and also all of lede - work so well, in this release.

> Cheers
> 
> Stijn
> 
> 
> _______________________________________________
> Lede-dev mailing list
> Lede-dev@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/lede-dev

^ permalink raw reply	[flat|nested] 14+ messages in thread