[Cake] cake exploration

Dave Taht dave.taht at gmail.com
Sat Apr 11 14:45:07 EDT 2015

12) Better starting interval and target for codel's maintenance vars in
relationship to existing flows

Right now sch_fq, sch_pie give priority to flows in their first IW
phases. This makes them vulnerable to DDoS attacks with tons of new flows.

sch_fq_codel mitigates this somewhat by starting to hash flows into
the same buckets.

sch_cake's more perfect hashing gives IW more of a boost.

A thought was to do a combined ewma of all active flows and to hand
their current codel settings to new flows as they arrive, with less of
a boost.

This MIGHT work better when you have short RTTs generally on local
networks. Other thoughts appreciated.
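
A minimal sketch of the combined-ewma idea, to make it concrete. It assumes
the only codel state worth sharing is the drop count; the class names, the
ewma weight and the share handed to a new flow are all made up for
illustration and are not taken from sch_cake.

```python
from dataclasses import dataclass


@dataclass
class FlowState:
    """Per-flow codel state (only the field this sketch needs)."""
    count: float = 0.0  # codel drop count; higher means drops come sooner


class SharedCodelEstimate:
    """EWMA of `count` over active flows, used to seed new flows."""

    def __init__(self, weight=0.125, new_flow_share=0.5):
        self.weight = weight                  # ewma gain (assumed value)
        self.new_flow_share = new_flow_share  # fraction handed to a new flow (assumed)
        self.avg_count = 0.0

    def observe(self, flow):
        """Fold one active flow's current count into the running average."""
        self.avg_count += self.weight * (flow.count - self.avg_count)

    def seed(self):
        """State for a newly arrived flow, scaled down from the average."""
        return FlowState(count=self.avg_count * self.new_flow_share)
```

A new flow seeded this way starts with a nonzero count whenever the existing
flows are being dropped, so it gets less of a head start than a flow starting
from zero.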

There is another related problem in the resumption portion of the
algorithm as the decay of the existing state variables is arbitrary
and way too long in some cases. I think I had solved this by coming up
with an estimate for the amount of decay needed other than count - 2,
doing a calculation from the last time a flow had packets to the next,
but can't remember how I did it! It is easy if you have a last time
per queue and use a normal sqrt with a divide... but my brain crashes
at the reciprocal cache math we have instead....
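
One way to picture the idle-time decay, as a sketch only and not a
reconstruction of the forgotten calculation: codel spaces successive drops
interval/sqrt(count) apart, so on resumption count can be walked back one
step for each such spacing that fits in the idle time. The function name,
the per-queue idle time argument and the 100 ms default are assumptions.
This uses a plain sqrt and a divide, not the reciprocal cache.

```python
import math


def decayed_count(count, idle_ns, interval_ns=100_000_000):
    """Walk `count` down, one step per drop spacing that fit in the idle time."""
    while count > 0 and idle_ns >= interval_ns / math.sqrt(count):
        # Charge the idle time for the spacing codel would have used at this count.
        idle_ns -= interval_ns / math.sqrt(count)
        count -= 1
    return count
```

With count at 4 and a 100 ms interval the current spacing is 50 ms, so a
flow that was idle for 50 ms resumes at count 3, and one that was idle for
several seconds resumes at 0.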

I am not allergic to a divide. I am not allergic to using a shift for
the target and calculating the interval only relative to bandwidth, as
mentioned elsewhere. At 64k worth of bandwidth we just end up with a
huge interval, no big deal. But plan to ride along with the two
separately for now.
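
A rough sketch of the shift-for-target idea, under assumptions that are mine
and not settled anywhere: the target is interval >> 4, the interval is never
below 100 ms, and it grows so that the target still covers one 1500 byte
packet at the configured rate. I am reading "64k" as 64 kbit/s here.

```python
MTU_BYTES = 1500
NSEC_PER_SEC = 1_000_000_000


def codel_params(rate_bps, floor_interval_ns=100_000_000, target_shift=4):
    """Return (interval_ns, target_ns) for a link running at rate_bps bits/s."""
    # Time to serialize one MTU-sized packet at this rate.
    mtu_time_ns = MTU_BYTES * 8 * NSEC_PER_SEC // rate_bps
    # Stretch the interval until interval >> shift is at least one MTU time.
    interval_ns = max(floor_interval_ns, mtu_time_ns << target_shift)
    target_ns = interval_ns >> target_shift
    return interval_ns, target_ns
```

At 64 kbit/s this gives a 3 second interval and a 187.5 ms target. At
100 Mbit/s the floor wins and the result is 100 ms and 6.25 ms.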

13) It might be possible to write a faster codel, and one that is easier
to read, by using a case statement on the 2 core variables in it. The current
code does not show the 3 way state machine as well as that could, and
for all I know there is something intelligent we could do with the 4th state.
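
To show what a case statement over two variables might look like, here is a
sketch as a lookup table. It assumes the two core variables are the
`dropping` flag and the `ok_to_drop` result from the dequeue check, which the
text above does not actually name, and the action names are invented. A C
version would switch on something like (dropping << 1) | ok_to_drop.

```python
# Keys are (dropping, ok_to_drop); values name what codel does at dequeue.
CODEL_ACTIONS = {
    (False, False): "forward",         # not dropping, queue is fine: nothing to do
    (False, True):  "enter_dropping",  # sojourn time stayed above target for an interval
    (True,  False): "leave_dropping",  # sojourn time fell back below target
    (True,  True):  "drop_if_due",     # keep dropping on the interval/sqrt(count) schedule
}


def codel_action(dropping, ok_to_drop):
    """Pick the dequeue action from the two state variables."""
    return CODEL_ACTIONS[(bool(dropping), bool(ok_to_drop))]
```

Laid out like this, all four combinations are visible at once, including the
do-nothing one.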

On Sat, Apr 11, 2015 at 11:44 AM, Dave Taht <dave.taht at gmail.com> wrote:
> Stuff on my backlog of researchy stuff.
> 1) cake_drop_monitor - I wanted a way to throw drop AND mark
> notifications up to userspace,
> including the packet's time of entry and the time of drop, as well as
> the IP headers
> and next hop destination macaddr.
> There are many use cases for this:
> A)  - testing the functionality of the algorithm and being able to
> collect and analyze drops as  they happen.
> NET_DROP_MONITOR did not cut it but I have not looked at it in a year.
> It drives me crazy to be dropping packets all over the system and to
> not be able to track down where they happened.
> This is the primary reason why I had switched back to 64 bit timestamps, btw.
> B) Having the drop notifications might be useful in tuning or steering
> traffic to different routes.
> C) It is way easier to do a graph of the drop pattern with this info
> thrown to userspace.
> 2) Dearly wanted to actually be doing the timestamping AND hashing in
> the native skb
> struct on entry to the system itself, not the qdisc. Measuring the
> latency from ingress from the
> wire to egress would result in much better cpu overload behavior. I am
> totally aware of
> how much mainline linux would not take this option, but things have
> evolved over there, so
> leveraging the rxhash and skb->timestamp fields seems a possibility...
> I think this would let us get along better with netem also, but would
> have to go look again.
> Call that cake-rxhash. :)
> 3) In my benchmark of the latest cake3, ecn traffic was not as good as
> expected, but that might have been an anomaly of the test. Need to
> test ecn thoroughly this time, almost in preference to looking at drop
> behavior. Toke probably has ecn off by default right now. On, after
> this test run?
> 4) Testing higher rates and looking at cwnd for codel is important.
> The dropoff toke noted in his paper is real. Also there is possibly
> some ideal ratio between number of flows and bandwidth that makes more
> sense than a fixed number of flows. Also I keep harping on the darn
> resumption algo... but need to test with lousier tcps like windows.
> 5) Byte Mode-ish handling
> Dropping a single 64 byte packet does little good. You will find in
> the 50 flow tests that a ton of traffic is acks, not being dropped,
> and pie does better in this case than does fq, as it shoots
> wildly at everything, but usually misses the fat packets, where DRR
> will merrily store up an entire
> MTU worth of useless acks when only one is needed.
> So just trying to drop more little packets might be helpful in some cases.
> 6) Ack thinning. I gave what is conventionally called "stretch acks" a
> new name, as stretch acks
> have a deserved reputation as sucking. Well, they don't suck anymore in
> linux, and what I was
> mostly thinking was to drop no more than 2 in a row...
> One thing this would help with is in packing wifi aggregates - which
> have hard limits on the number of packets in a TXOP (42), and a byte
> limit on wireless n of 64k. Sending 41 acks from
> one flow, when you could send the last 2, seems like a big win on
> packing a TXOP.
> (this is something eric proposed, and given the drop rates we now see
> from wifi and the wild and wooly internet I am inclined to agree that
> it is worth fiddling with)
> (I am not huge on it, though)
> 7) Macaddr hashing on the nexthop instead of the 5tuple. When used on
> an internal, switched network, it  would be better to try and maximize
> the port usage rather than the 5 tuple in some cases.
> I have never got around to writing a mac hash I liked, my goal
> originally was to write one that found a minimal perfect hash solution
> eventually as mac addrs tend to be pretty stable on a network and
> rarely change.
> Warning: minimal perfect hash attempts are a wet paint thing! I really
> want a FPGA solver for them.... don't go play with the code out there,
> you will lose days to it... you have been warned.
> http://cmph.sourceforge.net/concepts.html
> I would like there to be a generic mac hashing thing in tc, actually.
> 8) Parallel FIB lookup
> IF you assume that you have tons of queues routing packets from
> ingress to egress, on tons of cpus, you can actually do the FIB lookup
> in parallel also. There is some old stuff on virtualqueue
> and virtual clock fqing which makes for tighter
> 9) Need a codel *library* that works at the mac80211 layer. I think
> codel*.h suffices but am not sure. And for that matter, codel itself
> seems like it would need a calculated target and a few other things to
> work right on wifi.
> As for the hashing...
> Personally I do not think that the 8 way set associative hash is what
> wifi needs for cake, I tend to think we need to "pack" aggregates with
> as many different flows as possible, and randomize how we pack
> them... I think.... maybe....
> 10) I really don't like BQL with multi-queued hardware queues. More
> backpressure is needed in that case than we get.
> 11) GRO peeling
> Offloads suck
> --
> Dave Täht
> Let's make wifi fast, less jittery and reliable again!
> https://plus.google.com/u/0/107942175615993706558/posts/TVX3o84jjmb

Dave Täht
Let's make wifi fast, less jittery and reliable again!

