[Cake] cake exploration
dave.taht at gmail.com
Sat Apr 11 14:48:01 EDT 2015
15) Needs to work so an ISP can create service classes for their customers
DRR 1: cake bandwidth X
DRR 2: cake bandwidth Y
I have no idea whether this can work at all; last time I tried it, DRR would
stall every time fq_codel had no packets to deliver.
A related issue is that there needs to be a way for a tc or
iptables filter to map multiple IP addresses and prefixes to
a single distinct integer for placement into such queue systems,
so that a lookup of someone that has x.y.x.z, q.f.b.a/24, j:k:h/64 and
l:m:n/48 can return a single integer representing the customer, which
can then be fed into the correct sub-qdisc above.
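A minimal userspace sketch of that lookup, assuming a longest-prefix match over per-prefix-length tables. This is not an existing tc or iptables feature; `CustomerMap`, its methods, and the documentation-range addresses are hypothetical and only illustrate the "many prefixes, one integer" idea.

```python
import ipaddress


class CustomerMap:
    """Map IPv4/IPv6 addresses and prefixes to one integer per customer."""

    def __init__(self):
        # {(ip_version, prefix_length): {network_as_int: customer_id}}
        self._tables = {}

    def add(self, customer_id, prefix):
        """Register a host address or prefix as belonging to customer_id."""
        net = ipaddress.ip_network(prefix, strict=False)
        table = self._tables.setdefault((net.version, net.prefixlen), {})
        table[int(net.network_address)] = customer_id

    def lookup(self, address):
        """Return the customer id of the longest matching prefix, or None."""
        addr = ipaddress.ip_address(address)
        value = int(addr)
        # Try the most specific prefix lengths first.
        for version, plen in sorted(self._tables, key=lambda k: -k[1]):
            if version != addr.version:
                continue
            shift = addr.max_prefixlen - plen
            masked = (value >> shift) << shift  # clear the host bits
            customer_id = self._tables[(version, plen)].get(masked)
            if customer_id is not None:
                return customer_id
        return None


# One customer (id 7) owning a host address, an IPv4 /24, a /64 and a /48.
cmap = CustomerMap()
for prefix in ("192.0.2.7", "198.51.100.0/24",
               "2001:db8:1::/64", "2001:db8:100::/48"):
    cmap.add(7, prefix)
cmap.add(8, "203.0.113.0/24")
```

The integer returned by `lookup()` is what would select the per-customer sub-qdisc; an in-kernel version would more likely use a trie or a hash per prefix length.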
I think, but am not sure, that that is all my backlog!
On Sat, Apr 11, 2015 at 11:47 AM, Dave Taht <dave.taht at gmail.com> wrote:
> 14) strict priority queues. Some CBR techniques, notably IPTV, want 0
> packet loss, but run at a rate determined by the provider to be below
> what the subscriber will use. Sharing that "fairly" will lead to loss
> of packets to those applications.
> I do not like strict priority queues. I would prefer, for example,
> that the CBR application be marked with ECN, and ignored, vs the
> high probability someone will abuse a strict priority queue.
> On Sat, Apr 11, 2015 at 11:45 AM, Dave Taht <dave.taht at gmail.com> wrote:
>> 12) Better starting interval and target for codel's maintenance vars in
>> relation to existing flows
>> Right now sch_fq, sch_pie give priority to flows in their first IW
>> phases. This makes them vulnerable to DDOS attacks with tons of new
>> flows.
>> sch_fq_codel mitigates this somewhat by starting to hash flows into
>> the same buckets.
>> sch_cake's more perfect hashing gives IW more of a boost.
>> A thought was to do a combined ewma of all active flows and to hand
>> their current codel settings to new flows as they arrive, with less of
>> a boost.
>> This MIGHT work better when you have short RTTs generally on local
>> networks. Other thoughts appreciated.
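One way the combined-EWMA idea could be sketched, assuming the shared setting is codel's per-flow `count` and that a new flow simply starts from the rounded-down average. The class name, the `weight` parameter, and the choice of `count` as the shared variable are all assumptions for illustration, not something the post specifies.

```python
class SharedCodelSeed:
    """Track an EWMA of codel `count` across active flows (hypothetical)."""

    def __init__(self, weight=0.125):
        self.weight = weight      # EWMA gain; 1/8 is an arbitrary choice
        self.avg_count = 0.0

    def observe(self, count):
        """Fold one active flow's current count into the running average."""
        self.avg_count += self.weight * (count - self.avg_count)

    def seed_for_new_flow(self):
        """Starting count for a new flow: a non-zero value means the flow
        gets less of an initial-window boost than starting from zero."""
        return int(self.avg_count)
```

With no observations the seed is 0, which matches a flow starting from a clean state.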
>> There is another related problem in the resumption portion of the
>> algorithm as the decay of the existing state variables is arbitrary
>> and way too long in some cases. I think I had solved this by coming up
>> with an estimate for the amount of decay needed other than count - 2,
>> doing a calculation from the last time a flow had packets to the next,
>> but can't remember how I did it! It is easy if you have a last time
>> per queue and use a normal sqrt with a divide... but my brain crashes
>> at the reciprocal cache math we have instead....
>> I am not allergic to a divide. I am not allergic to using a shift for
>> the target and calculating the interval only relative to bandwidth, as
>> mentioned elsewhere. At 64k worth of bandwidth we just end up with a
>> huge interval, no big deal. But plan to ride along with the two
>> separately for now.
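A sketch of one possible time-based decay using a plain sqrt and a divide, as described above. This is not the forgotten calculation; it only assumes that the idle time is "spent" by walking the control law (`interval / sqrt(count)`) backwards, one count at a time. The function name and the 100 ms default interval are assumptions.

```python
import math


def decayed_count(count, idle_time, interval=0.1):
    """Reduce codel's count according to how long the queue sat idle.

    Each step down from `count` costs interval / sqrt(count) seconds of
    idle time, mirroring the spacing the control law used on the way up.
    """
    while count > 0:
        step = interval / math.sqrt(count)
        if idle_time < step:
            break
        idle_time -= step
        count -= 1
    return count
```

A short idle period leaves the count nearly untouched, while an idle period of several intervals resets it to zero, which avoids an arbitrary fixed decay such as `count - 2`.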
>> 13) It might be possible to write a faster codel - and easier to read
>> by using a case statement on the 2 core variables in it. The current
>> code does not show the 3 way state machine as well as that could, and
>> for all I know there is something intelligent we could do with the 4th
>> On Sat, Apr 11, 2015 at 11:44 AM, Dave Taht <dave.taht at gmail.com> wrote:
>>> Stuff on my backlog of researchy stuff.
>>> 1) cake_drop_monitor - I wanted a way to throw drop AND mark
>>> notifications up to userspace,
>>> including the packet's time of entry and the time of drop, as well as
>>> the IP headers
>>> and next hop destination macaddr.
>>> There are many use cases for this:
>>> A) - testing the functionality of the algorithm and being able to
>>> collect and analyze drops as they happen.
>>> NET_DROP_MONITOR did not cut it but I have not looked at it in a year.
>>> It drives me crazy to be dropping packets all over the system and to
>>> not be able to track down where they happened.
>>> This is the primary reason why I had switched back to 64 bit timestamps, btw.
>>> B) Having the drop notifications might be useful in tuning or steering
>>> traffic to different routes.
>>> C) It is way easier to do a graph of the drop pattern with this info
>>> thrown to userspace.
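To make the shape of such a notification concrete, here is a sketch of a userspace record carrying the fields listed above, plus the per-second bucketing a drop-pattern graph would need. The `DropEvent` type and its field names are hypothetical; no such kernel interface is implied.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class DropEvent:
    enqueue_ns: int      # time the packet entered the queue
    event_ns: int        # time it was dropped or marked
    marked: bool         # True for an ECN mark, False for a drop
    ip_header: bytes     # raw IP header of the packet
    nexthop_mac: bytes   # next-hop destination MAC address

    @property
    def sojourn_ns(self):
        """Time the packet spent queued before the drop or mark."""
        return self.event_ns - self.enqueue_ns


def events_per_bucket(events, bucket_ns=1_000_000_000):
    """Count events per time bucket (one second by default) for plotting."""
    return Counter(ev.event_ns // bucket_ns for ev in events)
```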
>>> 2) Dearly wanted to actually be doing the timestamping AND hashing in
>>> the native skb
>>> struct on entry to the system itself, not the qdisc. Measuring the
>>> latency from ingress from the
>>> wire to egress would result in much better cpu overload behavior. I am
>>> totally aware of
>>> how much mainline linux would not take this option, but things have
>>> evolved over there, so
>>> leveraging the rxhash and skb->timestamp fields seems a possibility...
>>> I think this would let us get along better with netem also, but would
>>> have to go look again.
>>> Call that cake-rxhash. :)
>>> 3) In my benchmark of the latest cake3, ecn traffic was not as good as
>>> expected, but that might have been an anomaly of the test. Need to
>>> test ecn thoroughly this time, almost in preference to looking at drop
>>> behavior. Toke probably has ecn off by default right now. On, after
>>> this test run?
>>> 4) Testing higher rates and looking at cwnd for codel is important.
>>> The dropoff toke noted in his paper is real. Also there is possibly
>>> some ideal ratio between number of flows and bandwidth that makes more
>>> sense than a fixed number of flows. Also I keep harping on the darn
>>> resumption algo... but need to test with lousier tcps like windows.
>>> 5) Byte Mode-ish handling
>>> Dropping a single 64 byte packet does little good. You will find in
>>> the 50 flow tests that a ton of traffic is acks, not being dropped,
>>> and pie does better in this case than does fq, as it shoots
>>> wildly at everything, but usually misses the fat packets, where DRR
>>> will merrily store up an entire
>>> MTU worth of useless acks when only one is needed.
>>> So just trying to drop more little packets might be helpful in some cases.
>>> 6) Ack thinning. I gave what is conventionally called "stretch acks" a
>>> new name, as stretch acks
>>> have a deserved reputation as sucking. Well, they don't suck anymore in
>>> linux, and what I was
>>> mostly thinking was to drop no more than 2 in a row...
>>> One thing this would help with is in packing wifi aggregates - which
>>> have hard limits on the number of packets in a TXOP (42), and a byte
>>> limit on wireless n of 64k. Sending 41 acks from
>>> one flow, when you could send the last 2, seems like a big win on
>>> packing a TXOP.
>>> (this is something eric proposed, and given the drop rates we now see
>>> from wifi and the wild and wooly internet I am inclined to agree that
>>> it is worth fiddling with)
>>> (I am not huge on it, though)
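A sketch of the "drop no more than 2 in a row" rule applied to one flow's queue. It assumes pure ACKs are cumulative, so an ACK is only dropped when a newer pure ACK sits directly behind it, and it ignores SACK and duplicate-ACK signalling entirely. `Packet` and `thin_acks` are hypothetical names.

```python
from collections import namedtuple

Packet = namedtuple("Packet", ["pure_ack", "ident"])


def thin_acks(packets, max_run=2):
    """Return the packets to keep, dropping at most max_run ACKs in a row."""
    kept = []
    run = 0  # consecutive ACKs dropped so far
    for i, pkt in enumerate(packets):
        next_is_ack = i + 1 < len(packets) and packets[i + 1].pure_ack
        if pkt.pure_ack and next_is_ack and run < max_run:
            run += 1          # a newer ACK follows, so this one can go
            continue
        kept.append(pkt)      # data, the newest ACK, or the run limit hit
        run = 0
    return kept
```

Seven queued ACKs shrink to three under this rule, which frees slots in an aggregate for other flows; data packets and the most recent ACK are never dropped.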
>>> 7) Macaddr hashing on the nexthop instead of the 5tuple. When used on
>>> an internal, switched network, it would be better to try and maximize
>>> the port usage rather than the 5 tuple in some cases.
>>> I have never got around to writing a mac hash I liked, my goal
>>> originally was to write one that found a minimal perfect hash solution
>>> eventually as mac addrs tend to be pretty stable on a network and
>>> rarely change.
>>> Warning: minimal perfect hash attempts are a wet paint thing! I really
>>> want a FPGA solver for them.... don't go play with the code out there,
>>> you will lose days to it... you have been warned.
>>> I would like there to be a generic mac hashing thing in tc, actually.
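For comparison, a plain (not minimal perfect) hash of the next-hop MAC into a queue index can be sketched in a few lines. This uses 32-bit FNV-1a purely as a stand-in; the function name, the `seed` perturbation, and the queue count are assumptions, and collisions are possible.

```python
def mac_queue(mac, n_queues=1024, seed=0):
    """Map a MAC address string to a queue index using 32-bit FNV-1a."""
    raw = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(raw) != 6:
        raise ValueError("expected a 48-bit MAC address")
    h = (2166136261 ^ seed) & 0xFFFFFFFF   # FNV offset basis, perturbed
    for byte in raw:
        h ^= byte
        h = (h * 16777619) & 0xFFFFFFFF    # FNV prime, kept to 32 bits
    return h % n_queues
```

Because MAC addresses on a switched network are stable, the same address always lands in the same queue; a minimal perfect hash would additionally guarantee no two known addresses share one.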
>>> 8) Parallel FIB lookup
>>> IF you assume that you have tons of queues routing packets from
>>> ingress to egress, on tons of cpus, you can actually do the FIB lookup
>>> in parallel also. There is some old stuff on virtualqueue
>>> and virtual clock fqing which makes for tighter
>>> 9) Need a codel *library* that works at the mac80211 layer. I think
>>> codel*.h suffices but am not sure. And for that matter, codel itself
>>> seems like it would need a calculated target and a few other things to
>>> work right on wifi.
>>> As for the hashing...
>>> Personally I do not think that the 8 way set associative hash is what
>>> wifi needs for cake, I tend to think we need to "pack" aggregates with
>>> as many different flows as possible, and randomize how we pack
>>> them... I think.... maybe....
>>> 10) I really don't like BQL with multi-queued hardware queues. More
>>> backpressure is needed in that case than we get.
>>> 11) GRO peeling
>>> Offloads suck
>>> Dave Täht
>>> Let's make wifi fast, less jittery and reliable again!