From: Dave Taht
To: cake@lists.bufferbloat.net
Date: Sat, 11 Apr 2015 11:47:53 -0700
Subject: Re: [Cake] cake exploration

14) strict priority queues.

Some CBR techniques, notably IPTV, want 0 packet loss, but run at a rate
determined by the provider to be below what the subscriber will use.
Sharing that "fairly" will lead to loss of packets to those applications.

I do not like strict priority queues. I would prefer, for example, that
the CBR application be marked with ECN, and ignored, vs. the high
probability someone will abuse a strict priority queue.

On Sat, Apr 11, 2015 at 11:45 AM, Dave Taht wrote:
> 12) Better starting interval and target for codel's maintenance vars in
> relationship to existing flows
>
> Right now sch_fq, sch_pie give priority to flows in their first IW
> phases. This makes them vulnerable to DDOS attacks with tons of new
> flows.
>
> sch_fq_codel mitigates this somewhat by starting to hash flows into
> the same buckets.
>
> sch_cake's more perfect hashing gives IW more of a boost.
>
> A thought was to do a combined ewma of all active flows and to hand
> their current codel settings to new flows as they arrive, with less of
> a boost.
>
> This MIGHT work better when you have short RTTs generally on local
> networks. Other thoughts appreciated.
>
> There is another related problem in the resumption portion of the
> algorithm as the decay of the existing state variables is arbitrary
> and way too long in some cases. I think I had solved this by coming up
> with an estimate for the amount of decay needed other than count - 2,
> doing a calculation from the last time a flow had packets to the next,
> but can't remember how I did it! It is easy if you have a last time
> per queue and use a normal sqrt with a divide... but my brain crashes
> at the reciprocal cache math we have instead....
>
> I am not allergic to a divide. I am not allergic to using a shift for
> the target and calculating the interval only relative to bandwidth, as
> mentioned elsewhere. At 64k worth of bandwidth we just end up with a
> huge interval, no big deal. But plan to ride along with the two
> separately for now.
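
On the resumption decay in 12): a minimal sketch of the "last time per
queue, normal sqrt with a divide" variant, in userspace Python rather than
kernel C. It is not the forgotten calculation mentioned above. The rule
used here is an assumption: unwind count by one for each control-law gap
(interval / sqrt(count)) that elapsed while the queue was idle, with a
floor of 1. The names control_law_gap and decayed_count are hypothetical;
the 100 ms interval is codel's usual default.

```python
import math

INTERVAL_MS = 100.0  # codel's conventional default interval


def control_law_gap(count, interval_ms=INTERVAL_MS):
    """Spacing between drops in the dropping state: interval / sqrt(count)."""
    return interval_ms / math.sqrt(count)


def decayed_count(count, idle_ms, interval_ms=INTERVAL_MS):
    """Hypothetical resumption rule, not the one in sch_cake or sch_fq_codel.

    Walk count back down by one for every control-law gap that fits into
    the time the queue sat idle. The floor of 1 is an assumption.
    """
    remaining = idle_ms
    while count > 1:
        gap = control_law_gap(count, interval_ms)
        if remaining < gap:
            break
        remaining -= gap
        count -= 1
    return count


if __name__ == "__main__":
    for idle in (0.0, 25.0, 500.0, 10000.0):
        print(idle, decayed_count(16, idle))
```

The loop costs one sqrt and one divide per step, which is the part the
reciprocal cache in the kernel code exists to avoid, so this only shows
the shape of the estimate.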
>
> 13) It might be possible to write a faster codel - and easier to read
> by using a case statement on the 2 core variables in it. The current
> code does not show the 3 way state machine as well as that could, and
> for all I know there is something intelligent we could do with the 4th
> state.
>
> On Sat, Apr 11, 2015 at 11:44 AM, Dave Taht wrote:
>> Stuff on my backlog of researchy stuff.
>>
>> 1) cake_drop_monitor - I wanted a way to throw drop AND mark
>> notifications up to userspace, including the packet's time of entry
>> and the time of drop, as well as the IP headers and next hop
>> destination macaddr.
>>
>> There are many use cases for this:
>>
>> A) - testing the functionality of the algorithm and being able to
>> collect and analyze drops as they happen.
>>
>> NET_DROP_MONITOR did not cut it but I have not looked at it in a year.
>> It drives me crazy to be dropping packets all over the system and to
>> not be able to track down where they happened.
>>
>> This is the primary reason why I had switched back to 64 bit
>> timestamps, btw.
>>
>> B) Having the drop notifications might be useful in tuning or steering
>> traffic to different routes.
>>
>> C) It is way easier to do a graph of the drop pattern with this info
>> thrown to userspace.
>>
>> 2) Dearly wanted to actually be doing the timestamping AND hashing in
>> the native skb struct on entry to the system itself, not the qdisc.
>> Measuring the latency from ingress from the wire to egress would
>> result in much better cpu overload behavior. I am totally aware of
>> how much mainline linux would not take this option, but things have
>> evolved over there, so leveraging the rxhash and skb->timestamp
>> fields seems a possibility...
>>
>> I think this would let us get along better with netem also, but would
>> have to go look again.
>>
>> Call that cake-rxhash.
>> :)
>>
>> 3) In my benchmark of the latest cake3, ecn traffic was not as good as
>> expected, but that might have been an anomaly of the test. Need to
>> test ecn thoroughly this time, almost in preference to looking at drop
>> behavior. Toke probably has ecn off by default right now. On, after
>> this test run?
>>
>> 4) Testing higher rates and looking at cwnd for codel is important.
>> The dropoff toke noted in his paper is real. Also there is possibly
>> some ideal ratio between number of flows and bandwidth that makes more
>> sense than a fixed number of flows. Also I keep harping on the darn
>> resumption algo... but need to test with lousier tcps like windows.
>>
>> 5) Byte Mode-ish handling
>>
>> Dropping a single 64 byte packet does little good. You will find in
>> the 50 flow tests that a ton of traffic is acks, not being dropped,
>> and pie does better in this case than does fq, as it shoots
>> wildly at everything, but usually misses the fat packets, where DRR
>> will merrily store up an entire MTU worth of useless acks when only
>> one is needed.
>>
>> So just trying to drop more little packets might be helpful in some
>> cases.
>>
>> 6) Ack thinning. I gave what is conventionally called "stretch acks" a
>> new name, as stretch acks have a deserved reputation as sucking. Well,
>> they don't suck anymore in linux, and what I was mostly thinking was
>> to drop no more than 2 in a row...
>>
>> One thing this would help with is in packing wifi aggregates - which
>> have hard limits on the number of packets in a TXOP (42), and a byte
>> limit on wireless n of 64k. Sending 41 acks from one flow, when you
>> could send the last 2, seems like a big win on packing a TXOP.
>>
>> (this is something eric proposed, and given the drop rates we now see
>> from wifi and the wild and wooly internet I am inclined to agree that
>> it is worth fiddling with)
>>
>> (I am not huge on it, though)
>>
>> 7) Macaddr hashing on the nexthop instead of the 5tuple.
>> When used on an internal, switched network, it would be better to try
>> and maximize the port usage rather than the 5 tuple in some cases.
>>
>> I have never got around to writing a mac hash I liked, my goal
>> originally was to write one that found a minimal perfect hash solution
>> eventually as mac addrs tend to be pretty stable on a network and
>> rarely change.
>>
>> Warning: minimal perfect hash attempts are a wet paint thing! I really
>> want a FPGA solver for them.... don't go play with the code out there,
>> you will lose days to it... you have been warned.
>>
>> http://cmph.sourceforge.net/concepts.html
>>
>> I would like there to be a generic mac hashing thing in tc, actually.
>>
>> 8) Parallel FIB lookup
>>
>> IF you assume that you have tons of queues routing packets from
>> ingress to egress, on tons of cpus, you can actually do the FIB lookup
>> in parallel also. There is some old stuff on virtualqueue
>> and virtual clock fqing which makes for tighter
>>
>> 9) Need a codel *library* that works at the mac80211 layer. I think
>> codel*.h suffices but am not sure. And for that matter, codel itself
>> seems like it would need a calculated target and a few other things to
>> work right on wifi.
>>
>> As for the hashing...
>>
>> Personally I do not think that the 8 way set associative hash is what
>> wifi needs for cake, I tend to think we need to "pack" aggregates with
>> as many different flows as possible, and randomize how we pack
>> them... I think.... maybe....
>>
>> 10) I really don't like BQL with multi-queued hardware queues. More
>> backpressure is needed in that case than we get.
>>
>> 11) GRO peeling
>>
>> Offloads suck
>>
>> --
>> Dave Täht
>> Let's make wifi fast, less jittery and reliable again!
>>
>> https://plus.google.com/u/0/107942175615993706558/posts/TVX3o84jjmb
>
>
>
> --
> Dave Täht
> Let's make wifi fast, less jittery and reliable again!
>
> https://plus.google.com/u/0/107942175615993706558/posts/TVX3o84jjmb

--
Dave Täht
Let's make wifi fast, less jittery and reliable again!

https://plus.google.com/u/0/107942175615993706558/posts/TVX3o84jjmb
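
A rough sketch of the ack thinning rule from 6), in Python: drop no more
than 2 acks in a row from one flow, and always keep the newest one. The
names thin_acks and max_drops_in_row and the plain-integer ack numbers are
hypothetical. The sketch also assumes every entry is a pure ack (no
payload, SACK blocks, or window update) belonging to the same flow, which
a real qdisc would have to check.

```python
def thin_acks(acks, max_drops_in_row=2):
    """Return the acks to keep, oldest first.

    acks: cumulative ack numbers queued for a single flow, oldest first.
    Assumes all entries are pure acks; that check is not done here.
    """
    kept = []
    dropped_in_row = 0
    last = len(acks) - 1
    for i, ack in enumerate(acks):
        if i != last and dropped_in_row < max_drops_in_row:
            # Still under the limit and not the newest ack: thin it out.
            dropped_in_row += 1
        else:
            # Either the limit was reached or this is the newest ack.
            kept.append(ack)
            dropped_in_row = 0
    return kept


if __name__ == "__main__":
    queued = list(range(41))  # 41 acks from one flow, as in the TXOP example
    print(len(thin_acks(queued)))
```

With 41 queued acks this keeps 14, so it frees about two thirds of the
slots in the aggregate. Getting down to "the last 2" would need a more
aggressive rule than the 2-in-a-row limit.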