From: Dave Taht
To: cake@lists.bufferbloat.net
Date: Sat, 11 Apr 2015 19:33:59 -0700
Subject: Re: [Cake] cake exploration
List-Id: Cake - FQ_codel the next generation

17) The ATM compensation in cake is entirely untested, and it is unclear
how best to handle PPPoE.

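For reference, the cell arithmetic that any ATM compensation has to get right can be sketched as below. This is only the generic rounding of a packet up to whole 53-byte cells carrying 48 bytes of payload each; it is not taken from the cake source. The function name `atm_wire_bytes` and its `overhead` parameter are made up for this sketch, and the per-packet overhead depends on the encapsulation in use (PPPoE included), so it is left to the caller.

```python
ATM_CELL_BYTES = 53      # size of one ATM cell on the wire
ATM_PAYLOAD_BYTES = 48   # payload bytes carried by each cell


def atm_wire_bytes(packet_len, overhead=0):
    """Bytes a packet occupies on an ATM link.

    packet_len: length of the packet in bytes.
    overhead:   per-packet encapsulation overhead in bytes (hypothetical
                parameter; the correct value is link-specific).
    """
    total = packet_len + overhead
    # Round up to a whole number of cells using integer arithmetic.
    cells = -(-total // ATM_PAYLOAD_BYTES)
    return cells * ATM_CELL_BYTES


if __name__ == "__main__":
    for size in (40, 48, 49, 1500):
        print(size, atm_wire_bytes(size))
```

A shaper that accounts for raw packet length instead of this rounded size will underestimate link usage most badly for small packets, which is one thing a test of the compensation could check.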
On Sat, Apr 11, 2015 at 12:12 PM, Dave Taht wrote:
> 16) Better VPN handling
>
> 5 flows of encapsulated vpn traffic compete badly with 5 flows of
> (say) bittorrent traffic.
>
> We could give the ipsec form of vpn traffic a boost by explicitly
> recognising and classifying the AH and ESP headers into a different
> diffserv bin.
>
> Similarly, tinc and openvpn could use a port match.
>
>
>
> On Sat, Apr 11, 2015 at 11:48 AM, Dave Taht wrote:
>> 15) Needs to work so an ISP can create service classes for their customers
>>
>> DRR 1: cake bandwidth X
>> DRR 2: cake bandwidth Y
>> DRR 3:
>>
>> I have no idea whether this can work at all; last I tried it, DRR would
>> stall every time fq_codel had no packets to deliver.
>>
>> A related issue is that there needs to be a way to have a tc or
>> iptables filter map multiple IPs and addresses so that they return
>> a single distinct integer for placement into such queue systems,
>> so that a lookup of someone that has x.y.x.z, q.f.b.a/24, j:k:h/64 and
>> l:m:n/48 can return a single integer representing the customer so it
>> can be fed into the correct sub-qdisc above.
>>
>> I think, but am not sure, that is all my backlog!
>>
>> On Sat, Apr 11, 2015 at 11:47 AM, Dave Taht wrote:
>>> 14) Strict priority queues. Some CBR techniques, notably IPTV, want 0
>>> packet loss, but run at a rate determined by the provider to be below
>>> what the subscriber will use. Sharing that "fairly" will lead to loss
>>> of packets to those applications.
>>>
>>> I do not like strict priority queues. I would prefer, for example,
>>> that the CBR application be marked with ECN, and ignored, vs. the
>>> high probability someone will abuse a strict priority queue.
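The customer lookup wished for in item 15 above (several addresses and prefixes, v4 and v6, all returning one integer) could be prototyped in userspace roughly like this. It is a minimal sketch using Python's standard `ipaddress` module with a linear longest-prefix scan; `build_table`, `lookup`, the customer id 7 and the documentation-range addresses are all invented for illustration. It says nothing about how a tc or iptables filter would do it, and a kernel-side version would want a proper longest-prefix-match structure rather than a list.

```python
import ipaddress


def build_table(customers):
    """Build a lookup table from {customer_id: [address or prefix strings]}."""
    table = []
    for cust_id, prefixes in customers.items():
        for prefix in prefixes:
            # A bare address becomes a /32 (IPv4) or /128 (IPv6) network.
            net = ipaddress.ip_network(prefix, strict=False)
            table.append((net, cust_id))
    # Longest prefixes first so the most specific entry wins.
    table.sort(key=lambda entry: entry[0].prefixlen, reverse=True)
    return table


def lookup(table, addr):
    """Return the customer id owning addr, or None if nobody does."""
    ip = ipaddress.ip_address(addr)
    for net, cust_id in table:
        if net.version == ip.version and ip in net:
            return cust_id
    return None


if __name__ == "__main__":
    table = build_table({
        7: ["192.0.2.10", "198.51.100.0/24",
            "2001:db8:1::/64", "2001:db8:100::/48"],
    })
    for addr in ("192.0.2.10", "198.51.100.42", "2001:db8:1::5", "203.0.113.1"):
        print(addr, lookup(table, addr))
```

The integer that comes back is what would select the per-customer DRR class and its cake instance.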
>>>
>>>
>>>
>>> On Sat, Apr 11, 2015 at 11:45 AM, Dave Taht wrote:
>>>> 12) Better starting interval and target for codel's maintenance vars in
>>>> relationship to existing flows
>>>>
>>>> Right now sch_fq, sch_pie give priority to flows in their first IW
>>>> phases. This makes them vulnerable to DDOS attacks with tons of new
>>>> flows.
>>>>
>>>> sch_fq_codel mitigates this somewhat by starting to hash flows into
>>>> the same buckets.
>>>>
>>>> sch_cake's more perfect hashing gives IW more of a boost.
>>>>
>>>> A thought was to do a combined ewma of all active flows and to hand
>>>> their current codel settings to new flows as they arrive, with less of
>>>> a boost.
>>>>
>>>> This MIGHT work better when you have short RTTs generally on local
>>>> networks. Other thoughts appreciated.
>>>>
>>>> There is another related problem in the resumption portion of the
>>>> algorithm as the decay of the existing state variables is arbitrary
>>>> and way too long in some cases. I think I had solved this by coming up
>>>> with an estimate for the amount of decay needed other than count - 2,
>>>> doing a calculation from the last time a flow had packets to the next,
>>>> but can't remember how I did it! It is easy if you have a last time
>>>> per queue and use a normal sqrt with a divide... but my brain crashes
>>>> at the reciprocal cache math we have instead....
>>>>
>>>> I am not allergic to a divide. I am not allergic to using a shift for
>>>> the target and calculating the interval only relative to bandwidth, as
>>>> mentioned elsewhere. At 64k worth of bandwidth we just end up with a
>>>> huge interval, no big deal. But I plan to ride along with the two
>>>> separately for now.
>>>>
>>>> 13) It might be possible to write a faster codel, and one easier to
>>>> read, by using a case statement on the 2 core variables in it.
>>>> The current code does not show the 3 way state machine as well as
>>>> that could, and for all I know there is something intelligent we
>>>> could do with the 4th state.
>>>>
>>>> On Sat, Apr 11, 2015 at 11:44 AM, Dave Taht wrote:
>>>>> Stuff on my backlog of researchy stuff.
>>>>>
>>>>> 1) cake_drop_monitor - I wanted a way to throw drop AND mark
>>>>> notifications up to userspace, including the packet's time of entry
>>>>> and the time of drop, as well as the IP headers and next hop
>>>>> destination macaddr.
>>>>>
>>>>> There are many use cases for this:
>>>>>
>>>>> A) Testing the functionality of the algorithm and being able to
>>>>> collect and analyze drops as they happen.
>>>>>
>>>>> NET_DROP_MONITOR did not cut it but I have not looked at it in a year.
>>>>> It drives me crazy to be dropping packets all over the system and to
>>>>> not be able to track down where they happened.
>>>>>
>>>>> This is the primary reason why I had switched back to 64 bit
>>>>> timestamps, btw.
>>>>>
>>>>> B) Having the drop notifications might be useful in tuning or steering
>>>>> traffic to different routes.
>>>>>
>>>>> C) It is way easier to do a graph of the drop pattern with this info
>>>>> thrown to userspace.
>>>>>
>>>>> 2) Dearly wanted to actually be doing the timestamping AND hashing in
>>>>> the native skb struct on entry to the system itself, not the qdisc.
>>>>> Measuring the latency from ingress from the wire to egress would
>>>>> result in much better cpu overload behavior. I am totally aware of
>>>>> how much mainline linux would not take this option, but things have
>>>>> evolved over there, so leveraging the rxhash and skb->timestamp
>>>>> fields seems a possibility...
>>>>>
>>>>> I think this would let us get along better with netem also, but would
>>>>> have to go look again.
>>>>>
>>>>> Call that cake-rxhash.
>>>>> :)
>>>>>
>>>>> 3) In my benchmark of the latest cake3, ecn traffic was not as good as
>>>>> expected, but that might have been an anomaly of the test. Need to
>>>>> test ecn thoroughly this time, almost in preference to looking at drop
>>>>> behavior. Toke probably has ecn off by default right now. On, after
>>>>> this test run?
>>>>>
>>>>> 4) Testing higher rates and looking at cwnd for codel is important.
>>>>> The dropoff toke noted in his paper is real. Also there is possibly
>>>>> some ideal ratio between number of flows and bandwidth that makes more
>>>>> sense than a fixed number of flows. Also I keep harping on the darn
>>>>> resumption algo... but need to test with lousier tcps like windows.
>>>>>
>>>>> 5) Byte Mode-ish handling
>>>>>
>>>>> Dropping a single 64 byte packet does little good. You will find in
>>>>> the 50 flow tests that a ton of traffic is acks, not being dropped,
>>>>> and pie does better in this case than does fq, as it shoots wildly
>>>>> at everything, but usually misses the fat packets, where DRR will
>>>>> merrily store up an entire MTU worth of useless acks when only one
>>>>> is needed.
>>>>>
>>>>> So just trying to drop more little packets might be helpful in some
>>>>> cases.
>>>>>
>>>>> 6) Ack thinning. I gave what is conventionally called "stretch acks" a
>>>>> new name, as stretch acks have a deserved reputation as sucking.
>>>>> Well, they don't suck anymore in linux, and what I was mostly
>>>>> thinking was to drop no more than 2 in a row...
>>>>>
>>>>> One thing this would help with is in packing wifi aggregates - which
>>>>> have hard limits on the number of packets in a TXOP (42), and a byte
>>>>> limit on wireless n of 64k. Sending only the last 2 acks from one
>>>>> flow, instead of all 41, seems like a big win on packing a TXOP.
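A toy illustration of the "send the last 2 instead of 41" idea from item 6 above follows. This is a sketch over a snapshot of a queue, not a proposal for the qdisc code: the `Pkt` record, its fields and the `thin_acks` function are all hypothetical, and a real implementation would have to leave alone acks that carry extra information (SACK blocks, duplicate acks, window updates), which this ignores.

```python
from collections import Counter
from typing import NamedTuple


class Pkt(NamedTuple):
    flow: int       # hypothetical flow identifier (e.g. a hash of the 5-tuple)
    pure_ack: bool  # True if the packet is an ACK with no payload
    seq: int        # arrival order, only here to make results easy to read


def thin_acks(queue, keep=2):
    """Return queue with all but the last `keep` pure ACKs of each flow removed."""
    # Number of pure ACKs still ahead of us in the queue, per flow.
    remaining = Counter(p.flow for p in queue if p.pure_ack)
    out = []
    for p in queue:
        if p.pure_ack:
            remaining[p.flow] -= 1
            if remaining[p.flow] >= keep:
                continue  # at least `keep` newer ACKs from this flow follow
        out.append(p)
    return out


if __name__ == "__main__":
    queue = [Pkt(1, True, i) for i in range(41)] + [Pkt(2, False, 41)]
    thinned = thin_acks(queue)
    print(len(queue), "->", len(thinned))
```

With 41 queued acks from one flow this leaves 2 of them, freeing the other slots in the aggregate for packets from different flows.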
>>>>>
>>>>> (this is something eric proposed, and given the drop rates we now see
>>>>> from wifi and the wild and woolly internet I am inclined to agree that
>>>>> it is worth fiddling with)
>>>>>
>>>>> (I am not huge on it, though)
>>>>>
>>>>> 7) Macaddr hashing on the nexthop instead of the 5tuple. When used on
>>>>> an internal, switched network, it would be better to try and maximize
>>>>> the port usage rather than the 5 tuple in some cases.
>>>>>
>>>>> I have never got around to writing a mac hash I liked; my goal
>>>>> originally was to write one that found a minimal perfect hash solution
>>>>> eventually, as mac addrs tend to be pretty stable on a network and
>>>>> rarely change.
>>>>>
>>>>> Warning: minimal perfect hash attempts are a wet paint thing! I really
>>>>> want a FPGA solver for them.... don't go play with the code out there,
>>>>> you will lose days to it... you have been warned.
>>>>>
>>>>> http://cmph.sourceforge.net/concepts.html
>>>>>
>>>>> I would like there to be a generic mac hashing thing in tc, actually.
>>>>>
>>>>> 8) Parallel FIB lookup
>>>>>
>>>>> IF you assume that you have tons of queues routing packets from
>>>>> ingress to egress, on tons of cpus, you can actually do the FIB lookup
>>>>> in parallel also. There is some old stuff on virtualqueue
>>>>> and virtual clock fqing which makes for tighter
>>>>>
>>>>> 9) Need a codel *library* that works at the mac80211 layer. I think
>>>>> codel*.h suffices but am not sure. And for that matter, codel itself
>>>>> seems like it would need a calculated target and a few other things to
>>>>> work right on wifi.
>>>>>
>>>>> As for the hashing...
>>>>>
>>>>> Personally I do not think that the 8 way set associative hash is what
>>>>> wifi needs for cake. I tend to think we need to "pack" aggregates with
>>>>> as many different flows as possible, and randomize how we pack
>>>>> them... I think.... maybe....
>>>>>
>>>>> 10) I really don't like BQL with multi-queued hardware queues.
>>>>> More backpressure is needed in that case than we get.
>>>>>
>>>>> 11) GRO peeling
>>>>>
>>>>> Offloads suck
>>>>>
>>>>> --
>>>>> Dave Täht
>>>>> Let's make wifi fast, less jittery and reliable again!
>>>>>
>>>>> https://plus.google.com/u/0/107942175615993706558/posts/TVX3o84jjmb