[Bloat] when does the CoDel part of fq_codel help in the real world?

Michael Welzl michawe at ifi.uio.no
Tue Nov 27 16:17:14 EST 2018


Hi,

A few answers below:


> On Nov 27, 2018, at 9:10 PM, Dave Taht <dave.taht at gmail.com> wrote:
> 
> On Mon, Nov 26, 2018 at 1:56 PM Michael Welzl <michawe at ifi.uio.no> wrote:
>> 
>> Hi folks,
>> 
>> That “Michael” dude was me  :)
>> 
>> About the stuff below, a few comments. First, an impressive effort to dig all of this up - I also thought that this was an interesting conversation to have!
>> 
>> However, I would like to point out that thesis defense conversations are meant to be provocative, by design - when I said that CoDel doesn’t usually help and long queues would be the right thing for all applications, I certainly didn’t REALLY REALLY mean that.  The idea was just to be thought provoking - and indeed I found this interesting: e.g., if you think about a short HTTP/1 connection, a large buffer just gives it a greater chance to get all packets across, and the perceived latency from the reduced round-trips after not dropping anything may in fact be less than with a smaller (or CoDel’ed) buffer.
> 
> I really did want Toke to have a hard time. Thanks for putting his
> back against the wall!
> 
> And I'd rather this be a discussion of Toke's views... I do tend to
> think he thinks FQ solves more than it does... and I wish we had a
> sound analysis as to why 1024 queues work so much better for us than
> 64 or fewer on the workloads we have. I tend to think in part it's
> because that acts as a 1000x1 rate-shifter - but should it scale up?
> Or down? Is what we did with cake (1024-way set-associative hashing)
> useful? Or excessive? I'm regularly seeing 64,000 queues on 10Gbit
> and faster hardware, due to 64 hardware queues with fq_codel on each.
> I think that's too much and renders the AQM ineffective, but I lack
> data...
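
I don't have a sound analysis either, but a quick birthday-problem estimate at least shows why 1024 buckets behave so much better than 64. This toy calculation is my own - it assumes plain uniform hashing and ignores cake's set-associativity:

    # Expected number of flows that end up sharing a bucket when F
    # flows are hashed uniformly into Q queues.
    def colliding_flows(F, Q):
        occupied = Q * (1 - (1 - 1 / Q) ** F)  # E[# non-empty buckets]
        return F - occupied                    # flows beyond one-per-bucket

    for Q in (64, 1024, 65536):
        print(Q, round(colliding_flows(100, Q), 1))
    # with 100 active flows: ~49 collide at Q=64, ~4.7 at Q=1024,
    # and ~0.1 at Q=65536 - past 1024 the gain is marginal.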
> 
> but, to rant a bit...
> 
> I tend to believe FQ solves 97% of the problem, AQM 2.9%, and ECN 0.09%.

I think the sparse flow optimization bit plays a major role in FQ_CoDel.
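
For intuition, here's a toy Python model of the DRR++ scheme that RFC 8290 describes - the names and the quantum are my simplifications, and real fq_codel additionally runs CoDel on each queue:

    from collections import deque

    QUANTUM = 1514  # bytes of credit per round (roughly one MTU)

    class Flow:
        def __init__(self):
            self.queue = deque()     # packet sizes, in bytes
            self.deficit = QUANTUM   # new flows start with a full quantum

    def dequeue(new_flows, old_flows):
        # Sparse ("new") flows are served before old ones - that head
        # start is the sparse flow optimization.
        while True:
            lst = new_flows if new_flows else old_flows
            if not lst:
                return None                 # nothing queued anywhere
            flow = lst.popleft()
            if flow.deficit <= 0:
                flow.deficit += QUANTUM
                old_flows.append(flow)      # spent its credit: behind bulk
            elif not flow.queue:
                if lst is new_flows:
                    old_flows.append(flow)  # emptied sparse flow: demote
                # an emptied old flow is simply forgotten
            else:
                pkt = flow.queue.popleft()
                flow.deficit -= pkt         # pkt is its size in bytes
                lst.appendleft(flow)        # keep serving this flow
                return pkt

A DNS query, a TCP handshake, or a thin request/response flow lands in new_flows and jumps ahead of the bulk transfers, which is a big part of why fq_codel feels so good even before CoDel ever drops anything.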


> BUT: Amdahl's law says once you reduce one part of the problem to 0,
> everything else takes 100%. :)
> 
> It often seems that I, the sole and very lonely FQ advocate here in
> 2011, have reversed the situation (in this group!), and I'm now often
> the AQM advocate *here*.

Well, I'm with you: I do agree that an AQM is useful!  It's just that there are not SO many cases where a single flow builds a standing queue only for itself, and where this really matters for that particular application.
But these cases absolutely do exist!  (Several examples were mentioned - also the VPN case, etc.)


> It's sort of like all the people still quoting the e2e argument back
> at me, when Dave Reed (at least, and perhaps the other co-authors by
> now) has bought into this level of network interference between the
> endpoints and had no religion about it - or the "RED in a Different
> Light" paper being rejected because it attempted to overturn other
> religion - and I'll be damned if I'll let fq_codel, sch_fq, pie,
> l4s, scream, nada...
> 
> I admit to getting kind of crusty and set in my ways, but so long as
> people put code in front of me along with the paper, I still hold
> that when the facts change, so do my opinions.
> 
> Pacing is *really impressive* and I'd like to see it enter
> everything, not just packet processing. I've been thinking hard
> about the impact of CPU bursts (like resizing a hash table) and
> other forms of work that we currently do on computers that have
> "dragster-like" peak performance and a great average, but horrible
> pathologies - and I think the world would be better off if we built
> more...

+1
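
On the packet side, pacing is easy to experiment with these days. A minimal sketch, assuming a Linux kernel with sch_fq or TCP internal pacing (SO_MAX_PACING_RATE is 47 on Linux; older Pythons may not export the constant):

    import socket

    SO_MAX_PACING_RATE = getattr(socket, "SO_MAX_PACING_RATE", 47)

    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Ask the kernel to pace this socket at ~1 MByte/s (bytes/second)
    # instead of letting it burst at line rate.
    s.setsockopt(socket.SOL_SOCKET, SO_MAX_PACING_RATE, 1_000_000)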


> Anyway...
> 
> Once you have FQ and a sound outer limit on buffer size (100 ms),
> depredations like Comcast's 680 ms buffers no longer matter. There's
> still plenty of room to innovate. BBR works brilliantly vs fq_codel
> (you can even turn ECN on, which it doesn't respect, and still get a
> great result). LoLa would probably work well also, 'cept that the
> git tree was busted when I last tried it, and it hasn't been tested
> much in the 1 Mbit - 1 Gbit range.
> 
>> 
>> But corner cases aside, I do in fact very much agree with the answers Pete gives to my question below, and also with the points others have made in answering this thread. Jonathan Morton even mentioned ECN - after Dave’s recent over-reaction to ECN, I made a point of not bringing up ECN *yet* again
> 
> Not going to go into it (much) today! We ended up starting another
> project on ECN that operates under my core ground rule - "show me
> the code" - and life over there and on that mailing list has been
> pleasantly quiet. https://www.bufferbloat.net/projects/ecn-sane/wiki/
> 
> I did get back on the tsvwg mailing list recently because of some
> ludicrously inaccurate misstatements about fq_codel. I also made a
> strong appeal to the l4s people to, in general, "stop thanking me" in
> their documents. To me that reads as an endorsement, where all I did
> was participate in the process until I gave up and hit my "show me
> the code" moment - which was about 5 years ago, and the needle hasn't
> moved since, except in mutating standards documents.
> 
> The other document I didn't like was an arbitrary attempt to just
> set the ECN backoff figure to 0.8 when the sanest thing, given the
> deployment and pacing, was to aim for the right number - anyway...
> in that case I just wanted off the "thank you" list.

So let’s draw a line between L4S and “the other document you didn’t like”, which was our ABE.
L4S is a more drastic attempt at getting things right. I haven’t been contributing to this much; I like it for what it’s trying to achieve, but I don’t have a strong opinion on it.
Myself, I thought that a much smaller change might have a better chance of getting the incentives right to support ECN deployment - hence the change to 0.8.
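
For those who haven't looked at ABE: the entire change fits in a toy calculation (0.8 is the value from the draft; the other numbers are made up):

    # cwnd in segments. NewReno halves on loss; ABE responds to an
    # ECN mark with 0.8, because an AQM marks at a much shallower
    # queue than the deep drop-tail buffer that 0.5 was meant to drain.
    cwnd = 100
    print("after a loss:     ", int(cwnd * 0.5))  # 50 segments
    print("after an ECN mark:", int(cwnd * 0.8))  # 80 segments
    # The marked flow needs fewer RTTs to climb back to full rate, so
    # senders gain an incentive to negotiate ECN - the point of ABE.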

Looking at our own document again, I am surprised to see that you are indeed in our acknowledgement list:
https://tools.ietf.org/html/draft-ietf-tcpm-alternativebackoff-ecn-12
We added everyone who we thought made useful suggestions - it wasn’t meant as a sign of endorsement. But, before RFC publication, there is still an opportunity to remove your name.
=> I apologize and will remove you.


> I like to think the more or less RFC 3168-compliant deployment of
> ECN is thus far going brilliantly, but I lack data. I would certainly
> like a hostile reviewer's evaluation of cake's ECN method and, for
> that matter, PIE's - honestly, from real traffic! There's an
> RFC-compliant version of PIE being pushed into the kernel after it
> gets through some of Stephen's nits.
> 
> And I'd really prefer all future discussions of "ecn benefits" to come
> with code and data and be discussed over on the ecn-sane mailing list,
> or *not discussed here* if no code is available.

You keep complaining about lack of code. At least for ABE:
- I think the code is in FreeBSD now
- There is a slightly older Linux patch. I agree it would be nice to continue with this code… I don't have anyone working on it right now.
Anyway, all code, along with measurement results, is available from:
http://heim.ifi.uio.no/michawe/research/projects/abe/index.html


>> , but… yes indeed, being able to use ECN to tell an application to back off, instead of having to drop a packet, is also one of the benefits.
> 
> One thus far misunderstood and under-analyzed aspect of our work is
> the switch to head dropping.
> 
> To me, the switch to head dropping essentially killed the tail-loss
> RTO problem and eliminated most of the need for ECN.

I doubt that: TCP will need to retransmit the packet at the head, and that takes an RTT - all the packets after it will have to wait in the receive buffer before the application gets them.
But I don't have measurements to prove my point, so I'm just hand-waving...
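
To at least put rough numbers on my own hand-waving (these are assumptions, not measurements):

    # What the receiving application sees for in-order delivery.
    rtt_ms = 40.0      # assumed path RTT
    # Head drop: every segment behind the dropped one sits in the
    # receive buffer until the retransmission arrives, roughly one
    # RTT later (more if the sender must first detect the loss).
    hol_wait_drop_ms = rtt_ms
    # ECN mark instead of a drop: nothing is lost, nothing waits.
    hol_wait_mark_ms = 0.0
    print(hol_wait_drop_ms, hol_wait_mark_ms)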


> Forward progress and prompt signalling always happen. That otherwise
> wonderful piece Stuart Cheshire did at Apple elided the actual
> dropping-mode version of fq_codel, which as best I recall was about
> 12? 15 ms? long and totally invisible to the application.
> 
>> (I think people easily miss the latency benefit of not dropping a packet, and thereby eliminating head-of-line blocking - packet drops require an extra RTT for retransmission, which can be quite a long time. This is about measuring latency at the right layer...)
> 
> see above. And yea, perversely, I agree with your last statement.

Perversely? Come on  :)


> A slashdot web page download takes 78 separate flows and 2.2 seconds
> to complete. Worst-case completion time - if you had *tail* loss -
> would be about 80 ms longer than that, on a tiny fraction of loads.
> The rest of it is absorbed into those 2.2 seconds.

Yes - and these separate flows get their own buckets in FQ_CoDel. Which is great - there's just not much effect from CoDel there.
But I’m NOT arguing that per-flow AQM is a bad thing, absolutely not!


> EVEN with HTTP/2.0, I would be extremely surprised to learn that
> many websites fit it all into one TCP transaction.
> 
> There are very few other examples of TCP traffic requiring a
> low-latency response. I happen to be very happy with the ECN support
> in mosh, btw - not that anybody's ever looked at it since we did it.
> 
>> BTW, Anna Brunstrom was also very quick to give me the HTTP/2.0 example in the break after the defense. Also, TCP will generally not work very well when queues get very long… the RTT estimate gets way off.
> 
> I like to think that the SYN/ACK and SSL negotiation handshake under
> fq_codel gives a much more accurate estimate of the actual RTT than
> we ever had before.

Another good point - this is indeed useful!
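
E.g., a minimal way to grab that handshake-based estimate from userspace (example.com is just a placeholder host):

    import socket, time

    t0 = time.monotonic()
    s = socket.create_connection(("example.com", 443))
    rtt = time.monotonic() - t0   # SYN -> SYN/ACK: roughly one RTT
    print(f"handshake RTT estimate: {rtt * 1000:.1f} ms")
    s.close()

Under fq_codel the SYN doesn't sit behind anyone's bulk queue, so this comes out close to the true path RTT instead of path RTT plus bloat.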


>> All in all, I think this is a fun thought to consider for a bit, but not really something worth spending people’s time on, IMO: big buffers are bad, period. All else is corner cases.
> 
> I've said it elsewhere, and perhaps we should resume, but an RFC
> merely stating the obvious about maximal buffer limits - and getting
> ISPs to do that - would be a boon.
> 
>> I’ll use the opportunity to tell folks that I was also pretty impressed with Toke’s thesis as well as his performance at the defense. Among the many cool things he’s developed (or contributed to), my personal favorite is the airtime fairness scheduler. But, there were many more. Really good stuff.
> 
> I so wish the world had about 1000 more Tokes in training. How can
> we make that happen?

I don’t know… in academia, the mix of really contributing to the kernel on the one side, and getting academic results on the other, is a rare thing.
Not that we advisors (at least the people I consider friends) would be against that!  But it's not easy to find someone who can pull this off.

Cheers,
Michael
