* [Bloat] Computer generated congestion control
  From: Simon Barber @ 2015-04-03 6:42 UTC
  To: bloat

I ran across this project - looks interesting.

http://web.mit.edu/remy/

Simon
* Re: [Bloat] Computer generated congestion control
  From: Jonathan Morton @ 2015-04-03 7:45 UTC
  To: Simon Barber; +Cc: bloat

I think we've seen that before. The headline results are certainly
impressive, but there's a big caveat - one the authors themselves
acknowledge.

Remy works by tailoring the congestion control algorithm to the network
characteristics it's been told about. If the actual network it's running
on matches those characteristics, the results are good, and the more
specific that information is, the better the results. But if the network
characteristics differ from the information given, the results are bad -
and the more specific the data was, the more likely a mismatch becomes.

If we simply knew, a priori, what the bandwidth-delay product was for a
given connection, we wouldn't need congestion control algorithms in the
first place: we could simply hold the congestion window at that value.
That need for a priori knowledge is a fundamental problem with Remy's
approach.

So while existing, hand-written congestion control algorithms aren't
perfect, in practice they tend to do a competent job in difficult
circumstances, using the limited information available to them. If
anything, I'd like them to put some sane upper bound on the RTT - one
compatible with satellite links, but which would avoid flooding
unmanaged buffers to multi-minute delays. But when they get an
unambiguous congestion signal, they respond, and so they adapt to the
myriad varieties of link characteristics actually found on real
networks.

- Jonathan Morton
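To make the a-priori-knowledge point concrete, a minimal sketch (Python;
purely illustrative - the function name and the 1448-byte MSS are
assumptions, not anything from Remy or a real TCP stack):

    # If the BDP were known in advance, "congestion control" would
    # reduce to pinning the window at that value.
    def ideal_cwnd_segments(bandwidth_bps, rtt_s, mss=1448):
        bdp_bytes = bandwidth_bps / 8 * rtt_s
        return max(1, int(bdp_bytes / mss))

    # e.g. a 100 Mbit/s path with 60 ms RTT:
    # ideal_cwnd_segments(100e6, 0.060) -> 517 segments (~750 kB)

The entire difficulty of congestion control is that neither argument to
that function is known in advance, and both change over time.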
* Re: [Bloat] Computer generated congestion control
  From: David Lang @ 2015-04-03 8:52 UTC
  To: Jonathan Morton; +Cc: bloat

On Fri, 3 Apr 2015, Jonathan Morton wrote:

> I'd like them to put some sane upper bound on the RTT - one compatible
> with satellite links, but which would avoid flooding unmanaged buffers
> to multi-minute delays.

The problem is that there aren't any numbers that meet these two
criteria.

Even if you ignore 10G and faster interfaces, a 1Gb/s interface with
satellite-sized latencies is a LOT of data, far more than is needed to
flood a 'normal' link.

Even if you assume that a satellite link is never going to be faster
than, say, 100Mb/s, with 1s of RTT you have more than enough data to
drive a small link into delays long enough to start triggering
retransmission of packets. And that's assuming that your queues are
defined by byte count, not packet count (which adds another large
multiplier, based on the potential need to hold enough tiny packets to
keep the link busy).

Then add in the fact that the download side is being generated by a
server that could legitimately have a 10Gb/sec or faster pipe; while it
may not need to talk over a satellite at those speeds, even a
terrestrial pipe around the world has a high enough latency that its
bandwidth-latency product rivals or exceeds the worst-case consumer
satellite situation.

In addition, even if the buffers are byte-based and tuned for exactly
the pipe size, you still have the fairness problem. Sparse/short flows
tend to be much more sensitive to latency (DNS, HTML pages that then
trigger the loading of many resources, etc), so you really do not want
them waiting behind bulk data flows. Since you can't trust any QoS
markings set by someone else, it's not possible to statically configure
things to 'just work'.

The good news is that we now have a few different ways of actively
managing queues that work well, so we can move from figuring out what to
do to trying to convince people to do it.

If it really were as simple as setting a reasonable cap and reliably
getting unambiguous congestion signals, the problem would have been
solved years ago.

David Lang
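A back-of-envelope check of the numbers above (Python; the 600 ms
geostationary RTT is an assumed typical figure, the rest are the values
quoted in the message):

    def bdp_bytes(bandwidth_bps, rtt_s):
        return bandwidth_bps / 8 * rtt_s

    print(bdp_bytes(1e9, 0.600) / 1e6)  # 1 Gb/s over GEO: 75.0 MB in flight
    queue = bdp_bytes(100e6, 1.0)       # window sized for 100 Mb/s at 1 s RTT
    print(queue / 1e6)                  # 12.5 MB
    print(queue * 8 / 1e6)              # same window dumped onto a 1 Mb/s
                                        # link: 100 seconds of queueing delay

A buffer sized for the satellite case is thus two orders of magnitude
too large for the small link, which is exactly why no single static cap
works.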
* Re: [Bloat] Computer generated congestion control
  From: Jonathan Morton @ 2015-04-03 9:28 UTC
  To: David Lang; +Cc: bloat

>> I'd like them to put some sane upper bound on the RTT - one
>> compatible with satellite links, but which would avoid flooding
>> unmanaged buffers to multi-minute delays.
>
> The problem is that there aren't any numbers that meet these two
> criteria. Even if you ignore 10G and faster interfaces, a 1Gb/s
> interface with satellite-sized latencies is a LOT of data, far more
> than is needed to flood a 'normal' link.

I very deliberately said "RTT", not "BDP". TCP stacks already track an
estimate of RTT for various reasons, so in principle they could stop
increasing the congestion window when that RTT reaches some critical
value (1 second, say). The fact that they do not already do so is
evidenced by the observations of multi-minute induced delays in certain
circumstances.

And this is not a complete solution by any means. Vegas proved that an
altruistic limit on RTT by an endpoint, with no other measures within
the network, leads to poor fairness between flows. But if the major OSes
did that, more networks would be able to survive overload conditions
while providing some usable service to their users.

- Jonathan Morton
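A minimal sketch of what such an RTT cap might look like inside a
sender's window-growth logic (Python; the state fields, the on_ack hook,
and the 1-second cap are all hypothetical, not taken from any real
stack):

    from dataclasses import dataclass

    @dataclass
    class ConnState:        # hypothetical per-connection state
        cwnd: float         # congestion window, bytes
        mss: int            # maximum segment size, bytes
        srtt_s: float       # smoothed RTT estimate, seconds

    RTT_CAP_S = 1.0         # the suggested 'critical value' (assumed)

    def on_ack(cc: ConnState):
        # Classic Reno-style additive increase (~1 MSS per RTT), except
        # that window growth freezes once the smoothed RTT estimate
        # reaches the cap - the altruistic endpoint limit described
        # above. Loss and ECN handling are unchanged and omitted here.
        if cc.srtt_s >= RTT_CAP_S:
            return                              # stop inflating the queue
        cc.cwnd += cc.mss * cc.mss / cc.cwnd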
* Re: [Bloat] Computer generated congestion control
  From: David Lang @ 2015-04-03 9:44 UTC
  To: Jonathan Morton; +Cc: bloat

On Fri, 3 Apr 2015, Jonathan Morton wrote:

>>> I'd like them to put some sane upper bound on the RTT - one
>>> compatible with satellite links, but which would avoid flooding
>>> unmanaged buffers to multi-minute delays.
>
>> The problem is that there aren't any numbers that meet these two
>> criteria. Even if you ignore 10G and faster interfaces, a 1Gb/s
>> interface with satellite-sized latencies is a LOT of data, far more
>> than is needed to flood a 'normal' link.
>
> I very deliberately said "RTT", not "BDP". TCP stacks already track an
> estimate of RTT for various reasons, so in principle they could stop
> increasing the congestion window when that RTT reaches some critical
> value (1 second, say). The fact that they do not already do so is
> evidenced by the observations of multi-minute induced delays in
> certain circumstances.

I think the huge delays arise not because the RTT estimates are that
long, but because early on the available bandwidth estimates were wildly
high, since there was no feedback to indicate otherwise (the buffers
were hiding it all).

Once you get into the collapse mode of operation, where you are sending
multiple packets for every one that gets through, it's _really_ hard to
recover short of just stopping for a while to let the junk clear.

If it were gradual degradation all the way down, then backing off a
little would show a clear improvement, and feedback loops would clear
things up fairly quickly. But when there is a cliff in the performance
curve, and you go way past the cliff before you notice it (think Wile E.
Coyote missing a turn in the road), you can't just step back to recover.
When a whole group of people do the same thing, the total backoff needed
for the network to recover is frequently far more than any one system's
contribution to the problem; they all need to back off a lot.

> And this is not a complete solution by any means. Vegas proved that an
> altruistic limit on RTT by an endpoint, with no other measures within
> the network, leads to poor fairness between flows. But if the major
> OSes did that, more networks would be able to survive overload
> conditions while providing some usable service to their users.

But we don't need to take such a risk: we have active queue management
algorithms that we know will work if they are deployed on the chokepoint
machines (for everything except wifi hops, right now).

Best of all, these don't require any knowledge or guesswork about the
overall network, and no knowledge of the RTT or bandwidth-latency
product. All they need is information about the data flows going through
the device and when the local link can accept more data.

Making decisions based on local data scales really well. Making
estimates of the state of the network overall, not so much.

David Lang
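For a sense of how purely local those decisions are, a heavily
simplified CoDel-flavoured sketch (Python; illustrative only - the real
algorithm by Nichols and Jacobson also paces its drops on a control-law
schedule and handles many cases omitted here):

    import collections
    import time

    TARGET_S = 0.005     # 5 ms acceptable standing queue delay
    INTERVAL_S = 0.100   # delay must persist this long before dropping

    class CodelishQueue:
        # Drop from the head when the locally measured sojourn time
        # stays above TARGET_S for a full INTERVAL_S. Every input is
        # measured at this one device: no RTT or BDP estimate needed.
        def __init__(self):
            self.q = collections.deque()   # entries: (enqueue_time, pkt)
            self.above_since = None

        def enqueue(self, pkt):
            self.q.append((time.monotonic(), pkt))

        def dequeue(self):
            while self.q:
                t_in, pkt = self.q.popleft()
                sojourn = time.monotonic() - t_in
                if sojourn < TARGET_S:
                    self.above_since = None     # queue is healthy again
                    return pkt
                if self.above_since is None:
                    self.above_since = time.monotonic()
                if time.monotonic() - self.above_since < INTERVAL_S:
                    return pkt                  # tolerate short bursts
                # persistent standing queue: drop pkt (signalling the
                # sender to slow down) and try the next packet
            return None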
* Re: [Bloat] Computer generated congestion control
  From: Jonathan Morton @ 2015-04-03 11:06 UTC
  To: David Lang; +Cc: bloat

David, you're preaching to the choir. Perhaps you're unaware of my
recent work on cake, which is very much in the smart-local-decisions
camp; there's some discussion on the Codel list.

As I said, I'm perfectly aware that, with a dumb network, there isn't
enough information at the endpoints to do the congestion control job
properly. I was simply taking the discussion about Remy (which claims to
do exactly that) to suggest that a slightly better job could, in theory,
be done, and that the bufferbloat problem might not have been quite so
severe if it had been.

- Jonathan Morton
* Re: [Bloat] Computer generated congestion control
  From: Dave Taht @ 2015-04-03 12:03 UTC
  To: Jonathan Morton; +Cc: Keith Winstein, bloat

Several items about Remy:

1) The core flaw in the work was that they targeted really long RTTs
(>100ms), where here we are working in a range of RTTs, mostly shorter.
I would have been much happier had the work continued (has it?) to look
at solving for all RTTs passing through the network. That said, solving
for long RTTs is a hard problem with real-world use cases.

2) There is this persistent opinion in academia, notably among the e2e
folk, that >100ms of delay is "ok". We *never* defined a limit to
bufferbloat in the original definition of the word - because we did not
know what a good limit was! Several otherwise good papers then go on to
define "bufferbloat" as RTTs greater than 200ms, and then through
statistical legerdemain show that it doesn't exist (my favorite was the
one that discarded half the data just to start with, then chopped off
everything above the 95th percentile). I referenced one of those papers
in my rant at SIGCOMM:

http://conferences.sigcomm.org/sigcomm/2014/doc/slides/137.pdf

... where we in the AQM world have settled on about 20-30ms as the
outer bound for induced delay, and in the FQ world, 5ms for sparse
flows. I wish we could come up with another word, or define bufferbloat
better than we have, with real numbers in it. The closest we have come
is:

https://gettys.wordpress.com/2013/07/10/low-latency-requires-smart-queuing-traditional-aqm-is-not-enough/

3) I for one welcome our new congestion-algorithm-producing computer
overlords! The job is too hard for humans! I thought this work on Remy
was astoundingly promising, and I hope the work continues. In
particular, *thinking* about congestion control as a meta-problem was a
breakthrough. If Remy could produce results that achieved 5-25ms added
latency e2e - or if it got extended to managing the routers in between -
I could quit this and go back to working on spacecraft.

Many of the other products of this group are really amazing: mosh for
one, and the mahimahi and delayshell tools
(https://github.com/ravinet/mahimahi). If you are not using mosh for all
your day-to-day interactive traffic, particularly over wifi, you are
annoying yourself for no good reason. But try the mosh multipath work if
you want to be on the bleeding edge...

(A note on mahimahi's delayshell: it had a bug in that it assumed an
infinite queue. I am not sure to what extent that was used in the Remy
work. There are patches adding codel to it in a branch that I had
discussed with Keith a while back; I hope that got merged. I meant to do
it myself, but forgot to set aside the day I needed. I would like it if
those producing test tools took a hard look at leveraging mahimahi in
particular - I am looking at you, facebook.)

4) Another MIT paper that I really liked was one that specified an FPGA
in every router - not that that idea is cost-feasible today - but it set
me to thinking about which problems in this space could be better solved
in gates rather than code, what could be treated as massively parallel,
and so on. I initially thought they were crazy... but after thinking
about it for a while I worked out a set of cool things that could one
day be reduced to chips, and I would like to work on them.
That is partially why I have been backing this Kickstarter project:

https://www.kickstarter.com/projects/onetswitch/onetswitch-open-source-hardware-for-networking

as it gets the cost down to something a grad student could afford... but
it seems likely that they won't raise another 25k in the next 6 days to
produce their first batch of boards. Note: they added a "get one, give
one" program at my request.

Anybody got a spare 25k to subsidize a whole bunch of really cool boards
that cut the cost of an FPGA-based home router redesign from 7k to 700
dollars? Anyone??? I can think of a dozen-plus people with the chops, if
not the money, to work on FPGA-based stuff.
* Re: [Bloat] Computer generated congestion control
  From: Juliusz Chroboczek @ 2015-04-04 23:33 UTC
  To: Dave Taht; +Cc: Jonathan Morton, bloat, Keith Winstein

> 1) The core flaw in the work was that they targeted really long RTTs
> (>100ms) where here we are working in a range of RTTs, mostly shorter.

Yeah. They were doing experiments with LTE, which has pretty horrendous
latency. (As in driving around Cambridge with an LTE modem and
collecting packet traces.)

(Dave -- they were doing *experiments*. Networking academics doing
actual experiments, imagine that.)

> I would have been much happier had the work continued (has it?)

Keith got a position (no surprise there, he's very good), so he's very
busy with teaching and setting up his research team. But I see he
published a related paper at SIGCOMM last year:

Anirudh Sivaraman, Keith Winstein, Pratiksha Thaker, Hari Balakrishnan.
An experimental study of the learnability of congestion control.
SIGCOMM 2014:479-490.
http://dspace.mit.edu/openaccess-disseminate/1721.1/88914

-- Juliusz