* [Bloat] Remy: Computer-Generated Congestion Control @ 2014-08-04 12:44 Baptiste Jonglez 2014-08-04 14:55 ` Dave Taht 0 siblings, 1 reply; 9+ messages in thread From: Baptiste Jonglez @ 2014-08-04 12:44 UTC (permalink / raw) To: bloat [-- Attachment #1: Type: text/plain, Size: 160 bytes --] It's not very new, but I don't remember seeing this discussed here: http://web.mit.edu/remy/ It mentions Bufferbloat and CoDel (look at the FAQ). Baptiste [-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --] ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Bloat] Remy: Computer-Generated Congestion Control 2014-08-04 12:44 [Bloat] Remy: Computer-Generated Congestion Control Baptiste Jonglez @ 2014-08-04 14:55 ` Dave Taht 2014-08-21 10:21 ` Michael Welzl 0 siblings, 1 reply; 9+ messages in thread From: Dave Taht @ 2014-08-04 14:55 UTC (permalink / raw) To: Baptiste Jonglez; +Cc: bloat [-- Attachment #1: Type: text/plain, Size: 4256 bytes --]

On Mon, Aug 4, 2014 at 8:44 AM, Baptiste Jonglez <baptiste.jonglez@ens-lyon.fr> wrote:
> It's not very new, but I don't remember seeing this discussed here:
>
> http://web.mit.edu/remy/

I don't remember if it was discussed here or not. It certainly was a hit at ICCRG at the IETF before last.

> It mentions Bufferbloat and CoDel (look at the FAQ).

Yes, they ran some of their experiments against the ns2 model of sfq-codel, as a reference for the best of the then-available AQM and FQ technologies. They also compared XCP as an example of a high-performing TCP alternative.

They have a new paper out for the upcoming sigcomm: http://web.mit.edu/keithw/www/Learnability-SIGCOMM2014.pdf

And a bit of discussion here: https://news.ycombinator.com/item?id=8129115

I do not doubt that computer-assisted methods such as remy will one day supplant or replace humans in many aspects of the CS field and hardware designs. I well remember the state of mp3 research back in the 80s, when it took a day to crunch down a rock song into something replete with dolphin warbles, and how that eventually turned out.

I adore the remy work, the presentations, and the graphs (toke adopted the inverted latency/bandwidth ellipse plots in netperf-wrapper, in fact). The tools developed so far, like delayshell and cellshell, are valuable and interesting (there's at least one other tool, whose name I forget, that was also way cool), and it's great that they supply so much source code.
Also, I love the omniscient concept, which gives all of us something to aim for: zero latency and 100% throughput. The work seethes with long-term potential.

But.

My biggest problem with all the work so far is that it starts with a constant baseline 150ms or 100ms RTT, then tries various algorithms over a wide range of bandwidths, and then elides that base RTT in all future plots. Unless you read carefully, you don't see that...

The original bufferbloat experiment was at a physical RTT of 10ms, with bidirectional traffic flooding the queues in both directions, on an asymmetric network. I keep waiting for more experimenters to try that as a baseline experiment.

Nearly all the work has been focused on reducing the induced queuing delay on RTTs that are basically bounded by the speed of light, cpu scheduling delay, and the mac acquisition time. They don't start at an arbitrary value of 100+ms! The fattest flows are usually sourced from nearly the same datacenter in the real world, and it's well known that short-RTT flows can starve longer-RTT flows.

My selection of physical RTTs to experiment with is thus <100us (ethernet), 4ms (google fiber), 10ms, 18ms (FIOS), and 38ms (cable)[1], against a variety of bandwidths ranging from gigE down to 384kbit - and the hope is generally to cut the induced queuing delay down to the physical RTT plus a tiny bit.

In the real world we have a potential range of RTTs starting at nearly 0ms and going up to potentially 100s of ms - a difference of many orders of magnitude, compared to the remy work, which treats RTT as a nearly fixed variable. And: I (and many others) have spent a great deal of time trying to shave an additional fraction of a ms off the results we get so far, at these real-world RTTs and bandwidths!
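The baseline matrix described above (a handful of physical RTTs crossed with a range of bandwidths) can be sketched as a small script that emits the corresponding netem setups. This is a hypothetical sketch, not from the thread: the interface name `eth0` and the use of netem's built-in rate limiter (rather than an htb + netem pair, or tools like delayshell) are assumptions.

```python
# Hypothetical sketch: emit tc/netem setups for the RTT x bandwidth
# sweep described above. The interface name and the use of netem's
# built-in rate limiter are assumptions of this sketch.
RTTS_MS = [0.1, 4, 10, 18, 38]   # ethernet, google fiber, -, FIOS, cable
RATES = ["384kbit", "10mbit", "100mbit", "1gbit"]

def netem_cmd(iface, rtt_ms, rate):
    # Emulate half the path RTT on the one egress qdisc we control;
    # the return path would get the other half on another box.
    return (f"tc qdisc replace dev {iface} root "
            f"netem delay {rtt_ms / 2}ms rate {rate}")

matrix = [(r, b) for r in RTTS_MS for b in RATES]
for rtt, rate in matrix:
    print(netem_cmd("eth0", rtt, rate))  # print, don't run: tc needs root
```

Each pair would then be driven with bidirectional load (e.g. netperf-wrapper's rrul test) rather than a single one-way flow.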
I do expect good things once Remy is turned loose, all 80 cpus for 5 days, on these real-world problems. But results as spectacular as they achieve in simulation, at their really long RTTs, probably won't be as impressively "better" than what we already achieve with running, increasingly deployed code, at real-world, physical, and often varying RTTs.

> Baptiste
>
> _______________________________________________
> Bloat mailing list
> Bloat@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/bloat
>

[1] Much of the remy work is looking at trying to "fix" the LTE problem with e2e CC. I don't know why measured RTTs are so poor in that world; the speed of light is pretty constant everywhere. Certainly wifi has <2ms inherent physical RTT....

-- Dave Täht NSFW: https://w2.eff.org/Censorship/Internet_censorship_bills/russell_0296_indecent.article [-- Attachment #2: 100vs10mbit_4msrtt.png --] [-- Type: image/png, Size: 65548 bytes --] ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Bloat] Remy: Computer-Generated Congestion Control 2014-08-04 14:55 ` Dave Taht @ 2014-08-21 10:21 ` Michael Welzl 2014-08-21 17:04 ` Dave Taht 0 siblings, 1 reply; 9+ messages in thread From: Michael Welzl @ 2014-08-21 10:21 UTC (permalink / raw) To: Dave Taht; +Cc: bloat Dave, About this point that I've seen you make repeatedly: > My biggest problem with all the work so far is that it starts with a > constant baseline 150ms or 100ms RTT, and then try various algorithms > over a wide range of bandwidths, and then elides that base RTT in all > future plots. Unless you read carefully, you don't see that... > > The original bufferbloat experiment was at a physical RTT of 10ms, > with bidirectional traffic flooding the queues in both directions, on > an asymmetric network. I keep waiting for more experimenters to try > that as a baseline experiment. This sounds like a call for reality, when the point is to measure things that matter to the system you're investigating. E.g., if I'm investigating TCP, and I don't specifically work on ACK behavior (ACK congestion control or something like that), I usually don't care much about the backwards traffic or the asymmetry. Yes it does influence the measured RTT a bit, but then you argue for using a smaller base RTT where the impact of this gets smaller too. What matters to the function of congestion controllers is the BDP, and the buffer size. As for real RTTs, I like pinging wwoz.org because it's a radio station in New Orleans. I do sometimes listen to it, then I get traffic via a TCP connection that has this RTT. Very real. My ping delay is consistently above 110ms. On a side note, I can't help but mention that the "original bufferbloat experiment" features ping over FQ... measuring ping all by itself, pretty much :-) Michael ^ permalink raw reply [flat|nested] 9+ messages in thread
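Michael's point that the bandwidth-delay product and buffer size are what matter to a congestion controller can be made concrete with a quick calculation. The link speed below is illustrative, not taken from the thread; the two RTTs are the 10ms and 100ms figures being debated.

```python
# Quick illustration of the bandwidth-delay product (BDP): the amount
# of data "in flight" that a congestion controller has to reason about.
# The 10 Mbit/s link speed is illustrative, not from the thread.
def bdp_bytes(rate_bits_per_s, rtt_s):
    return rate_bits_per_s * rtt_s / 8

short_bdp = bdp_bytes(10e6, 0.010)  # 10 ms physical RTT  -> 12500 bytes
long_bdp = bdp_bytes(10e6, 0.100)   # 100 ms baseline RTT -> 125000 bytes

# In 1500-byte packets that's roughly 8 vs 83 packets in flight, so a
# buffer sized for one regime is badly sized for the other.
print(short_bdp / 1500, long_bdp / 1500)
```

This is why fixing the baseline RTT at 100+ms, as the remy experiments do, bakes in a particular BDP regime.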
* Re: [Bloat] Remy: Computer-Generated Congestion Control 2014-08-21 10:21 ` Michael Welzl @ 2014-08-21 17:04 ` Dave Taht 2014-08-21 17:33 ` Jonathan Morton 2014-08-21 19:46 ` Michael Welzl 0 siblings, 2 replies; 9+ messages in thread From: Dave Taht @ 2014-08-21 17:04 UTC (permalink / raw) To: Michael Welzl; +Cc: bloat On Thu, Aug 21, 2014 at 5:21 AM, Michael Welzl <michawe@ifi.uio.no> wrote: > Dave, > > About this point that I've seen you make repeatedly: > >> My biggest problem with all the work so far is that it starts with a >> constant baseline 150ms or 100ms RTT, and then try various algorithms >> over a wide range of bandwidths, and then elides that base RTT in all >> future plots. Unless you read carefully, you don't see that... >> >> The original bufferbloat experiment was at a physical RTT of 10ms, >> with bidirectional traffic flooding the queues in both directions, on >> an asymmetric network. I keep waiting for more experimenters to try >> that as a baseline experiment. > > > This sounds like a call for reality, when the point is to measure things that matter to the system you're investigating. Heh. It is my wish, certainly, to see the remy concept extended to solving the problems I consider important. The prospect of merely warming my feet on a nice 80 core box for a week, and evaluating the results, is quite appealing. I'll gladly trade someone else's code, and someone else's compute time, for a vacation (or unemployment, if it works out!). Can arrange for a very hefty cluster to tackle this stuff also, if the code were public. I have encouraged keith to look into the netfpga.org project as a possible target for the rules being created by remy's algorithms. >E.g., if I'm investigating TCP, and I don't specifically work on ACK behavior (ACK congestion control or something like that), I usually don't care much about the backwards traffic or the asymmetry. 
> Yes it does influence the measured RTT a bit, but then you argue for using a smaller base RTT where the impact of this gets smaller too.
>
> What matters to the function of congestion controllers is the BDP, and the buffer size.

Which varies based on the real physical RTT to the server. One of the biggest problems in nets today is that larger flow sources (CDNs, HAS (netflix)) keep moving closer and closer to the end node, with all the problems with tcp fairness that short RTTs induce on longer ones, like your radio flow below.

> As for real RTTs, I like pinging wwoz.org because it's a radio station in New Orleans. I do sometimes listen to it, then I get traffic via a TCP connection that has this RTT. Very real. My ping delay is consistently above 110ms.

I too use a low-rate radio stream to measure the impact of other flows on my listening experience. I'd written this before the bufferbloat project got rolling, before I'd even met jim:

http://nex-6.taht.net/posts/Did_Bufferbloat_Kill_My_Net_Radio/

... haven't had a problem for 2 years now. :) Still do use the stream-ripping thing for taking stuff on the road.

> On a side note, I can't help but mention that the "original bufferbloat experiment" features ping over FQ... measuring ping all by itself, pretty much :-)

Huh? The original bufferbloat experiments were against a comcast cable modem, and various other devices, lacking FQ entirely. There was a youtube video, a paper, things like "daddy, why is the internet slow today?", and various other resources. So the earliest stuff was "upload + ping + complaints about the network from the kids", and the later simultaneous up+down+ping abstraction was to get the kids out of the equation. I agree that up+down+ping is not as useful as we'd like now that FQ is on the table.
I have been attempting to formalize where we are now, adding rigor based on those early benchmarks, with netperf-wrapper's tcp_bidirectional test, the more advanced rrul and rtt_fairness tests, and a model in ns3. Given how much bloat we've killed elsewhere, I would like to continue to improve these tests to properly show interactions with truly isochronous streams (there are several tests in netperf-wrapper that leverage d-itg for that now (thx toke!)) and with more bursty yet rate-limited flows like videoconferencing, whose structure we've been discussing with various members of the rmcat group.

My complaint comes from seeing very low bandwidths, long rtts, and short queue lengths used in materials that explain bufferbloat, like this:

https://www.udacity.com/wiki/cn/assignment5-buffer-bloat

when in reality:

1) There is a double bump you see in tcp, for example, at truly excessive queue lengths. You also see truly terrible ramp-up performance when data and ack flows compete.

2) The example assumes byte to packet length equivalence (100 packets = 150kb), which is completely untrue - 100 packets is in the range of 6.4kb-150kb - and a reason why we have much larger packet-based queue lengths in commonly deployed real-world equipment IS because of optimizing for acks, not data.

If you want to test a byte-based queue length, by all means do (I have gradually come to the opinion that byte-based queues are best), but test either packet- or byte-based queues under conditions that are ack-dominated or data-dominated - with both, preferably in all scenarios.

3) It is at 1.5mbit, not the 5, 10, or 100mbit speeds. Below 10mbits you see IW10 or IW4 effects, you see ssthresh not being hit, and a variety of other tcp-specific issues, rather than the generic bufferbloat problem.

4) The 20ms in that course is actually reasonable. :)

I don't understand what the harm would have been to the student to use 1000 packets, and test up, down, and up + down, at 10 or 100Mbit.
5) I'd have liked it if the final experiment in the above course material switched to huge queue lengths (say, for example, cisco's 2800 default for their 4500 line), or explained the real-world situation better, so the takeaway was closer to the real size and scope of the problem.

Anyway... Nearly every paper I've read makes one or more of the above modifications to what I'd viewed as the original experiment, and I'd like to add rigor to the definitions of phrases floating around like "bufferbloat episode", and so on.

> Michael
>

-- Dave Täht NSFW: https://w2.eff.org/Censorship/Internet_censorship_bills/russell_0296_indecent.article ^ permalink raw reply [flat|nested] 9+ messages in thread
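Point 2 in the message above (100 packets spanning 6.4kb-150kb) follows from simple arithmetic, taking 64-byte frames for bare acks and 1500-byte MTU-sized frames for bulk data:

```python
# Point 2 above: what "100 packets" means in bytes depends entirely on
# the traffic mix. Sizes assumed: 64-byte frames for bare acks,
# 1500-byte MTU-sized frames for bulk data.
QUEUE_PKTS = 100
ACK_BYTES, DATA_BYTES = 64, 1500

ack_queue = QUEUE_PKTS * ACK_BYTES    # 6400 bytes   (6.4 kB)
data_queue = QUEUE_PKTS * DATA_BYTES  # 150000 bytes (150 kB)

# The same packet-counted queue also drains in wildly different times.
# At the course's 1.5 Mbit/s bottleneck:
RATE_BPS = 1.5e6
print(ack_queue * 8 / RATE_BPS)   # ~0.034 s to drain an all-ack queue
print(data_queue * 8 / RATE_BPS)  # 0.8 s to drain an all-data queue
```

The ~23x spread in bytes (and drain time) is why a packet-counted limit behaves so differently under ack-dominated versus data-dominated load.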
* Re: [Bloat] Remy: Computer-Generated Congestion Control 2014-08-21 17:04 ` Dave Taht @ 2014-08-21 17:33 ` Jonathan Morton 2014-08-21 19:50 ` Tim Upthegrove ` (2 more replies) 2014-08-21 19:46 ` Michael Welzl 1 sibling, 3 replies; 9+ messages in thread From: Jonathan Morton @ 2014-08-21 17:33 UTC (permalink / raw) To: Dave Taht; +Cc: bloat [-- Attachment #1: Type: text/plain, Size: 656 bytes --] I don't suppose anyone has set up a lab containing several hundred wireless clients and a number of APs? A stepping stone towards that would be a railway carriage simulator, with one AP, a simulated 3G uplink, and a couple of dozen clients. I wonder how well simply putting fq on each of the clients and fq_codel on the APs would work. My general impression is that fq is the right default choice for end hosts (which are generally not at the bottleneck) and fq_codel is the right default choice for bottleneck routers. A typical consumer router/AP might see the bottleneck for both directions, though not necessarily at the same time. - Jonathan Morton [-- Attachment #2: Type: text/html, Size: 723 bytes --] ^ permalink raw reply [flat|nested] 9+ messages in thread
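The fq-on-hosts / fq_codel-on-routers split Jonathan describes amounts to a couple of tc one-liners. A hypothetical sketch follows; the interface names are assumptions, and the commands are printed rather than executed since tc requires root:

```python
# Hypothetical sketch of the qdisc placement suggested above: fq on end
# hosts (pacing; generally not the bottleneck) and fq_codel on the
# router/AP that owns the bottleneck queue. Interface names assumed.
ROLES = {
    "client": "tc qdisc replace dev {dev} root fq",
    "ap":     "tc qdisc replace dev {dev} root fq_codel",
}

def qdisc_cmd(role, dev="wlan0"):
    return ROLES[role].format(dev=dev)

for role in ROLES:
    print(qdisc_cmd(role))  # printed, not run: tc needs root
```

A consumer router/AP that can be the bottleneck in both directions would want fq_codel on both its WAN and LAN/wireless interfaces.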
* Re: [Bloat] Remy: Computer-Generated Congestion Control 2014-08-21 17:33 ` Jonathan Morton @ 2014-08-21 19:50 ` Tim Upthegrove 2014-08-21 19:50 ` Alex Burr 2014-08-21 20:57 ` Isaac Konikoff 2 siblings, 0 replies; 9+ messages in thread From: Tim Upthegrove @ 2014-08-21 19:50 UTC (permalink / raw) To: Jonathan Morton; +Cc: bloat [-- Attachment #1: Type: text/plain, Size: 505 bytes --] Hi Jonathan, On Thu, Aug 21, 2014 at 1:33 PM, Jonathan Morton <chromatix99@gmail.com> wrote: > I don't suppose anyone has set up a lab containing several hundred > wireless clients and a number of APs? A stepping stone towards that would > be a railway carriage simulator, with one AP, a simulated 3G uplink, and a > couple of dozen clients. > I think the orbit test bed fits at least the first part of your description. Not so sure about the second... https://www.orbit-lab.org/ -- Tim Upthegrove [-- Attachment #2: Type: text/html, Size: 997 bytes --] ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Bloat] Remy: Computer-Generated Congestion Control 2014-08-21 17:33 ` Jonathan Morton 2014-08-21 19:50 ` Tim Upthegrove @ 2014-08-21 19:50 ` Alex Burr 2014-08-21 20:57 ` Isaac Konikoff 2 siblings, 0 replies; 9+ messages in thread From: Alex Burr @ 2014-08-21 19:50 UTC (permalink / raw) To: Jonathan Morton, Dave Taht; +Cc: bloat On Thursday, August 21, 2014 6:33 PM, Jonathan Morton <chromatix99@gmail.com> wrote: > > >I don't suppose anyone has set up a lab containing several hundred wireless clients and a number of APs? A stepping stone towards that would be a railway carriage simulator, with one AP, a simulated 3G uplink, and a couple of dozen clients. > UNH claims to have 1000s of devices; I don't know if they have them set up to run all at once, though: https://www.iol.unh.edu/services/testing/wireless/ That's their commercial arm, but I assume that their CS dept has access to it for doing research. Alex ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Bloat] Remy: Computer-Generated Congestion Control 2014-08-21 17:33 ` Jonathan Morton 2014-08-21 19:50 ` Tim Upthegrove 2014-08-21 19:50 ` Alex Burr @ 2014-08-21 20:57 ` Isaac Konikoff 2 siblings, 0 replies; 9+ messages in thread From: Isaac Konikoff @ 2014-08-21 20:57 UTC (permalink / raw) To: Jonathan Morton, Dave Taht; +Cc: bloat We regularly test out our WiFi Traffic Generator against a slew of commercially available APs: Netgear, Cisco, Asus, Dlink...etc.. For 802.11a/b/g/n we have a 1200 station emulator: http://candelatech.com/ct525-1200-6n_product.php For 802.11ac we have a 384 station emulator: http://candelatech.com/ct525-384-6ac_product.php And we have various combinations of the above: http://candelatech.com/lf_systems.php#wifire Here are some sample capacity reports against a few different APs: http://candelatech.com/downloads/wifi-capacity-reports-08152014.tar.gz Isaac On 08/21/2014 10:33 AM, Jonathan Morton wrote: > I don't suppose anyone has set up a lab containing several hundred > wireless clients and a number of APs? A stepping stone towards that > would be a railway carriage simulator, with one AP, a simulated 3G > uplink, and a couple of dozen clients. > > I wonder how well simply putting fq on each of the clients and fq_codel > on the APs would work. My general impression is that fq is the right > default choice for end hosts (which are generally not at the bottleneck) > and fq_codel is the right default choice for bottleneck routers. A > typical consumer router/AP might see the bottleneck for both directions, > though not necessarily at the same time. > > - Jonathan Morton > > > > _______________________________________________ > Bloat mailing list > Bloat@lists.bufferbloat.net > https://lists.bufferbloat.net/listinfo/bloat > -- Isaac Konikoff Candela Technologies konikofi@candelatech.com Office: +1 360 380 1618 Cell: +1 360 389 2453 Fax: +1 360 380 1431 ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Bloat] Remy: Computer-Generated Congestion Control 2014-08-21 17:04 ` Dave Taht 2014-08-21 17:33 ` Jonathan Morton @ 2014-08-21 19:46 ` Michael Welzl 1 sibling, 0 replies; 9+ messages in thread From: Michael Welzl @ 2014-08-21 19:46 UTC (permalink / raw) To: Dave Taht; +Cc: bloat [-- Attachment #1: Type: text/plain, Size: 6929 bytes --] On 21. aug. 2014, at 19:04, Dave Taht <dave.taht@gmail.com> wrote: > On Thu, Aug 21, 2014 at 5:21 AM, Michael Welzl <michawe@ifi.uio.no> wrote: >> Dave, >> >> About this point that I've seen you make repeatedly: >> >>> My biggest problem with all the work so far is that it starts with a >>> constant baseline 150ms or 100ms RTT, and then try various algorithms >>> over a wide range of bandwidths, and then elides that base RTT in all >>> future plots. Unless you read carefully, you don't see that... >>> >>> The original bufferbloat experiment was at a physical RTT of 10ms, >>> with bidirectional traffic flooding the queues in both directions, on >>> an asymmetric network. I keep waiting for more experimenters to try >>> that as a baseline experiment. >> >> >> This sounds like a call for reality, when the point is to measure things that matter to the system you're investigating. > > Heh. It is my wish, certainly, to see the remy concept extended to > solving the problems I consider important. The prospect of merely > warming my feet on a nice 80 core box for a week, and evaluating the > results, is quite appealing. > > I'll gladly trade someone else's code, and someone else's compute > time, for a vacation (or unemployment, if it works out!). Can arrange > for a very hefty cluster to tackle this stuff also, if the code were > public. > > I have encouraged keith to look into the netfpga.org project as a > possible target for the rules being created by remy's algorithms. 
> >> E.g., if I'm investigating TCP, and I don't specifically work on ACK behavior (ACK congestion control or something like that), I usually don't care much about the backwards traffic or the asymmetry. Yes it does influence the measured RTT a bit, but then you argue for using a smaller base RTT where the impact of this gets smaller too. >> >> What matters to the function of congestion controllers is the BDP, and the buffer size. > > Which varies based on the real physical RTT to the server. One of the > biggest problems in > nets today is that larger flow sources (CDNS, HAS(netflix)) keep > moving closer and closer to the > end node, with all the problems with tcp fairness that short RTTs > induce on longer ones, like your radio > flow blow. Sure >> As for real RTTs, I like pinging wwoz.org because it's a radio station in New Orleans. I do sometimes listen to it, then I get traffic via a TCP connection that has this RTT. Very real. My ping delay is consistently above 110ms. > > I too use a low rate radio stream to measure the impact of other flows > on my listening experience. > > I'd written this before the bufferbloat project got rolling, before > I'd even met jim. > > http://nex-6.taht.net/posts/Did_Bufferbloat_Kill_My_Net_Radio/ > > ... haven't had a problem for 2 years now. :) > > Still do use the stream ripping thing for taking stuff on the road. > >> >> On a side note, I can't help but mention that the "original bufferbloat experiment" features ping over FQ... measuring ping all by itself, pretty much :-) > > Huh? The original bufferbloat experiments were against a comcast cable > modem, and various other devices, lacking FQ entirely. there was a > youtube video, a paper, (things like "daddy, why is the internet slow > today" and various other resources. So the earliest stuff was "upload > + ping + complaints about the network from the kids", and the later > simultaneous up+down+ping abstraction was to get the kids out of the > equation. Oops, sorry. 
I referred to something I'd seen presented so often that I thought this must be the "original experiment". Of course it isn't, FQ_CoDel wasn't there from day 1 :) apologies > I agree that up+down+ping is not as useful as we'd like now that FQ is > on the table. > > I have been attempting to formalize where we are now, adding rigor > based on those early benchmarks, with both netperf-wrapper's > tcp_bidirectional test, the more advanced rrul and rtt_fairness tests, > and a model in ns3. Given how much bloat we've killed elsewhere I > would like to continue to improve these tests to properly show > interactions with truly isochronous streams (there are several tests > in netperf-wrapper now that leverage d-itg for that now (thx toke!), > and more bursty yet rate limited flows like videoconferencing which > we've been discussing the structure of with various members of the > rmcat group. > > My complaint comes from seeing very low bandwidths and long rtts and > short queue lengths used in materials used to explain bufferbloat, > like this: Short queue lengths is of course odd. Low bandwidth is sometimes convenient to isolate an effect, as it makes it less likely for the CPU, rwnd, lack of window scaling, or whatever else to be your bottleneck. > https://www.udacity.com/wiki/cn/assignment5-buffer-bloat > > when the reality: > > 1) There is a double bump you see in tcp for example at truly > excessive queue lengths). You also see truly > terrible ramp up performance when competing data and ack flows. > > 2) the example assumes byte to packet length equivalence (100 packets > = 150kb) which is completely untrue - > 100 packets is in the range of 6.4kb-150kb - and a reason why we have > much larger packet based queue lengths in the real world IS because of > optimizing for acks, not data in commonly deployed equipment. 
> > If you want to test a byte based queue length, by all means (I have > gradually come to the opinion that byte based packet queues are best), > but test either packet or byte based queues under conditions that are > ack dominated or data dominated, with both, preferably all scenarios. > > 3) it is at 1.5mbit, not the 5,10,100mbit speeds. Below 10mbits you > see IW10 or IW4 effects, you see ssthresh not hit, and a variety of > other tcp specific issues, rather than the generic bufferbloat > problem. > > 4) the 20ms in that course is actually reasonable. :) > > I don't understand what the harm would have been to the student to use > 1000 packets, and test up, down, and up + down, at 10 or 100Mbit. > > 5) I'd have liked it if the final experiment in the above course > material switched to huge queue lengths (say for example, cisco's 2800 > default for their 4500 line), or explained the real world situation > better, so the takeaway was closer to the real size and scope of the > problem. > > Anyway... > > Nearly every paper I've read make one or more of the above > modifications to what I'd viewed as the original experiment, and I'd > like to add rigor to the definitions of phrases floating around like > "bufferbloat episode", and so on. > > >> Michael >> > > > > -- > Dave Täht > > NSFW: https://w2.eff.org/Censorship/Internet_censorship_bills/russell_0296_indecent.article [-- Attachment #2: Type: text/html, Size: 9569 bytes --] ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2014-08-21 20:57 UTC | newest] Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2014-08-04 12:44 [Bloat] Remy: Computer-Generated Congestion Control Baptiste Jonglez 2014-08-04 14:55 ` Dave Taht 2014-08-21 10:21 ` Michael Welzl 2014-08-21 17:04 ` Dave Taht 2014-08-21 17:33 ` Jonathan Morton 2014-08-21 19:50 ` Tim Upthegrove 2014-08-21 19:50 ` Alex Burr 2014-08-21 20:57 ` Isaac Konikoff 2014-08-21 19:46 ` Michael Welzl