[Make-wifi-fast] [Bloat] the future belongs to pacing

Sebastian Moeller moeller0 at gmx.de
Sun Jul 5 13:29:47 EDT 2020


Hi Matt,


> On Jul 5, 2020, at 19:07, Matt Mathis <mattmathis at google.com> wrote:
> 
> The consensus in the standards community is that 3168 ECN is not so useful - too late to protect small queues, too much signal (gain) to use it to hint at future congestion.  

	I follow the discussion in the IETF TSVWG and believe I have a good overview of its state. I have also gathered enough experience in the bufferbloat effort to realize that the L4S proposal is based more on wishful thinking than on solid engineering. But yes, the time seems ripe for 1/p-type congestion signaling; how to do it, however, is still an open question.


>  The point of non-3168 ECN is to permit earlier gentle signalling.   I am not following the ECN conversation, but as stated at recent IETFs, the ECN code in BBRv2 is really a placeholder, and when the ECN community comes to consensus on a standard, I would expect BBR to do the standard.

	I respectfully argue that this is the wrong way around: first implement the current standard, RFC 3168, and switch over only if and when a new standard exists. At the moment BBRv2 seems to bank on the L4S proposals sailing through the IETF, completely ignoring how little critical testing the L4S design has been subjected to.


> 
> Tor has its own special challenge with traffic management.  

	Sorry, by TOR I meant Top Of Rack, which I assumed was a common name for the devices that house "the most expensive silicon in the data center, the switch buffer memory". I apologize for not being clear.


> Easy solutions leak information, secure solutions are very hard.   Remember to be useful, the ECN bits need to be in the clear.

	All good points, thanks, but more applicable to the onion router than to top of rack switches...

Best Regards
	Sebastian

> 
> Thanks,
> --MM--
> The best way to predict the future is to create it.  - Alan Kay
> 
> We must not tolerate intolerance;
>        however our response must be carefully measured: 
>             too strong would be hypocritical and risks spiraling out of control;
>             too weak risks being mistaken for tacit approval.
> 
> 
> On Sun, Jul 5, 2020 at 5:01 AM Sebastian Moeller <moeller0 at gmx.de> wrote:
> Hi Matt,
> 
> 
> 
> > On Jul 5, 2020, at 08:10, Matt Mathis <mattmathis at google.com> wrote:
> > 
> > I strongly suggest that people (re)read VJ88 - I do every couple of years, and still discover things that I overlooked on previous readings.
> 
>         I promise to read it. And before I give the wrong impression, and for what it is worth*: I consider BBR (even v1) an interesting and important evolutionary step and agree that "pacing" is a gentler approach than bursting a full CWND into a link.
> 
> 
> > 
> > All of the negative comments about BBR and loss, ECN marks,
> 
>         As far as I can tell, BBRv2 aims for a decidedly non-RFC 3168 response to CE marks. IMHO this does not clearly address my ECN comment. In light of using TOR? switch buffers efficiently, that kind of response might be defensible, but it does not really address my remark that it is unfortunate that BBR ignores both immediate signals of congestion, (sparse) packet drops AND explicit CE marks. The proposed (DCTCP-like) CE response seems rather weak compared to the naive expectation of cutting the sending rate to 50% or 80%, no? As I understand it, BBRv2 will happily run roughshod over any true RFC 3168 AQM on the path. I do not have the numbers, but I am not fully convinced that the most significant throttling on a typical CDN-to-end-user path still happens inside the CDN's data center...
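A minimal Python sketch of the contrast drawn above, assuming a toy per-RTT model: the RFC 3168-style sender treats any CE mark in an RTT like a loss and halves its window, while the DCTCP-style sender (the rule described in RFC 8257) keeps a running estimate `alpha` of the marked fraction and cuts by only `alpha / 2`. The 10% marking rate, the starting window, and `alpha` starting at 0 are illustrative assumptions, and this is not BBRv2's actual ECN code.

```python
"""Per-RTT cwnd reaction to CE marks: RFC 3168 halving vs. DCTCP-style scaling.

Illustrative sketch only; real stacks act per ACK/window and have many more rules.
"""

RFC3168_BETA = 0.5  # Reno-style multiplicative decrease on any CE mark in an RTT
DCTCP_G = 1 / 16    # EWMA gain for alpha (the value RFC 8257 suggests)


def rfc3168_step(cwnd: float, marked_fraction: float) -> float:
    """Treat any CE mark in this RTT like a loss: one halving, else +1 segment."""
    if marked_fraction > 0:
        return cwnd * RFC3168_BETA
    return cwnd + 1


def dctcp_step(cwnd: float, alpha: float, marked_fraction: float) -> tuple[float, float]:
    """Update the marking estimate alpha, then cut cwnd by alpha/2 if marks were seen."""
    alpha = (1 - DCTCP_G) * alpha + DCTCP_G * marked_fraction
    if marked_fraction > 0:
        return cwnd * (1 - alpha / 2), alpha
    return cwnd + 1, alpha


def run(fractions: list[float], cwnd0: float = 100.0,
        alpha0: float = 0.0) -> tuple[list[float], list[float]]:
    """Return the cwnd trace (in segments) for each response, one entry per RTT."""
    reno, dctcp, alpha = cwnd0, cwnd0, alpha0
    reno_trace, dctcp_trace = [], []
    for f in fractions:
        reno = rfc3168_step(reno, f)
        dctcp, alpha = dctcp_step(dctcp, alpha, f)
        reno_trace.append(reno)
        dctcp_trace.append(dctcp)
    return reno_trace, dctcp_trace


if __name__ == "__main__":
    # Hypothetical scenario: an AQM marks 10% of packets in each of 5 RTTs.
    reno, dctcp = run([0.1] * 5)
    print("RFC 3168:", [round(c, 1) for c in reno])
    print("DCTCP   :", [round(c, 1) for c in dctcp])
```

With light, steady marking the halving sender collapses within a few RTTs while the DCTCP-style sender barely moves, which is the "rather weak" response being questioned.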
> 
> 
> > or unfairness to cubic were correct for BBRv1 but have been addressed in BBRv2.
> 
>         I am not sure that unfairness was brought up as an issue in this thread.
> 
> 
> > 
> > My paper has a synopsis of BBR, which is intended to get people started.   See the references in the paper for more info:
> 
>         I will have a look at these as well... Thanks
> 
> Best Regards
>         Sebastian
> 
> *) Being from outside the field, probably not much...
> 
> > 
> > [12] Neal Cardwell, Yuchung Cheng, C. Stephen Gunn, Soheil Hassas Yeganeh, and Van Jacobson. 2016. BBR: Congestion-Based Congestion Control. Queue 14, 5, Pages 50 (October 2016). DOI: https://doi.org/10.1145/3012426.3022184
> > [13] Neal Cardwell, Yuchung Cheng, C. Stephen Gunn, Soheil Hassas Yeganeh, and Van Jacobson. 2017. BBR: Congestion-Based Congestion Control. Commun. ACM 60, 2 (January 2017), 58-66. DOI: https://doi.org/10.1145/3009824
> > [22] google/bbr. 2019. GitHub repository, retrieved https://github.com/google/bbr
> > 
> > Key definitions: self clocked: data is triggered by ACKs.  All screwy packet and ACK scheduling in the network is reflected back into the network on the next RTT.
> > 
> > Paced: data is transmitted on a timer, independent of ACK arrivals (as long as the ACKs take less than twice the measured minRTT).  Thus in bulk transport there is little or no correlation between data transmissions and events elsewhere in the network. 
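The two definitions above can be sketched in Python, assuming fixed-size packets and made-up ACK timestamps: a self-clocked sender releases one packet per ACK, so ACKs that were bunched up by a queue somewhere produce bunched departures, while a paced sender spaces packets by `packet_bytes * 8 / pacing_rate` regardless of when ACKs arrive. The function names are hypothetical, and the "less than twice the measured minRTT" safeguard is left out.

```python
"""Toy comparison of ACK-clocked vs. timer-paced departure times (seconds)."""


def self_clocked_departures(ack_times: list[float]) -> list[float]:
    """Self-clocked: each arriving ACK immediately releases one new packet,
    so the departure pattern simply copies the ACK arrival pattern."""
    return list(ack_times)


def paced_departures(n_packets: int, start: float, packet_bytes: int,
                     pacing_rate_bps: float) -> list[float]:
    """Paced: packets leave on a timer, one every packet_bytes * 8 / rate seconds,
    independent of when ACKs arrive."""
    gap = packet_bytes * 8 / pacing_rate_bps
    return [start + i * gap for i in range(n_packets)]


def gaps(times: list[float]) -> list[float]:
    """Inter-departure gaps, rounded to microseconds for readability."""
    return [round(b - a, 6) for a, b in zip(times, times[1:])]


if __name__ == "__main__":
    # Hypothetical ACK arrivals, bunched together by a queue somewhere on the path.
    acks = [0.1000, 0.1001, 0.1002, 0.1003, 0.1120, 0.1240]
    print("self-clocked gaps:", gaps(self_clocked_departures(acks)))
    # 1500-byte packets paced at 1 Mbit/s: one packet every 12 ms.
    print("paced gaps:       ", gaps(paced_departures(6, 0.100, 1500, 1_000_000)))
```

The self-clocked sender reproduces whatever ACK compression the network introduced, while the paced sender's gaps stay constant.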
> > 
> > Clarification about my earlier WiFi comment:  The BBRv1 WiFi fix missed 4.19 LTS, so bad results are "expected" for many distros.  If you want to do useful experiments, you must read https://groups.google.com/g/bbr-dev/ and start from BBRv2 in [22].
> > 
> > Thanks,
> > --MM--
> > 
> > 
> > On Sat, Jul 4, 2020 at 11:29 AM Sebastian Moeller <moeller0 at gmx.de> wrote:
> > 
> > 
> > > On Jul 4, 2020, at 19:52, Daniel Sterling <sterling.daniel at gmail.com> wrote:
> > > 
> > > On Sat, Jul 4, 2020 at 1:29 PM Matt Mathis via Bloat
> > > <bloat at lists.bufferbloat.net> wrote:
> > > "pacing is inevitable, because it saves large content providers money
> > > (more efficient use of the most expensive silicon in the data center,
> > > the switch buffer memory), however to use pacing we walk away from 30
> > > years of experience with TCP self clock"
> > > 
> > > at the risk of asking w/o doing any research,
> > > 
> > > could someone explain this to a lay person or point to a doc talking
> > > about this more?
> > > 
> > > What does BBR do that's different from other algorithms?
> > 
> >         Well, it does not (blindly) believe the network: currently it ignores both ECN marks and (sparse) drops as signs of congestion; instead it uses its own rate estimates to set its send rate and periodically re-assesses those estimates. Sufficiently severe drops will be honored. IMHO this is a somewhat risky approach that nevertheless works reasonably well, as sparse drops often are not real signs of congestion but just random losses on, say, a wifi link (that said, such drops on wifi typically also come with painful latency spikes, as wifi often takes heroic measures, retrying transmission for several hundred milliseconds).
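A greatly simplified Python sketch of the rate-based model described above, loosely following the ACM Queue BBR article cited in this thread ([12]): the bottleneck bandwidth estimate is the maximum of recent delivery-rate samples, the path delay is the minimum observed RTT, the pacing rate is a gain times the bandwidth estimate, and the window is capped at about twice the estimated BDP. The gain cycle is the PROBE_BW cycle from that article; the 10-sample window and the class and method names are assumptions, and real BBR uses round- and time-based filters, min-RTT expiry, and additional states. Note that sparse losses do not appear anywhere in this model.

```python
"""Greatly simplified sketch of BBR's path model: max delivery rate and min RTT."""

from collections import deque

# PROBE_BW pacing-gain cycle (one phase per min RTT): probe up, drain, cruise.
PACING_GAIN_CYCLE = [1.25, 0.75, 1, 1, 1, 1, 1, 1]
CWND_GAIN = 2  # inflight is capped at about twice the estimated BDP


class ToyBbrModel:
    """Hypothetical toy model; rates in bytes/s, times in seconds."""

    def __init__(self, bw_window: int = 10):
        # Recent delivery-rate samples; BtlBw is their maximum (assumed window).
        self.rate_samples = deque(maxlen=bw_window)
        self.min_rtt = float("inf")  # real BBR also expires this estimate
        self.cycle_index = 0

    def on_ack(self, delivered_bytes: int, interval_s: float, rtt_s: float) -> None:
        """Record one delivery-rate sample and one RTT sample (losses are ignored)."""
        self.rate_samples.append(delivered_bytes / interval_s)
        self.min_rtt = min(self.min_rtt, rtt_s)

    def btl_bw(self) -> float:
        return max(self.rate_samples, default=0.0)

    def bdp_bytes(self) -> float:
        if not self.rate_samples or self.min_rtt == float("inf"):
            return 0.0
        return self.btl_bw() * self.min_rtt

    def next_phase(self) -> None:
        """Advance the gain cycle; BBR does this roughly once per min RTT."""
        self.cycle_index = (self.cycle_index + 1) % len(PACING_GAIN_CYCLE)

    def pacing_rate(self) -> float:
        return PACING_GAIN_CYCLE[self.cycle_index] * self.btl_bw()

    def cwnd_bytes(self) -> float:
        return CWND_GAIN * self.bdp_bytes()


if __name__ == "__main__":
    model = ToyBbrModel()
    # Hypothetical samples: 125 kB delivered per 10 ms (100 Mbit/s), RTTs 40/35 ms.
    model.on_ack(125_000, 0.010, 0.040)
    model.on_ack(100_000, 0.010, 0.035)
    print(f"BtlBw {model.btl_bw() * 8 / 1e6:.0f} Mbit/s, minRTT {model.min_rtt * 1e3:.0f} ms")
    print(f"pacing {model.pacing_rate() * 8 / 1e6:.0f} Mbit/s, cwnd {model.cwnd_bytes():.0f} B")
```

The periodic 1.25/0.75 gain phases are what "cyclically re-assess its rate estimate" refers to: the sender briefly pushes harder to see whether more bandwidth has become available, then drains any queue it created.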
> > 
> > 
> > > Why does it
> > > break the clock?
> > 
> >         One can argue that there is no real clock to break. TCP gates the release of new packets on the reception of ACKs from the receiver; this is only a clock if one does not really care about the evenly spaced ticks of a real clock. But for better or worse, that is the term in use. IMHO (and I am really calling this from way out in left field) "gating" would be a better term, but changing the nomenclature is probably not an option at this point.
> > 
> > > Before BBR, was the clock the only way TCP did CC?
> > 
> >         No, TCP also interprets a drop (or rather 3 duplicate ACKs) as a signal of congestion and hits the brakes by halving the congestion window (the amount of unacknowledged data allowed in flight, which, averaged over long enough time windows, roughly correlates with the send rate). BBR explicitly does not do this unless it is really convinced that someone dropped multiple packets on purpose to signal congestion.
> >         In practice it works rather well; in theory it could do with at least an RFC 3168-compliant response to ECN marks (which an AQM uses to signal congestion explicitly; unlike a drop, an ECN mark is truly unambiguous: some hop on the way "told" the flow to slow down).
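The classic loss response described above can be sketched in Python as below, assuming a simplified reading of RFC 5681: the third duplicate ACK triggers fast retransmit and `ssthresh` is set to half the data in flight (but at least 2 segments). Window growth and fast-recovery window inflation are left out, and the class name is hypothetical. BBR, as described above, skips exactly this multiplicative decrease for sparse losses.

```python
"""Sketch of Reno-style loss detection: three duplicate ACKs halve the window."""

DUPACK_THRESHOLD = 3  # RFC 5681: the third duplicate ACK signals a probable loss


class ToyRenoSender:
    """Hypothetical toy sender; windows are counted in segments, not bytes."""

    def __init__(self, cwnd_segments: float = 10.0):
        self.cwnd = cwnd_segments
        self.ssthresh = float("inf")
        self.last_ack = 0
        self.dupacks = 0

    def on_ack(self, ack_no: int, flight_segments: float) -> bool:
        """Process a cumulative ACK; return True when fast retransmit fires."""
        if ack_no > self.last_ack:   # new data acknowledged: reset the counter
            self.last_ack = ack_no
            self.dupacks = 0
            return False
        if ack_no < self.last_ack:   # stale ACK, ignore
            return False
        self.dupacks += 1
        if self.dupacks == DUPACK_THRESHOLD:
            # Multiplicative decrease: allow only about half as much data in flight.
            self.ssthresh = max(flight_segments / 2, 2)
            self.cwnd = self.ssthresh
            return True
        return False


if __name__ == "__main__":
    sender = ToyRenoSender(cwnd_segments=20)
    # The segment after ACK 2 is lost, so later arrivals keep re-ACKing 2.
    for ack in [1, 2, 2, 2, 2]:
        if sender.on_ack(ack, flight_segments=20):
            print(f"fast retransmit, cwnd now {sender.cwnd} segments")
```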
> > 
> > 
> > > 
> > > Also,
> > > 
> > > I have UBNT "Amplifi" HD wifi units in my house. (HD units only; none
> > > of the "mesh" units. Just HD units connected either via wifi or
> > > wired.) Empirically, I've found that in order to reduce latency, I
> > > need to set cake to about 1/4 of the total possible wifi speed;
> > > otherwise if a large download comes down from my internet link, that
> > > flow causes latency.
> > > 
> > > That is, if I'm using 5 GHz at 20 MHz channel width, I need to set
> > > cake's bandwidth argument to 40 Mbit/s to prevent video streams /
> > > downloads from impacting latency for any other stream. This is w/o any
> > > categorization at all; no packet marking based on port or anything
> > > else; cake set to "best effort".
> > > 
> > > Anything higher and when a large amount of data comes thru, something
> > > (assumedly the buffer in the Amplifi HD units) causes 100s of
> > > milliseconds of latency.
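Why a full buffer in the AP turns into hundreds of milliseconds, and why shaping below the AP's real throughput avoids it, can be sketched with a simple fluid model in Python. The 60 Mbit/s effective AP rate and the 1.5 MB buffer are purely hypothetical numbers, not measurements of the Amplifi HD; the model ignores bursts, TCP feedback, and wifi aggregation.

```python
"""Fluid-model estimate of the standing-queue delay in a FIFO buffer."""


def standing_queue_delay_ms(arrival_mbps: float, drain_mbps: float,
                            seconds: float, buffer_bytes: float) -> float:
    """Delay (ms) seen by a new packet after `seconds` of constant arrivals.

    The queue grows at (arrival - drain) until the buffer is full; a packet
    at the tail then waits queued_bytes / drain_rate before it is sent.
    """
    excess_bytes_per_s = max(arrival_mbps - drain_mbps, 0.0) * 1e6 / 8
    queued_bytes = min(excess_bytes_per_s * seconds, buffer_bytes)
    return queued_bytes * 8 / (drain_mbps * 1e6) * 1000


if __name__ == "__main__":
    AP_RATE_MBPS = 60.0          # hypothetical effective wifi throughput
    AP_BUFFER_BYTES = 1_500_000  # hypothetical AP buffer size
    for label, arrival in [("cake at 40 Mbit/s", 40.0), ("unshaped 1 Gbit/s", 1000.0)]:
        delay = standing_queue_delay_ms(arrival, AP_RATE_MBPS, 5.0, AP_BUFFER_BYTES)
        print(f"{label}: {delay:.0f} ms queueing delay in the AP")
```

With these assumed numbers, traffic shaped to 40 Mbit/s never builds a queue in the AP, while unshaped traffic fills the buffer and adds about 200 ms of delay; shaping below the AP's real rate moves the queue into cake, which actively manages it.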
> > > 
> > > Can anyone speak to how BBR would react to this? My ISP is full
> > > gigabit, but cake is going to drop a lot of packets as it throttles
> > > that down to 40 Mbit/s before it sends the packets to the wifi AP.
> > > 
> > > Thanks,
> > > Dan
> > > _______________________________________________
> > > Bloat mailing list
> > > Bloat at lists.bufferbloat.net
> > > https://lists.bufferbloat.net/listinfo/bloat
> > 
> 


