Also, I haven't done it, but I don't think rate limiting TCP will solve this
aggregation "problem." The faster RTT is driving CWND well below the maximum
aggregation, i.e. CWND is too small relative to wi-fi aggregation.

Bob

On Fri, May 13, 2016 at 11:05 AM, Bob McMahon wrote:

> The graphs are histograms of mpdu/ampdu, from 1 to 64. The blue spikes
> show that the vast majority of traffic is filling an ampdu with 64 mpdus.
> The fill stop reason is ampdu full. The purple fill stop reasons are that
> the sw fifo (above the driver) went empty, indicating a too-small CWND for
> maximum aggregation. A driver wants to aggregate to the fullest extent
> possible. A workaround is to set initcwnd in the routing table.
>
> I don't have the data available for multiple flows at the moment. Note:
> That will depend on what exactly defines a flow.
>
> Bob
>
> On Fri, May 13, 2016 at 10:49 AM, Dave Taht wrote:
>
>> I try to stress that single tcp flows should never use all the bandwidth
>> for the sawtooth to function properly.
>>
>> What happens when you hit it with 4 flows? or 12?
>>
>> nice graph, but I don't understand the single blue spikes?
>>
>> On Fri, May 13, 2016 at 10:46 AM, Bob McMahon wrote:
>>
>>> On driver delays, from a driver development perspective the problem
>>> isn't whether to add delay or not (it shouldn't); it's that the TCP
>>> stack isn't presenting sufficient data to fully utilize aggregation.
>>> Below is a histogram comparing aggregations of 3 systems (units are
>>> mpdu per ampdu). The lowest latency stack is in purple and it's also
>>> the worst performance with respect to average throughput. From a driver
>>> perspective, one would like TCP to present sufficient bytes into the
>>> pipe that the histogram leans toward the blue.
>>>
>>> [image: Inline image 1]
>>>
>>> I'm not an expert on TCP near congestion avoidance but maybe the
>>> algorithm could benefit from RTT as weighted by CWND (or bytes in flight)
>>> and hunt that maximum?
>>>
>>> Bob
>>>
>>> On Mon, May 9, 2016 at 8:41 PM, David Lang wrote:
>>>
>>>> On Mon, 9 May 2016, Dave Taht wrote:
>>>>
>>>>> On Mon, May 9, 2016 at 7:25 PM, Jonathan Morton wrote:
>>>>>
>>>>>> On 9 May, 2016, at 18:35, Dave Taht wrote:
>>>>>>
>>>>>>> should we always wait a little bit to see if we can form an
>>>>>>> aggregate?
>>>>>>
>>>>>> I thought the consensus on this front was “no”, as long as we’re
>>>>>> making the decision when we have an immediate transmit opportunity.
>>>>>
>>>>> I think it is more nuanced than how david lang has presented it.
>>>>
>>>> I have four reasons for arguing for no speculative delays.
>>>>
>>>> 1. airtime that isn't used can't be saved.
>>>>
>>>> 2. lower best-case latency
>>>>
>>>> 3. simpler code
>>>>
>>>> 4. clean and gradual service degradation under load.
>>>>
>>>> the arguments against are:
>>>>
>>>> 5. throughput per ms of transmit time is better if aggregation happens
>>>> than if it doesn't.
>>>>
>>>> 6. if you don't transmit, some other station may choose to before you
>>>> would have finished.
>>>>
>>>> #2 is obvious, but with the caveat that anytime you transmit you may be
>>>> delaying someone else.
>>>>
>>>> #1 and #6 are flip sides of each other. we want _someone_ to use the
>>>> airtime, the question is who.
>>>>
>>>> #3 and #4 are closely related.
>>>>
>>>> If you follow my approach (transmit immediately if you can, aggregate
>>>> when you have a queue), the code really has one mode (plus queuing). "If
>>>> you have a Transmit Opportunity, transmit up to X packets from the queue",
>>>> and it doesn't matter if it's only one packet.
>>>>
>>>> If you delay the first packet to give you a chance to aggregate it with
>>>> others, you add in the complexity and overhead of timers (including
>>>> cancelling timers, slippage in timers, etc) and you add a "first packet,
>>>> start timers" mode to deal with.
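
The single-mode policy described in the quoted message just above ("if you have a Transmit Opportunity, transmit up to X packets from the queue") can be sketched roughly as follows. This is only an illustration, not driver code: the function name `on_transmit_opportunity` and the limit `MAX_AGGREGATE` are hypothetical, and the value 64 is borrowed from the mpdu/ampdu histograms discussed elsewhere in this thread.

```python
from collections import deque

# Assumption: up to 64 MPDUs per A-MPDU, matching the histograms in this thread.
MAX_AGGREGATE = 64


def on_transmit_opportunity(queue, max_aggregate=MAX_AGGREGATE):
    """Dequeue and return up to max_aggregate packets; never wait for more."""
    count = min(len(queue), max_aggregate)
    return [queue.popleft() for _ in range(count)]


# Light load: a single packet goes out alone, with no timer and no added delay.
light = deque(["dns-query"])
print(on_transmit_opportunity(light))  # prints ['dns-query']

# Heavy load: packets that queued up while the air was busy are aggregated.
heavy = deque(f"pkt{i}" for i in range(100))
burst = on_transmit_opportunity(heavy)
print(len(burst), len(heavy))  # prints 64 36
```

There is no timer and no separate "first packet" state here; aggregation only appears once packets have had to wait in the queue for airtime.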
>>>>
>>>> I grant you that the first approach will "saturate" the airtime at
>>>> lower traffic levels, but at that point all the stations will start
>>>> aggregating the minimum amount needed to keep the air saturated, while
>>>> still minimizing latency.
>>>>
>>>> I then expect that application related optimizations would then further
>>>> complicate the second approach. there are just too many cases where small
>>>> amounts of data have to be sent and other things serialize behind them.
>>>>
>>>> DNS lookup to find a domain to then do a 3-way handshake to then do a
>>>> request to see if the library has been updated since last cached (repeat
>>>> for several libraries) to then fetch the actual page content. All of
>>>> these things up to the actual page content could be single packets that
>>>> have to be sent (and responded to with a single packet), waiting for the
>>>> prior one to complete. If you add a few ms to each of these, you can
>>>> easily hit 100ms in added latency. Once you start to try and special-case
>>>> these sorts of things, the code complexity multiplies.
>>>>
>>>> So I believe that the KISS approach ends up with a 'worse is better'
>>>> situation.
>>>>
>>>> David Lang
>>>> _______________________________________________
>>>> Make-wifi-fast mailing list
>>>> Make-wifi-fast@lists.bufferbloat.net
>>>> https://lists.bufferbloat.net/listinfo/make-wifi-fast
>>
>> --
>> Dave Täht
>> Let's go make home routers and wifi faster! With better software!
>> http://blog.cerowrt.org
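
To put rough numbers on David Lang's point about serialized single-packet exchanges, here is a small back-of-the-envelope sketch. Everything in it is an assumption made for illustration: the step list `CRITICAL_PATH`, the count of 8 libraries, and the per-packet hold times are hypothetical and do not come from measurements in this thread.

```python
# Hypothetical critical path: (step, packets that each wait on the one before).
CRITICAL_PATH = [
    ("DNS lookup", 2),                    # query + response
    ("TCP 3-way handshake", 2),           # SYN + SYN-ACK; the final ACK goes out
                                          # back-to-back with the first request,
                                          # so it is not counted here
    ("library freshness checks", 2 * 8),  # request + response, 8 libraries (assumed)
]


def added_latency_ms(path, hold_ms):
    """Extra latency if every serialized packet is held hold_ms before transmit."""
    return sum(packets for _, packets in path) * hold_ms


for hold_ms in (1, 3, 5):
    print(f"{hold_ms} ms hold -> {added_latency_ms(CRITICAL_PATH, hold_ms)} ms added")
```

Under these assumed numbers there are 20 serialized packets before the page content is even requested, so a 5 ms speculative hold per packet adds 100 ms, in line with the estimate in the quoted message.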