Also, I haven't tested it, but I don't think rate limiting TCP will solve this
aggregation "problem."  The shorter RTT is driving CWND well below what maximum
aggregation needs, i.e. CWND is too small relative to wi-fi aggregation.
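
A rough sketch of that arithmetic, not a measurement. It assumes one
full-sized TCP segment per mpdu, a 1460-byte MSS, and that CWND settles near
the bandwidth-delay product. The 64-mpdu ceiling is the one from the
histograms; the function names and the 300 Mbit/s rate are made up for
illustration.

```python
MAX_MPDUS_PER_AMPDU = 64   # aggregation ceiling seen in the histograms
MSS_BYTES = 1460           # assumed segment size, one segment per mpdu


def cwnd_segments_for_bdp(rate_bps: float, rtt_s: float,
                          mss: int = MSS_BYTES) -> float:
    """Segments in flight needed to cover the bandwidth-delay product."""
    return rate_bps * rtt_s / (8 * mss)


def can_fill_ampdu(cwnd_segments: float,
                   max_mpdus: int = MAX_MPDUS_PER_AMPDU) -> bool:
    """True if one flow's window could, at best, fill a whole aggregate."""
    return cwnd_segments >= max_mpdus


if __name__ == "__main__":
    # Hypothetical 300 Mbit/s link at several RTTs.
    for rtt_ms in (1, 2, 5, 10):
        cwnd = cwnd_segments_for_bdp(300e6, rtt_ms / 1000)
        print(f"rtt={rtt_ms}ms bdp={cwnd:.1f} segments "
              f"fills_ampdu={can_fill_ampdu(cwnd)}")
```

Under these assumptions a single flow at 1 ms RTT needs only about 26
segments in flight to cover the pipe, well short of the 64 needed to fill an
ampdu, which is the mismatch described above.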

Bob

On Fri, May 13, 2016 at 11:05 AM, Bob McMahon <bob.mcmahon@broadcom.com>
wrote:

> The graphs are histograms of mpdu/ampdu, from 1 to 64.  The blue spikes
> show that the vast majority of traffic is filling an ampdu with 64 mpdus;
> there the fill stop reason is ampdu full.  The purple fill stop reasons are
> that the sw fifo (above the driver) went empty, indicating a CWND too small
> for maximum aggregation.  A driver wants to aggregate to the fullest extent
> possible.  A workaround is to set initcwnd in the routing table.
>
> I don't have the data available for multiple flows at the moment.  Note:
> That will depend on what exactly defines a flow.
>
> Bob
>
> On Fri, May 13, 2016 at 10:49 AM, Dave Taht <dave.taht@gmail.com> wrote:
>
>> I try to stress that a single tcp flow should never use all the bandwidth
>> if the sawtooth is to function properly.
>>
>> What happens when you hit it with 4 flows? or 12?
>>
>> nice graph, but I don't understand the single blue spikes?
>>
>> On Fri, May 13, 2016 at 10:46 AM, Bob McMahon <bob.mcmahon@broadcom.com>
>> wrote:
>>
>>> On driver delays, from a driver development perspective the problem
>>> isn't whether to add delay or not (it shouldn't); it's that the TCP stack
>>> isn't presenting sufficient data to fully utilize aggregation.  Below is a
>>> histogram comparing aggregations of 3 systems (units are mpdu per ampdu).
>>> The lowest latency stack is in purple and it also has the worst performance
>>> with respect to average throughput.  From a driver perspective, one would
>>> like TCP to present sufficient bytes into the pipe that the histogram leans
>>> toward the blue.
>>>
>>> [image: Inline image 1]
>>> I'm not an expert on TCP near congestion avoidance but maybe the
>>> algorithm could benefit from RTT as weighted by CWND (or bytes in flight)
>>> and hunt that maximum?
>>>
>>> Bob
>>>
>>> On Mon, May 9, 2016 at 8:41 PM, David Lang <david@lang.hm> wrote:
>>>
>>>> On Mon, 9 May 2016, Dave Taht wrote:
>>>>
>>>>> On Mon, May 9, 2016 at 7:25 PM, Jonathan Morton <chromatix99@gmail.com>
>>>>> wrote:
>>>>>
>>>>>>
>>>>>> On 9 May, 2016, at 18:35, Dave Taht <dave.taht@gmail.com> wrote:
>>>>>>>
>>>>>>> should we always wait a little bit to see if we can form an
>>>>>>> aggregate?
>>>>>>>
>>>>>>
>>>>>> I thought the consensus on this front was “no”, as long as we’re
>>>>>> making the decision when we have an immediate transmit opportunity.
>>>>>>
>>>>>
>>>>> I think it is more nuanced than how david lang has presented it.
>>>>>
>>>>
>>>> I have four reasons for arguing for no speculative delays.
>>>>
>>>> 1. airtime that isn't used can't be saved.
>>>>
>>>> 2. lower best-case latency
>>>>
>>>> 3. simpler code
>>>>
>>>> 4. clean and gradual service degradation under load.
>>>>
>>>> the arguments against are:
>>>>
>>>> 5. throughput per ms of transmit time is better if aggregation happens
>>>> than if it doesn't.
>>>>
>>>> 6. if you don't transmit, some other station may choose to before you
>>>> would have finished.
>>>>
>>>> #2 is obvious, but with the caveat that any time you transmit you may be
>>>> delaying someone else.
>>>>
>>>> #1 and #6 are flip sides of each other. we want _someone_ to use the
>>>> airtime, the question is who.
>>>>
>>>> #3 and #4 are closely related.
>>>>
>>>> If you follow my approach (transmit immediately if you can, aggregate
>>>> when you have a queue), the code really has one mode (plus queuing). "If
>>>> you have a Transmit Opportunity, transmit up to X packets from the queue",
>>>> and it doesn't matter if it's only one packet.
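
A minimal sketch of the "one mode" rule quoted above, to make it concrete.
This is only an illustration, not anyone's driver code: the class and method
names are hypothetical, and 64 stands in for "X" as an assumed limit.

```python
from collections import deque

MAX_AGGREGATE = 64  # "X" in the text; 64 is an assumed value


class TxQueue:
    """Transmit immediately when possible, aggregate only what is queued."""

    def __init__(self, max_aggregate: int = MAX_AGGREGATE):
        self.queue = deque()
        self.max_aggregate = max_aggregate

    def enqueue(self, packet):
        """Queue a packet that arrived while no transmit opportunity existed."""
        self.queue.append(packet)

    def on_transmit_opportunity(self):
        """Return up to max_aggregate queued packets, never waiting for more."""
        count = min(len(self.queue), self.max_aggregate)
        return [self.queue.popleft() for _ in range(count)]


if __name__ == "__main__":
    q = TxQueue(max_aggregate=3)
    q.enqueue("p1")
    print(q.on_transmit_opportunity())   # a lone packet goes out by itself
    for name in ("p2", "p3", "p4", "p5"):
        q.enqueue(name)
    print(q.on_transmit_opportunity())   # a backlog is aggregated up to the limit
```

There is no timer and no "first packet" state here: aggregation only happens
when packets have already queued up behind a busy medium.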
>>>>
>>>> If you delay the first packet to give you a chance to aggregate it with
>>>> others, you add in the complexity and overhead of timers (including
>>>> cancelling timers, slippage in timers, etc) and you add "first packet,
>>>> start timers" mode to deal with.
>>>>
>>>> I grant you that the first approach will "saturate" the airtime at
>>>> lower traffic levels, but at that point all the stations will start
>>>> aggregating the minimum amount needed to keep the air saturated, while
>>>> still minimizing latency.
>>>>
>>>> I then expect that application related optimizations would then further
>>>> complicate the second approach. there are just too many cases where small
>>>> amounts of data have to be sent and other things serialize behind them.
>>>>
>>>> DNS lookup to find a domain to then do a 3-way handshake to then do a
>>>> request to see if the <web something> library has been updated since last
>>>> cached (repeat for several libraries) to then fetch the actual page
>>>> content. All of these things up to the actual page content could be single
>>>> packets that have to be sent (and responded to with a single packet),
>>>> waiting for the prior one to complete. If you add a few ms to each of
>>>> these, you can easily hit 100ms in added latency. Once you start to try to
>>>> special-case these sorts of things, the code complexity multiplies.
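
A back-of-the-envelope sketch of how those small delays add up. The numbers
are assumptions for illustration only: a hypothetical page load with 1 DNS
lookup, 1 handshake and 8 cache checks, and each single-packet request and
its single-packet reply both held once while waiting to aggregate.

```python
def added_latency_ms(serialized_exchanges: int, hold_ms: float,
                     holds_per_exchange: int = 2) -> float:
    """Total extra latency when every packet in a serialized chain is held.

    holds_per_exchange=2 assumes the request is held at the sender and the
    reply is held at the other end.
    """
    return serialized_exchanges * holds_per_exchange * hold_ms


if __name__ == "__main__":
    # Hypothetical page load: 1 DNS lookup + 1 handshake + 8 cache checks.
    exchanges = 1 + 1 + 8
    for hold_ms in (1, 2, 5):
        extra = added_latency_ms(exchanges, hold_ms)
        print(f"hold={hold_ms}ms -> +{extra:.0f}ms before page content")
```

With these assumed figures, a 5 ms hold on every packet adds 100 ms before
the first byte of page content is even requested.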
>>>>
>>>> So I believe that the KISS approach ends up with a 'worse is better'
>>>> situation.
>>>>
>>>> David Lang
>>>> _______________________________________________
>>>> Make-wifi-fast mailing list
>>>> Make-wifi-fast@lists.bufferbloat.net
>>>> https://lists.bufferbloat.net/listinfo/make-wifi-fast
>>>>
>>>>
>>>
>>
>>
>> --
>> Dave Täht
>> Let's go make home routers and wifi faster! With better software!
>> http://blog.cerowrt.org
>>
>
>