[Make-wifi-fast] Diagram of the ath9k TX path

Tue May 10 00:59:19 EDT 2016

This is a very good overview, thank you. I'd like to take apart
station behavior on wifi with a web application... as a straw man.

On Mon, May 9, 2016 at 8:41 PM, David Lang <david at lang.hm> wrote:
> On Mon, 9 May 2016, Dave Taht wrote:
>
>> On Mon, May 9, 2016 at 7:25 PM, Jonathan Morton <chromatix99 at gmail.com>
>> wrote:
>>>
>>>
>>>> On 9 May, 2016, at 18:35, Dave Taht <dave.taht at gmail.com> wrote:
>>>>
>>>> should we always wait a little bit to see if we can form an aggregate?
>>>
>>>
>>> I thought the consensus on this front was “no”, as long as we’re making
>>> the decision when we have an immediate transmit opportunity.
>>
>>
>> I think it is more nuanced than how david lang has presented it.
>
>
> I have four reasons for arguing for no speculative delays.
>
> 1. airtime that isn't used can't be saved.
>
> 2. lower best-case latency
>
> 3. simpler code
>
> 4. clean and gradual service degradation under load.
>
> the arguments against are:
>
> 5. throughput per ms of transmit time is better if aggregation happens than
> if it doesn't.
>
> 6. if you don't transmit, some other station may choose to before you would
> have finished.
>
> #2 is obvious, but with the caveat that any time you transmit you may be
> delaying someone else.
>
> #1 and #6 are flip sides of each other. we want _someone_ to use the
> airtime, the question is who.
>
> #3 and #4 are closely related.
>
> If you follow my approach (transmit immediately if you can, aggregate when
> you have a queue), the code really has one mode (plus queuing). "If you have
> a Transmit Opportunity, transmit up to X packets from the queue", and it
> doesn't matter if it's only one packet.
>
> If you delay the first packet to give you a chance to aggregate it with
> others, you add in the complexity and overhead of timers (including
> cancelling timers, slippage in timers, etc) and you add "first packet, start
> timers" mode to deal with.
>
> I grant you that the first approach will "saturate" the airtime at lower
> traffic levels, but at that point all the stations will start aggregating
> the minimum amount needed to keep the air saturated, while still minimizing
> latency.
>
> I then expect that application related optimizations would then further
> complicate the second approach. there are just too many cases where small
> amounts of data have to be sent and other things serialize behind them.
>
> DNS lookup to find a domain to then do a 3-way handshake to then do a
> request to see if the <web something> library has been updated since last
> cached (repeat for several libraries) to then fetch the actual page content.
> All of these things up to the actual page content could be single packets
> that have to be sent (and responded to with a single packet), waiting for
> the prior one to complete. If you add a few ms to each of these, you can
> easily hit 100ms in added latency. Once you start to try to special-case
> these sorts of things, the code complexity multiplies.
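
To make David's "one mode" concrete, here is a minimal sketch in Python
of the transmit-immediately approach quoted above. The names on_txop and
max_aggregate (his "X") are made up for illustration; this is not how
ath9k or mac80211 is actually structured.

```python
from collections import deque


def on_txop(queue, max_aggregate):
    """Called when we win a transmit opportunity.

    Dequeues up to max_aggregate packets and returns them as one batch.
    There is no timer and no special "first packet" mode: a batch of one
    is handled exactly like a full aggregate.
    """
    batch = []
    while queue and len(batch) < max_aggregate:
        batch.append(queue.popleft())
    return batch


# A lone packet goes out immediately, on its own.
idle_queue = deque(["dns-query"])
single = on_txop(idle_queue, max_aggregate=32)

# Under load the queue has built up, so aggregation happens by itself.
busy_queue = deque(["p0", "p1", "p2", "p3", "p4"])
aggregate = on_txop(busy_queue, max_aggregate=3)
```

The point of the sketch is only that aggregation size falls out of queue
depth at the moment of the txop, with nothing speculative in the path.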

Take web page parsing as an example. The first request is a DNS
lookup. The second request is an HTTP GET (which can include a few more
round trips for negotiating SSL). Next comes a flurry of page parsing,
after which the web browser tries to schedule its requests as best it
can and sends out the relevant DNS and TCP flows. Then, typically,
several seconds of data transfer follow across each set of flows.
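
As a back-of-envelope check on David's "easily hit 100ms" claim for the
serialized early steps, here is a small Python sketch. The function name
and the example numbers (10 exchanges, 5 ms of deliberate delay per
transmission) are assumptions picked for illustration, not measurements.

```python
def added_latency_ms(serialized_exchanges, delay_per_transmission_ms,
                     transmissions_per_exchange=2):
    """Total extra latency when every transmission is held back a bit.

    Each serialized exchange is assumed to be one request packet and one
    response packet, and each of them waits delay_per_transmission_ms in
    the hope of forming an aggregate. Nothing overlaps, so the delays
    simply add up.
    """
    return (serialized_exchanges
            * transmissions_per_exchange
            * delay_per_transmission_ms)


# Hypothetical page load: DNS, handshake, and a handful of cache checks.
total = added_latency_ms(serialized_exchanges=10,
                         delay_per_transmission_ms=5)
```

With those assumed numbers the total comes to 100 ms, all of it spent
before the actual page content starts to flow.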

Page paint is bound by getting the critical portions of the resulting
data parsed and laid out properly.

Now, I'd really like that early phase to be optimized by APs with
something more like SQF, where a station that appears and does a few
packet exchanges gets priority over stations taking big flows
on a more regular basis, so that it more rapidly gets into flow balance
with the other stations.
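
A rough Python sketch of what I mean, assuming the simplest reading of
an SQF-like rule: at each txop the AP serves the station with the
smallest non-empty backlog. The function name and the per-station byte
counts are hypothetical; a real scheduler would also have to account for
airtime and rates.

```python
def pick_next_station(backlog_bytes):
    """Return the station with the smallest non-empty backlog, or None.

    backlog_bytes maps a station name to the bytes queued for it. A
    station that just showed up with a few small packets therefore wins
    over stations with large standing queues.
    """
    candidates = {sta: b for sta, b in backlog_bytes.items() if b > 0}
    if not candidates:
        return None
    return min(candidates, key=candidates.get)


# Hypothetical snapshot: one bulk download, one newly active station.
chosen = pick_next_station({"bulk": 500_000, "new": 1_200, "idle": 0})
```

Here the newly active station is picked first, which is the behavior
wanted for the DNS and handshake phase.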

(and then, for most use cases, like web, exits)

The second phase, the actual transfer, is also bound by RTT. I have no
idea to what extent wifi folk actually factor in typical web transfer
delays (20-80ms), but they are there...

...

The idea of the wifi driver waiting a bit to form a better aggregate
to fit into a txop ties into two slightly different timings and flow
behaviors.

If it is taking 10ms to get a txop in the first place, taking more
time to assemble a good batch of packets to fit into "your" txop would
be good.

If it is taking 4ms to transfer your last txop, well, more packets may
arrive for you in that interval and feed into your existing flows to
keep them going, if you defer feeding the hardware with them.

Also, classic tcp acking goes out the window with competing acks at layer 2.

I don't know if quic can do the equivalent of stretch acks...

but one layer 3 ack, block acked by layer 2 in wifi, suffices... if
you have a ton of tcp acks outstanding, block acking them all is
expensive...

> So I believe that the KISS approach ends up with a 'worse is better'
> situation.

Code is going to get more complex anyway, and there are other
optimizations that could be made.

One item I realized recently is that part of codel need not run on
every packet in every flow for stuff destined to fit into a single
txop. It is sufficient to see if it declared a drop on the first
packet in a flow destined for a given txop.

You can then mark that entire flow (in a txop) as droppable (QoSNoAck)
within that txop (as it is within an RTT, and even losing all the
packets there will only cause the rate to halve).
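
Sketched in Python, under the assumption that the packets for a txop
have already been picked: ask codel once per flow, on that flow's first
packet in the batch, and reuse the answer for the rest of that flow. The
codel_says_drop callback and the no_ack flag are stand-ins for
illustration, not the real codel or mac80211 interfaces.

```python
def mark_txop(batch, codel_says_drop):
    """Tag every packet in a txop batch with a per-flow no_ack verdict.

    batch is a list of (flow_id, packet) pairs in transmit order.
    codel_says_drop(flow_id, packet) is called only for the first packet
    of each flow; its verdict is applied to the whole flow in this txop.
    Returns a list of (flow_id, packet, no_ack) tuples.
    """
    verdict = {}
    marked = []
    for flow_id, packet in batch:
        if flow_id not in verdict:
            verdict[flow_id] = codel_says_drop(flow_id, packet)
        marked.append((flow_id, packet, verdict[flow_id]))
    return marked


# Stand-in for codel: pretend it wants to drop from flow "a" only.
calls = []


def fake_codel(flow_id, packet):
    calls.append(flow_id)
    return flow_id == "a"


marked = mark_txop([("a", 1), ("b", 2), ("a", 3)], fake_codel)
```

The stand-in gets called twice for three packets, once per flow, which
is the saving being proposed.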

>
> David Lang

-- 
Dave Täht
Let's go make home routers and wifi faster! With better software!
http://blog.cerowrt.org