[Make-wifi-fast] Diagram of the ath9k TX path

Tue May 10 01:22:22 EDT 2016

On Mon, 9 May 2016, Dave Taht wrote:

> This is a very good overview, thank you. I'd like to take apart
> station behavior on wifi with a web application... as a straw man.
>
> On Mon, May 9, 2016 at 8:41 PM, David Lang <david at lang.hm> wrote:
>> On Mon, 9 May 2016, Dave Taht wrote:
>>
>>> On Mon, May 9, 2016 at 7:25 PM, Jonathan Morton <chromatix99 at gmail.com>
>>> wrote:
>>>>
>>>>
>>>>> On 9 May, 2016, at 18:35, Dave Taht <dave.taht at gmail.com> wrote:
>>>>>
>>>>> should we always wait a little bit to see if we can form an aggregate?
>>>>
>>>>
>>>> I thought the consensus on this front was “no”, as long as we’re making
>>>> the decision when we have an immediate transmit opportunity.
>>>
>>>
>>> I think it is more nuanced than how David Lang has presented it.
>>
>>
>> I have four reasons for arguing for no speculative delays.
>>
>> 1. airtime that isn't used can't be saved.
>>
>> 2. lower best-case latency
>>
>> 3. simpler code
>>
>> 4. clean and gradual service degradation under load.
>>
>> the arguments against are:
>>
>> 5. throughput per ms of transmit time is better if aggregation happens than
>> if it doesn't.
>>
>> 6. if you don't transmit, some other station may choose to before you would
>> have finished.
>>
>> #2 is obvious, but with the caveat that any time you transmit you may be
>> delaying someone else.
>>
>> #1 and #6 are flip sides of each other. We want _someone_ to use the
>> airtime, the question is who.
>>
>> #3 and #4 are closely related.
>>
>> If you follow my approach (transmit immediately if you can, aggregate when
>> you have a queue), the code really has one mode (plus queuing). "If you have
>> a Transmit Opportunity, transmit up to X packets from the queue", and it
>> doesn't matter if it's only one packet.
>>
>> If you delay the first packet to give you a chance to aggregate it with
>> others, you add in the complexity and overhead of timers (including
>> cancelling timers, slippage in timers, etc) and you add "first packet, start
>> timers" mode to deal with.
>>
>> I grant you that the first approach will "saturate" the airtime at lower
>> traffic levels, but at that point all the stations will start aggregating
>> the minimum amount needed to keep the air saturated, while still minimizing
>> latency.
>>
>> I expect that application-related optimizations would then further
>> complicate the second approach. There are just too many cases where small
>> amounts of data have to be sent and other things serialize behind them.
>>
>> A DNS lookup to resolve a domain name, then a 3-way handshake, then a
>> request to see if the <web something> library has been updated since it was
>> last cached (repeat for several libraries), then a fetch of the actual page
>> content. All of the steps before the actual page content could be single
>> packets that have to be sent (and answered with a single packet), each
>> waiting for the prior one to complete. If you add a few ms to each of these,
>> you can easily hit 100ms of added latency. Once you start trying to
>> special-case these sorts of things, the code complexity multiplies.
>
> Take web page parsing as an example. The first request is a DNS
> lookup. The second request is an HTTP GET (which can include a few more
> round trips for negotiating SSL). Next comes a flurry of page parsing
> that results in the browser attempting to schedule its requests as well
> as it can, then sending out the relevant DNS and TCP flows as best it
> can figure out, and then, typically, several seconds of data transfer
> across each set of flows.

Actually, I think that a lot (if not the majority) of these flows are short,
because the libraries/stylesheets/images/etc. were cached recently enough that
they don't need to be fetched again. The browser just needs to check whether
the copy it has is still good or has been changed.
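To put numbers on the serialized-exchange argument quoted above, here is a back-of-the-envelope sketch. The step list and the 5 ms hold per packet are illustrative assumptions, not measurements. The point is that on a strictly serialized chain a per-packet hold is paid once per packet, so the added delay scales with the number of round trips rather than with the amount of data.

```python
# Back-of-the-envelope sketch: every number here is an illustrative assumption.

# Hypothetical page-load prefix in which each step waits for the previous one,
# and each step is one small request plus one small response.
SERIAL_STEPS = [
    "DNS lookup",
    "TCP 3-way handshake",
    "TLS handshake, round trip 1",
    "TLS handshake, round trip 2",
    "GET for the page",
    "cache validation, library 1",
    "cache validation, library 2",
    "cache validation, library 3",
    "cache validation, stylesheet",
    "cache validation, image",
]


def added_latency_ms(steps, packets_per_step=2, hold_ms=5.0):
    """Extra latency if every packet is held hold_ms hoping to form an aggregate.

    packets_per_step=2 assumes both the request and the response cross the
    wifi hop and each is held once; nothing overlaps, so the holds add up.
    """
    return len(steps) * packets_per_step * hold_ms


print(added_latency_ms(SERIAL_STEPS))  # → 100.0
```

With `hold_ms=2.0` the same ten-step chain still picks up 40 ms before any real content arrives.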

> Page paint is bound by getting the critical portions of the resulting
> data parsed and laid out properly.
>
> Now, I'd really like that early phase to be optimized by APs with
> something more like SQF, where a station that appears and does a few
> packet exchanges gets priority over stations moving big flows on a more
> regular basis, so that it more rapidly gets into flow balance with the
> other stations.

There are two parts to this process:

1. the tactical (do you send the pending packet immediately, or do you delay it 
to see if you can save airtime with aggregation)

2. the strategic (once a queue of pending packets has built up, how do you pick 
which one to send)

What you are talking about is the strategic part of it, where you assume that
there is a queue of data to be sent, and picking which stuff to send first 
affects the performance.
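For the strategic half, here is a minimal sketch of the SQF-style idea from the quoted message, reading SQF as "shortest queue first": among stations with packets pending, serve the one with the smallest backlog, so a station doing a handful of small exchanges is not stuck behind stations moving bulk data. The station names and the per-station `deque` layout are assumptions for illustration; this is not how any particular AP driver schedules.

```python
from collections import deque


def pick_station(queues):
    """Shortest-queue-first: return the backlogged station with the fewest
    queued packets, or None if nothing is pending.

    queues maps a station id to a deque of its pending packets. Ties go to the
    station that appears first in the mapping (min() keeps the first minimum).
    """
    backlogged = [sta for sta, q in queues.items() if q]
    if not backlogged:
        return None
    return min(backlogged, key=lambda sta: len(queues[sta]))


queues = {
    "bulk_sta": deque(range(40)),     # long-running download
    "new_sta": deque(["dns-query"]),  # just associated, one small packet
    "idle_sta": deque(),              # nothing pending
}
print(pick_station(queues))  # → new_sta
```

Pure SQF lets a steady stream of short queues starve a bulk station indefinitely, so a real scheduler would need some fairness backstop (such as deficit or airtime accounting) on top; that is left out here.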

What I'm talking about is the tactical part: before a queue has built up, don't
add time to the flow by delaying packets. This matters all the more because, at
that stage, the odds are good that there won't be anything to aggregate with.
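The tactical rule (never hold a packet back; on every transmit opportunity, send whatever is queued, up to a cap) needs no timers and no special first-packet state. A minimal sketch, assuming a single software queue and a hypothetical `MAX_AGGREGATE` cap standing in for the "X packets" quoted earlier in the thread:

```python
from collections import deque

MAX_AGGREGATE = 32  # hypothetical cap on packets per aggregate ("X")


def on_txop(queue, max_aggregate=MAX_AGGREGATE):
    """Run each time we win a transmit opportunity.

    No timer and no "wait for more packets" state: take whatever is queued
    right now, up to the cap. A lone packet just goes out by itself; a backlog
    naturally becomes an aggregate. An empty list means nothing to send.
    """
    batch = []
    while queue and len(batch) < max_aggregate:
        batch.append(queue.popleft())
    return batch


# Idle link: the single DNS query is sent at the first opportunity.
print(on_txop(deque(["dns-query"])))  # → ['dns-query']

# Loaded link: a backlog of 50 yields an aggregate of 32; the rest waits.
backlog = deque(range(50))
print(len(on_txop(backlog)), len(backlog))  # → 32 18
```

Under light load this sends single packets with minimum latency; under heavy load the queue itself is what builds the aggregates, which is the gradual degradation argued for in the quoted list.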

DNS UDP packets aren't going to have anything else to aggregate with.

3-way handshake packets aren't going to have anything else to aggregate with
(unless you happen to be doing them while other traffic is being transmitted;
even parallel connections to different servers are likely to be spread out in
time due to differences in network distance).

HTTP checks for cache validation are unlikely to have anything to aggregate
with.

The SSL handshake is a bit more complex, but there's not a lot of data moving in 
either direction at any step, and there are a lot of exchanges.

With 'modern' AJAX sites, even after the entire page is rendered and the
JavaScript starts running and fetching data, a page may retrieve a lot of
stuff, but with lazy coding there are a lot of requests that each retrieve a
very small amount of data.

Find some nasty sites (complexity-wise), capture their traffic on a nice,
low-latency wired network, and check the number of connections and the sizes
of all the packets (and their timing).

Artificially add some horrid latency to the connection to exaggerate the
influence of serialized steps and watch what happens.

David Lang

> (and then, for most use cases, like web, exits)
>
> The second phase, the actual transfer, is also bound by RTT. I have no
> idea to what extent wifi folk actually account for typical web
> transfer delays (20-80ms), but they are there...
>
> ...
>
> The idea of the wifi driver waiting a bit to form a better aggregate
> to fit into a txop ties into two slightly different timings and flow
> behaviors.
>
> If it is taking 10ms to get a txop in the first place, taking more
> time to assemble a good batch of packets to fit into "your" txop would
> be good.

If you are not at a txop, all you can do is queue, so you queue. And when you
get a txop, you send as much as you can (up to the configured max).

No disagreement there.

If the txop is a predictable minimum distance away (because you know that
another station just started transmitting and will take 10ms), then you can
spend more time being fancy about what you send and how you pack it.
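When the next txop is known to be some distance away, there is time to be deliberate about packing. The simplest version is to fill an aggregate in queue order until an airtime budget runs out. This sketch assumes that airtime is just payload bits divided by the PHY rate (no preamble, MAC header, delimiter or padding overhead), and `max_airtime_us` is a hypothetical per-aggregate limit rather than any driver's actual setting.

```python
def airtime_us(length_bytes, rate_mbps):
    """Payload airtime in microseconds (1 Mbit/s moves 1 bit per microsecond).
    Deliberately ignores preamble, headers, A-MPDU delimiters and padding."""
    return length_bytes * 8 / rate_mbps


def pack_aggregate(lengths, rate_mbps, max_airtime_us):
    """Take packets in queue order until the next one would exceed the budget.

    Returns (packets_taken, airtime_used_us). Stops at the first packet that
    does not fit, so queue order (and therefore per-flow order) is preserved.
    """
    used = 0.0
    taken = 0
    for length in lengths:
        t = airtime_us(length, rate_mbps)
        if used + t > max_airtime_us:
            break
        used += t
        taken += 1
    return taken, used


# 64 full-size packets queued, 150 Mbit/s, hypothetical 4 ms budget:
# each packet costs 1500 * 8 / 150 = 80 us, so 50 of them fit.
print(pack_aggregate([1500] * 64, rate_mbps=150, max_airtime_us=4000))  # → (50, 4000.0)
```

A fancier packer could skip a packet that does not fit and look for smaller ones behind it, at the cost of reordering; the predictable gap before the next txop is what pays for that extra work.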

>
> If it is taking 4ms to transmit your last txop, well, more packets may
> arrive for you in that interval and feed into your existing flows to
> keep them going, if you defer feeding them to the hardware.

Yes, this strategy should ideally happen as close to the hardware as possible.
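One way to get that effect is to keep the hardware queue short: commit only a small number of aggregates to the hardware and build the next one when a transmission completes, so packets that arrive during a long transmission are still in software and can join a larger aggregate. The sketch assumes a hypothetical `HW_QUEUE_LIMIT` of two aggregates and a simple FIFO software queue; it is an illustration, not the ath9k code.

```python
from collections import deque

HW_QUEUE_LIMIT = 2  # hypothetical: aggregates allowed to sit in hardware at once


class TxPath:
    def __init__(self, max_aggregate=32):
        self.sw_queue = deque()  # packets the driver can still reorder or drop
        self.hw_queue = deque()  # aggregates already handed to the hardware
        self.max_aggregate = max_aggregate

    def enqueue(self, pkt):
        self.sw_queue.append(pkt)
        self._refill()  # an idle link still sends the packet immediately

    def tx_complete(self):
        """The hardware finished its oldest aggregate; top the queue back up."""
        if self.hw_queue:
            self.hw_queue.popleft()
        self._refill()

    def _refill(self):
        # Build aggregates only while the hardware is short of work. Anything
        # that arrives while it is busy waits here and gets batched later.
        while self.sw_queue and len(self.hw_queue) < HW_QUEUE_LIMIT:
            n = min(len(self.sw_queue), self.max_aggregate)
            self.hw_queue.append([self.sw_queue.popleft() for _ in range(n)])


tx = TxPath()
for pkt in range(10):
    tx.enqueue(pkt)
print(list(tx.hw_queue), len(tx.sw_queue))  # → [[0], [1]] 8
tx.tx_complete()
print(list(tx.hw_queue))  # → [[1], [2, 3, 4, 5, 6, 7, 8, 9]]
```

The first two packets go out on their own with no added delay; the eight that arrived while the hardware was busy leave together as one aggregate.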

> Also, classic tcp acking goes out the window with competing acks at layer 2.
>
> I don't know if QUIC can do the equivalent of stretch acks...
>
> but one layer 3 ACK, block-acked by layer 2 in wifi, suffices... if
> you have a ton of TCP ACKs outstanding, block-acking them all is
> expensive...

Yes.
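A simple way to avoid block-acking a pile of redundant TCP ACKs is to thin them while they sit in the queue: because TCP ACKs are cumulative, a newer pure ACK for a flow covers older ones queued ahead of it, which yields something like the stretch ACKs mentioned above, but done at the link layer. This is a minimal sketch of that ACK thinning (a technique named here for illustration, not something proposed in the thread). The `Pkt` fields are hypothetical, and it ignores duplicate ACKs, SACK blocks and window updates, all of which carry information that a real ACK filter must preserve.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pkt:
    flow: tuple       # hypothetical flow key, e.g. (src, sport, dst, dport)
    ack: int          # cumulative TCP acknowledgment number
    payload_len: int  # 0 means a pure ACK


def thin_acks(queue):
    """Drop queued pure ACKs that a later pure ACK of the same flow supersedes.

    Assumes ACK numbers only grow in queue order (no wraparound, no
    reordering). Data-carrying packets are always kept, in their original
    order; the newest pure ACK of each flow stays at its own position.
    """
    newest = {}
    for i, p in enumerate(queue):
        if p.payload_len == 0:
            newest[p.flow] = i
    return [p for i, p in enumerate(queue)
            if p.payload_len > 0 or newest[p.flow] == i]


dl = ("sta1", 50000, "srv", 443)  # station is downloading: it sends only ACKs
ul = ("sta1", 50001, "srv", 80)   # station is uploading: it sends data
q = [Pkt(dl, 1000, 0), Pkt(ul, 1, 1400), Pkt(dl, 2000, 0), Pkt(dl, 3000, 0)]
print([p.ack for p in thin_acks(q)])  # → [1, 3000]
```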

>> So I believe that the KISS approach ends up with a 'worse is better'
>> situation.
>
> Code is going to get more complex anyway, and there are other
> optimizations that could be made.

All the more reason to have that complexity on top of a simpler core :-)

> One item I realized recently is that part of CoDel need not run on
> every packet in every flow for stuff destined to fit into a single
> txop. It is sufficient to see if it declared a drop on the first
> packet in a flow destined for a given txop.
>
> You can then mark that entire flow (in a txop) as droppable (QoSNoAck)
> within that txop (as it is within an RTT, and even losing all the
> packets there will only cause the rate to halve).

I would try not to drop all of them, in case the bitrate drops before you
resend (try to avoid having one txop's worth of data become several).
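A sketch of the per-txop idea quoted above, with the refinement just stated folded in as a parameter. `codel_should_drop` is a stand-in for the CoDel drop decision (not an implementation of CoDel), and it is consulted only on the first packet of each flow in the txop. If it says drop, that flow's packets in the same txop are marked droppable (the thread's QoSNoAck), except for the first `keep_reliable` of them. With `keep_reliable=0` this is the original mark-the-whole-flow proposal. All names are hypothetical.

```python
from collections import defaultdict


def mark_txop(txop_packets, codel_should_drop, keep_reliable=1):
    """Return (flow, pkt, droppable) for each packet planned for one txop.

    txop_packets: (flow, pkt) pairs in transmission order.
    codel_should_drop(flow, pkt): stand-in for the CoDel verdict; called once
    per flow, on that flow's first packet in this txop.
    keep_reliable: how many of a condemned flow's packets stay reliable.
    """
    verdict = {}             # flow -> verdict taken on its first packet
    seen = defaultdict(int)  # flow -> packets of this flow so far
    marked = []
    for flow, pkt in txop_packets:
        if flow not in verdict:
            verdict[flow] = codel_should_drop(flow, pkt)
        droppable = verdict[flow] and seen[flow] >= keep_reliable
        seen[flow] += 1
        marked.append((flow, pkt, droppable))
    return marked


def over_target(flow, pkt):
    return flow == "bulk"  # pretend CoDel flagged only the bulk flow


plan = [("bulk", i) for i in range(4)] + [("dns", 0)]
print([d for _, _, d in mark_txop(plan, over_target)])
# → [False, True, True, True, False]
```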

David Lang