More sanely debloating wifi aggregates

Dave Taht dave.taht at gmail.com
Wed Dec 12 04:08:11 EST 2012


It is my intent to start working harder on wifi issues next year.

Regardless of how much I care about fixing APs, the biggest user of
Linux-based wifi is Android, so it makes sense to be hacking on that,
rather than the crappy iwl chip in most of my machines. (I have ath9k
cards, tho.)

I'm getting an Android tablet for Christmas (any recommendations?
Obviously CyanogenMod is going to have to be supported... What wifi
chipsets are in use?)

Anyway, besides smashing all the extra wifi tx buffering in the stack,
instituting sane drop policies and something fq_codel-like there, and
paying attention to station ids, classification, and a few other
things important on an AP but not on a client, I came up with an idea
for dealing with rx de-aggregation that seems simpler to implement
initially and should lead towards the tx goal eventually.

Basically, it's adding SFQ to the de-aggregation step in the rx path.

What happens currently is that an entire rx aggregate (up to 42
packets) is decoded, then dequeued in strict FIFO order and shipped
"elsewhere", usually at a speed far higher than the arrival rate of
the wifi link. No queue forms at the egress link, as Linux is a strict
pull-through stack, so you can't do any useful work on the egress
side. However, that pesky aggregate exists...

To explain the possible advantage of SFQ'ing the aggregate before it
is delivered elsewhere, I'll use an example.

1 big flow, 1 small flow, 1 ping, and 1 DNS packet arrive via an
aggregate, in that order. The 30 packets of the big flow are dequeued
first and shipped to the local TCP server, which responds immediately
with a ton of large packets, scaling up according to slow start or
whatever phase of the TCP algorithm it's in. The small flow gets 10
packets out and a ton of packets back. The ping is then delivered, and
then the DNS packet. So the behavior on the receiving side is that a
fairly large queue builds up long before the small flow, ping, and DNS
packet get through; they are starved of their share of the link, and
multiple aggregates have to be scheduled and shipped long before the
ping's reply gets out. And we're already familiar with the
overbuffering in the tx path.
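
To make that concrete, here's a toy userspace model of the FIFO drain
(just a sketch, not mac80211 code; the 30/10/1/1 frame mix is simply
the example above):

/* Toy model of the current rx path: a decoded 42-frame aggregate is
 * handed up in strict arrival order, so the ping and DNS frames are
 * delivered dead last, at positions 40 and 41. */
#include <stdio.h>

int main(void)
{
    const char *agg[42];
    int i, n = 0;

    for (i = 0; i < 30; i++) agg[n++] = "big";    /* bulk TCP flow  */
    for (i = 0; i < 10; i++) agg[n++] = "small";  /* small TCP flow */
    agg[n++] = "ping";
    agg[n++] = "dns";

    for (i = 0; i < n; i++)                       /* strict FIFO drain */
        printf("%2d: %s\n", i, agg[i]);
    return 0;
}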

An alternative is SFQ-style dequeuing of the aggregate. Now one packet
from each of the 4 flows departs in round-robin order. The ping, small
flow, big flow, and DNS packet (with a little lookup latency, but
hopefully pretty fast) all manage to get packets out and back, so they
can be scheduled in the next string of tx aggregates.
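
Here's the same toy aggregate drained that way instead: frames are
binned per flow (by name here, purely for illustration; a real
implementation would hash the 5-tuple) and dequeued one per flow,
round robin:

/* Toy SFQ-style drain of the same 42-frame aggregate. Again a
 * sketch, not mac80211 code. */
#include <stdio.h>
#include <string.h>

#define NFLOWS 4

int main(void)
{
    const char *flows[NFLOWS] = { "big", "small", "ping", "dns" };
    const char *agg[42];
    int backlog[NFLOWS] = { 0 };
    int i, f, n = 0, out = 0;

    for (i = 0; i < 30; i++) agg[n++] = "big";
    for (i = 0; i < 10; i++) agg[n++] = "small";
    agg[n++] = "ping";
    agg[n++] = "dns";

    /* classify: bin each frame by flow (stand-in for a 5-tuple hash) */
    for (i = 0; i < n; i++)
        for (f = 0; f < NFLOWS; f++)
            if (!strcmp(agg[i], flows[f]))
                backlog[f]++;

    /* dequeue: one frame per nonempty flow bucket per pass */
    while (out < n)
        for (f = 0; f < NFLOWS; f++)
            if (backlog[f]) {
                printf("%2d: %s\n", out++, flows[f]);
                backlog[f]--;
            }
    return 0;
}

In this run the ping and the DNS packet come out at positions 2 and 3
rather than 40 and 41, which is exactly what lets their replies make
it into the next string of tx aggregates.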

(Thanks to the rrul test and a zillion benchmarks of wifi under
various scenarios, I have a good mental picture of what's happening
today in aggregates, and bidirectional throughput is generally quite
compromised by their being dequeued in FIFO order.)

Temporarily "sorting" packets in the de-aggregation step will
certainly incur a CPU cost and a bit of delay, but I think the
behavior above will smooth out client application behavior somewhat
and certainly help on APs. Thoughts?


-- 
Dave Täht

Fixing bufferbloat with cerowrt: http://www.teklibre.com/cerowrt/subscribe.html


