[Make-wifi-fast] fq_codel_drop vs a udp flood
dave.taht at gmail.com
Sat Apr 30 23:41:38 EDT 2016
There were a few things on this thread that went by, as I wasn't on
the ath10k list.
First up, the udp flood...
>>> From: ath10k <ath10k-boun... at lists.infradead.org> on behalf of Roman
>>> Yeryomin <leroi.li... at gmail.com>
>>> Sent: Friday, April 8, 2016 8:14 PM
>>> To: ath10k at lists.infradead.org
>>> Subject: ath10k performance, master branch from 20160407
>>> I've seen performance patches were committed, so I've decided to give it
>>> a try (using 4.1 kernel and backports).
>>> The results are quite disappointing: TCP download (client pov) dropped
>>> from 750Mbps to ~550 and UDP shows completely weird behaviour - if
>>> generating 900Mbps it gives 30Mbps max, if generating 300Mbps it gives
>>> 250Mbps, before (latest official backports release from January) I was
>>> able to get 900Mbps.
>>> Hardware is basically ap152 + qca988x 3x3.
>>> When running perf top I see that fq_codel_drop eats a lot of cpu.
>>> Here is the output when running iperf3 UDP test:
>>> 45.78% [kernel] [k] fq_codel_drop
>>> 3.05% [kernel] [k] ag71xx_poll
>>> 2.18% [kernel] [k] skb_release_data
>>> 2.01% [kernel] [k] r4k_dma_cache_inv
The udp flood behavior is not "weird". The test is wrong. It fills the
local queue at a rate that dramatically exceeds the bandwidth of the link.
The local queue has grown past anything rational, the gentle,
tcp-friendly methods have failed, we're out of configured queue space,
and as a last-ditch move, fq_codel_drop is attempting to reduce the
backlog via brute force.
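The arithmetic behind that last-ditch behavior can be sketched as below: the backlog grows at the difference between the offered and drained rates, and every queued packet adds delay. The 300 Mbit/s link rate is hypothetical, and the 10240-packet limit is the mainline fq_codel qdisc default, used here only as an assumption; the mac80211/ath10k queues in this test may be sized differently.

```python
def time_to_fill(limit_pkts, pkt_bytes, offered_bps, link_bps):
    """Seconds until a queue holding `limit_pkts` packets overflows when
    traffic arrives at `offered_bps` but drains at only `link_bps`."""
    excess_bps = offered_bps - link_bps
    if excess_bps <= 0:
        return float("inf")  # the link keeps up, so the queue never fills
    return limit_pkts * pkt_bytes * 8 / excess_bps


def standing_delay(backlog_pkts, pkt_bytes, link_bps):
    """Queuing delay seen by a packet that arrives behind `backlog_pkts` packets."""
    return backlog_pkts * pkt_bytes * 8 / link_bps


# Hypothetical numbers: 900 Mbit/s offered into a 300 Mbit/s link,
# 1500-byte packets, and an assumed 10240-packet limit.
print(time_to_fill(10240, 1500, 900e6, 300e6))   # roughly 0.2 s to overflow
print(standing_delay(10240, 1500, 300e6))         # roughly 0.41 s of queuing delay
```

Under these assumptions the queue is full within a fraction of a second, after which every further arrival has to be dropped somewhere.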
0) Fix the test
The udp flood test should seek an operating point roughly equal to
the bandwidth of the link, where there is near-zero queuing delay
and nearly 100% utilization.
There are several well-known methods for an endpoint to seek
equilibrium (filling the pipe and not the queue), notably the ones
outlined in this:
These are a good starting point for further research. :)
Now, a unicast flood test is useful for figuring out how many packets,
both large and small, can fit in a link, and for tweaking the cpu (or
running a box out of memory).
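For a rough sense of "how many packets can fit", here is a small sketch of the packet rate a link carries at a given packet size and the per-packet CPU budget that implies. It deliberately ignores framing, preambles and wifi aggregation/airtime overhead, so the numbers are upper bounds under that assumption; the 900 Mbit/s figure is simply taken from the report above.

```python
def packets_per_second(link_bps, pkt_bytes):
    """Packets per second a link of `link_bps` carries at one packet size,
    ignoring framing, preambles and (on wifi) aggregation/airtime overhead."""
    return link_bps / (pkt_bytes * 8)


def ns_per_packet(link_bps, pkt_bytes):
    """CPU time budget per packet, in nanoseconds, needed to keep up."""
    return 1e9 / packets_per_second(link_bps, pkt_bytes)


for size in (64, 1500):
    print(size, packets_per_second(900e6, size), round(ns_per_packet(900e6, size)))
```

Small packets are where the cpu runs out first: at 64 bytes the budget drops to a few hundred nanoseconds per packet.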
I have seen a lot of udp flood tests that are constructed badly.
Measuring the time to *send* X packets, without accounting for the queue
length, is one example. Which iperf3 options were used, exactly? Was it
run locally, or from a test client connected via ethernet (i.e., at local
cpu speeds rather than at the network ingress speed)?
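To show why a send-side measurement misleads, here is a toy per-packet simulation, not a model of iperf3 itself: packets are handed to a local FIFO at the sender's rate while the link drains it at a slower rate. The function name, the FIFO model and the rates are all assumptions for illustration.

```python
from collections import deque


def flood(n_pkts, pkt_bits, send_bps, link_bps, limit):
    """Push n_pkts into a local FIFO of `limit` packets at send_bps while the
    link drains it at link_bps. Returns (sender-reported bps, receiver-seen bps,
    packets dropped locally, worst sojourn time in seconds)."""
    send_gap = pkt_bits / send_bps      # time between packets handed to the queue
    tx_time = pkt_bits / link_bps       # time the link needs per packet
    in_flight = deque()                 # departure times of packets still queued
    link_free_at = 0.0
    accepted = dropped = 0
    worst_delay = 0.0
    for i in range(n_pkts):
        now = i * send_gap
        while in_flight and in_flight[0] <= now:
            in_flight.popleft()         # this packet has already left the queue
        if len(in_flight) >= limit:
            dropped += 1                # queue full: packet never reaches the wire
            continue
        link_free_at = max(now, link_free_at) + tx_time
        in_flight.append(link_free_at)
        worst_delay = max(worst_delay, link_free_at - now)
        accepted += 1
    send_duration = n_pkts * send_gap
    sender_bps = n_pkts * pkt_bits / send_duration   # what a send-side timer sees
    receiver_bps = accepted * pkt_bits / link_free_at
    return sender_bps, receiver_bps, dropped, worst_delay


for limit in (1000, 10000):
    tx, rx, drops, delay = flood(10000, 12000, 900e6, 300e6, limit)
    print(limit, round(tx / 1e6), round(rx / 1e6), drops, round(delay, 3))
```

In this model the sender-side figure stays at the offered rate whatever the link delivers, and the larger queue merely trades local drops for a much longer worst-case delay while the received rate stays the same.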
A simple test of your test: if your udp flood test tool reports a better
result with a 10000-packet local queue than with a 1000-packet one, it's
A "good" udp flood test merely counts the number of *received* packets
and bytes over some (set of) intervals, gradually ramping up the offered
load until it sees no further improvement. A better one might shock the
system to measure the rate controller or aggregator as well, AND count
and graph packet loss over time, etc.
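The ramp-up described above can be sketched as a loop that raises the offered rate and stops once the *received* rate plateaus. Here `measure_rx_bps` is a hypothetical callback standing in for "count received packets and bytes over an interval", and the 1% threshold is an arbitrary assumption; the loss graphing and shock testing are not modeled.

```python
def ramp_until_flat(measure_rx_bps, start_bps, step_bps, min_gain=0.01, max_steps=100):
    """Raise the offered rate step by step, recording the *received* rate at
    each step, and stop once a step improves it by less than `min_gain`."""
    best_rx = 0.0
    offered = start_bps
    history = []
    for _ in range(max_steps):
        rx = measure_rx_bps(offered)
        history.append((offered, rx))
        if best_rx > 0 and rx < best_rx * (1 + min_gain):
            break                       # no meaningful improvement: at the plateau
        best_rx = max(best_rx, rx)
        offered += step_bps
    return best_rx, history


# Stand-in for a real receiver-side measurement: a 300 Mbit/s bottleneck.
best, steps = ramp_until_flat(lambda offered: min(offered, 300e6), 50e6, 50e6)
print(best / 1e6, len(steps))   # prints: 300.0 7
```

The test's answer is the plateau of the received rate, not anything the sender measures, so it lands near the operating point where queuing delay stays low.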
And then there are side effects, like running out of cpu on an artificial
test. Still, udp floods exist in the real world, and we can rip some
of the cpu cost out of fq_codel_drop.
fq_codel_drop scans all 1024 queues in the mainline version, and all
4096 in this one, to find the fattest flow. That's *expensive*.
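A minimal sketch of that scan, assuming a flat array of per-flow FIFOs with a byte backlog per flow. This is an illustration of the linear search, not the kernel code. If every arrival triggered one such drop at, say, 75,000 packets/s, a 4096-flow table would cost about 4096 × 75,000 ≈ 307 million backlog comparisons per second.

```python
from collections import deque


class FlowTable:
    """Per-flow FIFOs plus a byte backlog per flow (a stand-in for fq_codel's flows)."""

    def __init__(self, n_flows):
        self.queues = [deque() for _ in range(n_flows)]
        self.backlogs = [0] * n_flows          # bytes queued in each flow

    def enqueue(self, flow, pkt_len):
        self.queues[flow].append(pkt_len)
        self.backlogs[flow] += pkt_len

    def drop_from_fattest(self):
        """Scan every flow for the largest backlog and drop its head packet.
        The loop touches all n_flows entries on every call."""
        fattest = 0
        for i in range(1, len(self.backlogs)):
            if self.backlogs[i] > self.backlogs[fattest]:
                fattest = i
        if not self.queues[fattest]:
            return None                        # nothing queued anywhere
        self.backlogs[fattest] -= self.queues[fattest].popleft()
        return fattest


table = FlowTable(4096)
table.enqueue(7, 1500)
table.enqueue(7, 1500)
table.enqueue(100, 64)
print(table.drop_from_fattest())   # prints: 7
```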
1) fq_codel_drop should probably bump up the codel count on every drop
to give the main portion of the algorithm a higher drop frequency.
This won't hurt, but it won't help much when there is a large disparity
between input and output rates for a fairly long time. A smaller
disparity (like gigE feeding 800mbit wifi) will naturally have the main
part of the algorithm kick in sooner.
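Option 1 can be sketched with just the piece of CoDel state that sets drop frequency. The control law (next drop after interval / sqrt(count)) and the 100 ms default interval are CoDel's; `on_overflow_drop` is a hypothetical hook illustrating the proposal, not an existing kernel function. Because the gap shrinks only with the square root of count, quadrupling count merely halves it, which is one way to see why this helps little against a large rate disparity.

```python
import math


class CodelCount:
    """Just the part of CoDel's state that sets drop frequency."""

    def __init__(self, interval_s=0.100):
        self.interval_s = interval_s   # CoDel's default interval is 100 ms
        self.count = 1                 # drops in the current dropping episode

    def next_drop_gap(self):
        # CoDel control law: time until the next drop is interval / sqrt(count).
        return self.interval_s / math.sqrt(self.count)

    def on_overflow_drop(self):
        # Hypothetical option-1 hook: a brute-force drop also bumps count,
        # so CoDel's own drops come sooner afterwards.
        self.count += 1


state = CodelCount()
for _ in range(3):
    state.on_overflow_drop()
print(state.count, state.next_drop_gap())   # prints: 4 0.05
```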
2) fq_codel_drop can simply taildrop. That would cut the cpu cost by
quite a lot and make the udp flood test easier to "pass".
It does little in the real world to actually shoot at the offending
flow, and a serious flood will end up hurting flows that behave sanely.
I favor this option, as it is cheap and more or less what happened in
the pre-fq_codel world. Coupling it with 1) doesn't work quite as well
as you might want, either, but it might help.
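A sketch of option 2, assuming a single global packet limit shared across flows: when the limit is reached the arriving packet is discarded, with no scan. The usage lines show the downside noted above: a sane flow's packet is the one that gets dropped once a flood has filled the queue. Class and method names are hypothetical.

```python
from collections import deque


class TailDropFQ:
    """Per-flow queues with a shared packet limit; overflow drops the arrival."""

    def __init__(self, n_flows, limit_pkts):
        self.queues = [deque() for _ in range(n_flows)]
        self.limit = limit_pkts
        self.total = 0
        self.drops = 0

    def enqueue(self, flow, pkt):
        if self.total >= self.limit:
            self.drops += 1    # O(1): discard the arriving packet, whoever sent it
            return False
        self.queues[flow].append(pkt)
        self.total += 1
        return True


q = TailDropFQ(n_flows=2, limit_pkts=2)
q.enqueue(0, "flood-1")
q.enqueue(0, "flood-2")
print(q.enqueue(1, "sane"))   # prints: False
```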
3) Steering: you could store the size of, and a pointer to, the biggest
of all flows and drop from the head of that one.
Or, to give friendlier behavior, store the top 3 and circulate between them.
This incurs an ongoing cpu cost on every enqueue/dequeue of a packet.
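A sketch of the single-biggest-flow variant of option 3 (the top-3 rotation is not shown). The cached index is updated with one comparison per enqueue, so a drop is O(1). This version is approximate by assumption: after dequeues the cached flow may no longer be the largest, and keeping it exact would need a rescan or a heap, i.e. more per-packet work. All names are hypothetical.

```python
from collections import deque


class FattestTracker:
    """Caches the index of the biggest flow instead of scanning for it."""

    def __init__(self, n_flows):
        self.queues = [deque() for _ in range(n_flows)]
        self.backlogs = [0] * n_flows
        self.fattest = 0

    def enqueue(self, flow, pkt_len):
        self.queues[flow].append(pkt_len)
        self.backlogs[flow] += pkt_len
        # The ongoing cost: one extra comparison on every enqueue.
        if self.backlogs[flow] > self.backlogs[self.fattest]:
            self.fattest = flow

    def dequeue(self, flow):
        self.backlogs[flow] -= self.queues[flow].popleft()
        # The cached flow may now be stale; this sketch tolerates that.

    def drop(self):
        """O(1): drop from the head of the cached (possibly stale) fattest flow."""
        if not self.queues[self.fattest]:
            return None
        self.backlogs[self.fattest] -= self.queues[self.fattest].popleft()
        return self.fattest


tracker = FattestTracker(8)
tracker.enqueue(3, 1500)
tracker.enqueue(5, 64)
tracker.enqueue(5, 3000)
print(tracker.drop())   # prints: 5
```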
4) Do it more in the style of per-station airtime fairness (find the
station with the biggest backlog) and have a smaller number of fq_codel
queues per station. For most purposes, honestly, 64 queues per station
sounds like plenty at the moment.
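Option 4 can be sketched as a two-level choice: pick the station with the biggest backlog, then the fattest of only that station's queues. The `Station` structure and `pick_victim` are hypothetical. The sketch assumes each station's total is kept current on enqueue, so a drop costs roughly (number of stations) + 64 comparisons instead of a scan over all 4096 queues.

```python
from dataclasses import dataclass, field

QUEUES_PER_STATION = 64  # the "64 queues per station" figure from the text


@dataclass
class Station:
    backlogs: list = field(default_factory=lambda: [0] * QUEUES_PER_STATION)
    total: int = 0  # kept current on enqueue, so picking a station needs no inner scan

    def enqueue(self, queue, pkt_len):
        self.backlogs[queue] += pkt_len
        self.total += pkt_len


def pick_victim(stations):
    """Return (station id, queue index) to drop from: first the station with
    the biggest backlog, then the fattest of that station's queues."""
    sta = max(stations, key=lambda s: stations[s].total)          # O(stations)
    backlogs = stations[sta].backlogs
    queue = max(range(len(backlogs)), key=backlogs.__getitem__)   # O(64)
    return sta, queue


stations = {"a": Station(), "b": Station()}
stations["a"].enqueue(3, 1500)
stations["b"].enqueue(10, 1500)
stations["b"].enqueue(20, 3000)
print(pick_victim(stations))   # prints: ('b', 20)
```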
I am painfully aware we have a long way to go to get this right, but
http://blog.cerowrt.org/post/rtt_fair_on_wifi/ is the endgame for
Let's go make home routers and wifi faster! With better software!