[Make-wifi-fast] Wifi Memory limits in small platforms

Thu Aug 22 13:03:24 EDT 2019

Sebastian Gottschall <s.gottschall at newmedia-net.de> writes:

> On 22.08.2019 at 15:15, Dave Taht wrote:
>> It's very good to know how much folks have been struggling to keep
>> things from OOMing on 32MB platforms. I'd like to hope that the
>> unified memory management in cake (vs. a collection of QoS qdiscs) and
>> the new fq_codel for wifi stuff (cutting it down to 1 alloc from four)
>> help massively with this issue, but until today I was unaware of how
>> much the field may have been patching things out.
>>
>> The default 32MB memory limit in fq_codel comes from Google's
>> concerns about 10GigE networking. 4MB is the limit in OpenWrt,
>> which is suitable for ~1Gbit, and is sort of there because 802.11ac's
>> maximum txop (impossible to hit in practice) is about that large.
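One way to read "suitable for ~1Gbit" is that both defaults represent a similar amount of time at their target link rates. Here is a back-of-envelope Python sketch. It assumes the whole limit is filled with packet bytes and drained at line rate, so it gives an upper bound rather than a measured latency:

```python
# Rough arithmetic: how long does a full memory limit take to drain?
# Assumptions: the whole limit holds packet bytes (ignoring per-packet
# truesize overhead) and drains at a constant line rate, so the result is
# an upper bound on the buffering time, not a measured latency.

MB = 1 << 20  # "MB" read as mebibytes (an assumption)


def drain_time_ms(memory_limit_bytes: int, rate_bits_per_s: float) -> float:
    """Milliseconds needed to drain a full memory limit at the given rate."""
    return memory_limit_bytes * 8 / rate_bits_per_s * 1000


if __name__ == "__main__":
    cases = [
        ("fq_codel default @ 10GigE", 32 * MB, 10e9),
        ("OpenWrt limit @ 1Gbit", 4 * MB, 1e9),
        ("OpenWrt limit @ 100Mbit", 4 * MB, 100e6),
    ]
    for label, limit, rate in cases:
        print(f"{label:27s} {drain_time_ms(limit, rate):7.1f} ms")
```

Under those assumptions, 32MB at 10GigE and 4MB at 1Gbit both come to roughly 27 to 34 ms. The same 4MB in front of a 100Mbit link is about a third of a second.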

I did kind of conflate "qos + fq_codel" vs. wifi in this message. It
looks like you're staying with me.

>> Something as small as 256K is essentially about 128 full-size packets
>> (and often, even acks from an Ethernet device's rx ring eat 2k each).
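The 256K figure is simple division once you note that fq_codel charges its memory limit by each skb's truesize (the buffer actually allocated), not by bytes on the wire. Here is a sketch that assumes the 2KB truesize from the text for every packet. The 66-byte ACK length is an illustrative assumption:

```python
# How many packets fit under a qdisc memory limit that is charged by
# truesize?  A 66-byte ACK sitting in a 2KB receive buffer costs as much
# as a full-size frame.  The 2048-byte truesize is the figure from the
# text; real values vary by driver.

KB = 1024


def packets_that_fit(memory_limit_bytes: int, truesize_bytes: int) -> int:
    """Number of whole packets of the given truesize under the limit."""
    return memory_limit_bytes // truesize_bytes


def wire_bytes_held(memory_limit_bytes: int, truesize_bytes: int, wire_len: int) -> int:
    """Bytes actually queued for transmission when the limit is full."""
    return packets_that_fit(memory_limit_bytes, truesize_bytes) * wire_len


if __name__ == "__main__":
    limit = 256 * KB
    print(packets_that_fit(limit, 2048))       # 128
    print(wire_bytes_held(limit, 2048, 1514))  # 193792 (full-size frames)
    print(wire_bytes_held(limit, 2048, 66))    # 8448 (ACKs)
```

The same 128 slots hold about 190KB of full-size frames but only about 8KB of ACKs. That is why small-packet floods hit the limit so quickly.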
>
> What I miss in mac80211 is an option like "fq_codel = off".
> It's essential, and I will definitely work on a patch to handle it
> this way for low-memory 802.11n platforms.

Well, my hope would be that turning it off would A) not help that
much on memory or CPU and B) show such a dramatic reduction in
multi-station performance that you'd immediately turn it on again.

I try to encourage folks to run the rtt_fair tests in flent when
twiddling with wifi. Those really show how bad things are when you
don't have ATF + FQ + per-station aggregation and you have lots of
clients. Single-threaded tests are misleading.

I gave a good demo of why this is (was!) here: https://www.youtube.com/watch?v=Rb-UnHDw02o&t=1551s

and there's more in the "Ending the Anomaly" paper. Perversely, though,
now that we can get 25x latency reductions and 2.5x more throughput,
more memory is needed to achieve those goals in some cases, which
is part of my concern about chopping things down to 256k here.

>
>>
>> The structure of the new fq_codel for wifi subsystem is "one in the
>> hardware, one ready to go, and the rest accumulating". I
>> typically see about 13-20 packets in an aggregate. 256k strikes me as
>> a bit small.
> According to the rules, 256K is used for HT only, and if VHT is
> involved the 4MB limit is used.
> But now comes the point: all 802.11ac platforms have 64MB of RAM or
> more, but ath10k chipsets use about 40MB of shared memory. So, mmh,
> we are hitting the wall again. Most 802.11ac routers have 128MB, but
> some (notably D-Link) have just 64MB.
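The numbers from this exchange can be combined in a Python sketch. It shows how much room the accumulating queue has under the 256K limit, and how big the 4MB limit is relative to what a 64MB ath10k box has left. The sketch rests on these assumptions:

* the HT/VHT rule is applied exactly as described above;
* every packet costs 2KB of truesize;
* two full aggregates ("one in the hardware, one ready to go") are always charged against the limit;
* the 40MB is simply subtracted from total RAM.

`fq_memory_limit` and the other helpers are hypothetical, not mac80211 or DD-WRT functions.

```python
# Memory budget sketch for the wifi queue limit.  Assumptions, not taken
# from mac80211 or DD-WRT source: the 256K/4MB rule as described in the
# thread, a flat 2KB truesize per packet, and two full aggregates (one in
# the hardware, one ready to go) always charged against the limit.

KB = 1024
MB = 1024 * KB
TRUESIZE = 2048  # assumed per-packet charge, as in the 256K example


def fq_memory_limit(supports_vht: bool) -> int:
    """Hypothetical helper: the HT/VHT rule as described in the thread."""
    return 4 * MB if supports_vht else 256 * KB


def backlog_packets(limit: int, aggregate_pkts: int, in_flight: int = 2) -> int:
    """Packets left for the accumulating queue once in-flight aggregates are charged."""
    committed = aggregate_pkts * in_flight * TRUESIZE
    return max(0, (limit - committed) // TRUESIZE)


def limit_share_of_free_ram(total_ram: int, firmware_ram: int, limit: int) -> float:
    """Fraction of RAM left after the firmware carve-out that the queue may use."""
    return limit / (total_ram - firmware_ram)


if __name__ == "__main__":
    ht = fq_memory_limit(supports_vht=False)
    for agg in (13, 20):  # aggregate sizes typically seen, per the thread
        print(f"HT, {agg}-packet aggregates: {backlog_packets(ht, agg)} packets of backlog")
    share = limit_share_of_free_ram(64 * MB, 40 * MB, fq_memory_limit(supports_vht=True))
    print(f"64MB ath10k box: the 4MB limit is {share:.0%} of the RAM left")
```

Under these assumptions, the 256K limit leaves roughly 88 to 102 packets of backlog. The 4MB limit is about a sixth of what remains on a 64MB ath10k router, before the kernel and userspace take their share.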

Ugh.

Is it just the MIPS boxes with so little RAM? All the ARM routers I have
have at least 128MB, some as much as 512MB.

Yes, having a wifi chip that can theoretically have 4MB in transit
with so little RAM is problematic.

Dear D-Link: don't do that. It hurts when you do that.

>>
>> I haven't checked, but does this patch still exist in openwrt/dd-wrt?
>> It helped a lot when under memory pressure from
>> a lot of small packets.
>>
>> https://github.com/dtaht/cerowrt-3.10/blob/master/target/linux/generic/patches-3.10/657-qdisc_reduce_truesize.patch
>>
>> Arguably this could be made more aggressive, but when I did it, it
>> massively reduced the memory burden when flooding the device or
>> handling lots of acks, and while it cost CPU, it saved us from
>> OOMing.
> Mmh, let me check -> nope, at least it's not in my tree. But it will be soon :-)

Well, I sent along a mildly improved version of the idea.
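The idea behind that patch can be modeled without the kernel. When a queued packet's buffer is much larger than its data, copy the data into a right-sized buffer and free the big one. That trades a copy (CPU) for a smaller charge against the memory limit. This Python sketch models only that general idea. The 2x threshold, the fixed per-buffer overhead and the `Packet` type are assumptions, not the patch's actual logic:

```python
# Model of the "reduce truesize" idea: if a packet's buffer is much bigger
# than its data, replace it with a tightly sized copy.  The 2x ratio, the
# SKB_OVERHEAD value and the Packet type are illustrative assumptions.

from dataclasses import dataclass

SKB_OVERHEAD = 256  # hypothetical fixed per-buffer bookkeeping cost, in bytes


@dataclass
class Packet:
    length: int    # bytes of actual packet data
    truesize: int  # bytes charged against the qdisc memory limit


def reduce_truesize(pkt: Packet, ratio: int = 2) -> Packet:
    """Return a tightly sized copy if the buffer is wasteful, else pkt itself."""
    tight = pkt.length + SKB_OVERHEAD
    if pkt.truesize > ratio * tight:
        return Packet(length=pkt.length, truesize=tight)  # the "copy" (costs CPU)
    return pkt


if __name__ == "__main__":
    acks = [Packet(length=66, truesize=2048) for _ in range(1000)]
    before = sum(p.truesize for p in acks)
    after = sum(reduce_truesize(p).truesize for p in acks)
    print(before, after)  # 2048000 322000
```

Full-size packets are left alone because their buffers are already mostly data. Only small packets such as ACKs get copied, which is where both the memory savings and the CPU cost come from.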

I can really see a use for some sort of "test my QoS" script that attempts
to flood every queue on the system, and for wider adoption of
cake, which is lighter weight than the alternatives.

One idea in cake was that we'd hoped to capture the most typical
QoS setups with "models". It's very easy to add a new model
(like besteffort, diffserv3 or diffserv4; each is a lookup table and a
bandwidth allocation call), but lacking feedback from the field on more
typical QoS constructs, that's where it ended. When we started the project,
I figured we'd end up with 20+ models before the end.
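To make "a lookup table and a bandwidth allocation call" concrete, here is a Python sketch of what a model boils down to. The tin names, DSCP assignments and bandwidth thresholds are modeled on the diffserv3 description in the tc-cake(8) man page and should be checked against it. The `Model` class and its methods are hypothetical and are not cake's C implementation:

```python
from dataclasses import dataclass

# DSCP codepoints used below (decimal values of the 6-bit DSCP field).
CS1, CS4, CS5, CS6, CS7 = 8, 32, 40, 48, 56
VA, EF = 44, 46


@dataclass
class Model:
    """Hypothetical container for a cake-style model: table plus thresholds."""
    name: str
    tins: tuple[str, ...]
    thresholds: tuple[float, ...]  # bandwidth threshold per tin, as a share of the shaped rate
    dscp_to_tin: dict[int, int]    # DSCP -> tin index; anything unlisted uses default_tin
    default_tin: int

    def classify(self, dscp: int) -> str:
        """The lookup-table half: which tin does this codepoint land in?"""
        return self.tins[self.dscp_to_tin.get(dscp, self.default_tin)]

    def tin_thresholds(self, shaped_rate_bps: float) -> dict[str, float]:
        """The bandwidth-allocation half: per-tin thresholds in bits/s."""
        return {t: shaped_rate_bps * f for t, f in zip(self.tins, self.thresholds)}


DIFFSERV3 = Model(
    name="diffserv3",
    tins=("bulk", "best_effort", "voice"),
    thresholds=(1 / 16, 1.0, 1 / 4),
    dscp_to_tin={CS1: 0, CS4: 2, CS5: 2, VA: 2, EF: 2, CS6: 2, CS7: 2},
    default_tin=1,
)

if __name__ == "__main__":
    print(DIFFSERV3.classify(EF))  # voice
    print(DIFFSERV3.classify(0))   # best_effort
    print(DIFFSERV3.tin_thresholds(100e6))
```

Under this framing, a new model would mostly mean a new table and threshold list, plus whatever scheduling change the new model actually needs.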

It would be good to get a tc class dump or output from more typical
QoS setups.

In sqm and cake, we have a terrible tendency to tell people "no, just
use the defaults! They work! Trust us!"... and people generally don't
believe us and want to keep doing things the way they always have.

In more than a few circumstances they are right, but we don't understand
what they are trying to do.

As one case that cake doesn't handle: at least some IPTV setups are
visible as a strict priority queue over everything else, with
everything else handled below it, so the TV stream never, ever drops a packet.

We didn't do that, but we could *easily* add an IPTV model to shape
inbound traffic better, if we knew more about how Free, FT, etc. construct
their packets.

Similarly, some folks in this world want strict priority for EF.
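Both the IPTV setups and the strict-EF requests describe a two-band strict priority scheduler. This is a minimal Python sketch of that behavior, which cake does not implement. The `StrictPriority` class and the (DSCP, label) tuples standing in for packets are illustrative assumptions:

```python
from collections import deque
from typing import Callable, Deque, Optional, Tuple

Packet = Tuple[int, str]  # (DSCP, label): a stand-in for a real packet
EF = 46


class StrictPriority:
    """Two-band strict priority: the high band always goes first."""

    def __init__(self, is_priority: Callable[[Packet], bool]) -> None:
        self.is_priority = is_priority
        self.high: Deque[Packet] = deque()  # e.g. the IPTV stream or EF traffic
        self.low: Deque[Packet] = deque()   # everything else

    def enqueue(self, pkt: Packet) -> None:
        (self.high if self.is_priority(pkt) else self.low).append(pkt)

    def dequeue(self) -> Optional[Packet]:
        # The low band is served only when the high band is empty.
        if self.high:
            return self.high.popleft()
        if self.low:
            return self.low.popleft()
        return None


if __name__ == "__main__":
    q = StrictPriority(lambda p: p[0] == EF)
    for p in [(0, "web1"), (EF, "tv1"), (0, "web2"), (EF, "tv2")]:
        q.enqueue(p)
    print([q.dequeue()[1] for _ in range(4)])  # ['tv1', 'tv2', 'web1', 'web2']
```

The flip side of "the TV stream never drops" is that an unbounded high band can starve everything beneath it unless that band is itself rate-limited.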

>> There are two other dubious things in the fq_codel for wifi stack
>> presently. Right now the CoDel target is set too high for p2p use
>> (20ms, where 6ms seems more right), and it also flips up to a really
>> high target and interval AND turns off ECN when there are more than a
>> few stations available (rather than "active"). It's an overly
>> conservative figure we used back when we had major issues with
>> powersave and multicast, and I'd hoped we could cut it back to normal
>> after we got another round of research funding and feedback from the
>> field (which didn't happen; we never got around to making it
>> configurable, and being 25x better than it was before seemed "enough").
>>
>> At Battlemesh I was puzzled as to why I was seeing drops at about 50ms
>> of delay rather than ECN marks, and thought it was something
>> else; this morning I'm thinking that folks have instead been reducing
>> the memlimit to 256k....
>>
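The parameter flip described above can be modeled in a few lines of Python. The sketch is about the difference between counting associated stations and counting active ones. Only the 20ms target comes from the text. The 100ms interval, the 50ms/300ms flipped values and the station threshold are assumptions for illustration, and the real mac80211 condition may also depend on per-station rate, so check them against net/mac80211 before relying on them:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodelParams:
    target_ms: int
    interval_ms: int
    ecn: bool


# 20ms target is from the text; the other numbers are assumptions.
NORMAL = CodelParams(target_ms=20, interval_ms=100, ecn=True)
FLIPPED = CodelParams(target_ms=50, interval_ms=300, ecn=False)
FEW_STATIONS = 3  # hypothetical stand-in for "more than a few"


def codel_params(associated: int, active: int, count_active: bool) -> CodelParams:
    """Pick CoDel settings from a station count.

    count_active=False models the behaviour criticised above (counting
    every associated station); count_active=True counts only stations
    that currently have traffic queued.
    """
    stations = active if count_active else associated
    return FLIPPED if stations > FEW_STATIONS else NORMAL


if __name__ == "__main__":
    # Ten phones associated, most of them asleep, one actually busy.
    print(codel_params(associated=10, active=1, count_active=False))
    print(codel_params(associated=10, active=1, count_active=True))
```

With ECN disabled in the flipped state, CoDel can only signal congestion by dropping. In this model, a crowd of idle but associated stations is enough to turn marks into drops for the one station that is busy.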