[Cake] Wifi Memory limits in small platforms

Thu Aug 22 13:37:08 EDT 2019

On 22.08.2019 at 19:03, Dave Taht wrote:
> Sebastian Gottschall <s.gottschall at newmedia-net.de> writes:
>
>> On 22.08.2019 at 15:15, Dave Taht wrote:
>>> It's very good to know how much folk have been struggling to keep
>>> things from OOMing on 32MB platforms. I'd like to hope that the
>>> unified memory management in cake (vs a collection of QoS qdiscs) and
>>> the new fq_codel for wifi stuff (cutting it down to 1 alloc from four)
>>> help massively on this issue, but until today I was unaware of how
>>> much the field may have been patching things out.
>>>
>>> The default 32MB memory limit in fq_codel comes from the stressing
>>> about 10GigE networking from google. 4MB is the limit in openwrt,
>>> which is suitable for ~1Gbit, and is sort of there due to 802.11ac's
>>> maximum (impossible to hit) of a txop that large.
> I did kind of conflate "qos + fq_codel" vs wifi in this message. It
> looks like yer staying with me.
>
>>> Something as small as 256K is essentially about 128 full size packets
>>> (and often, acks from an ethernet device's rx ring eat 2k).
>> What I miss in mac80211 is an option like "fq_codel = off".
>> It's essential, and I will definitely work on a patch that handles it
>> this way for low-memory 802.11n platforms.
> Well, it would be my hope that turning it off would A) not help that
> much on memory or cpu and B) show such a dramatic reduction in
> multi-station performance that you'd immediately turn it on again.
Isn't it better to have a working platform with less performance than a 
crashing platform with no performance?
I mean, I can use older mac80211 versions without that issue on a 
typical NanoStation 2/5, which is often used just as a CPE device.

But current mac80211 versions (current meaning the last 2-3 years) 
are just unstable on it and run out of memory after a while.
The only thing that helped was cutting down the memory limit of fq_codel 
inside mac80211.
I also have another fancy test unit, a Linksys WRT400 with 32 MB 
RAM and two ath9k-based wifi chipsets. There is no hope of it running stably
for even 5 minutes, even with a single connection under load (my crash 
test is an HDTV IPTV stream converted to unicast using a 
stateless EoIP tunnel).
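
To put the limits discussed in this thread side by side, here is a rough sketch of the arithmetic. It assumes every queued packet is charged about 2 KiB of memory (the rx-ring figure quoted above); the real charge varies by driver and packet size, and the names used here are made up for illustration.

```python
KIB = 1024
MIB = 1024 * KIB

# Assumption: each queued packet costs ~2 KiB of buffer memory,
# as quoted above for packets coming from an ethernet rx ring.
ASSUMED_BYTES_PER_PACKET = 2 * KIB

# Memory limits mentioned in this thread.
LIMITS = {
    "256K (reduced limit for HT)": 256 * KIB,
    "4MB (openwrt limit, VHT)": 4 * MIB,
    "32MB (fq_codel default)": 32 * MIB,
}


def packets_that_fit(limit_bytes, bytes_per_packet=ASSUMED_BYTES_PER_PACKET):
    """Return how many packets fit under a memory limit (hypothetical helper)."""
    return limit_bytes // bytes_per_packet


if __name__ == "__main__":
    for label, limit in LIMITS.items():
        print(f"{label}: about {packets_that_fit(limit)} packets")
```

On a 32 MB device the default limit alone equals the entire RAM, which is why the limit has to be cut down on such hardware.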

> I try to encourage folk to run the rtt_fair tests in flent when
> twiddling with wifi. Those really show how bad things are when you
> don't have ATF + FQ + Per station aggregation and lots of
> clients. Single threaded tests are misleading.
I know, but even single-threaded tests don't work well on such 
devices, so there is no need to talk about the benefits of ATF, fq_codel etc.
But there is a need to talk about making it configurable, which would also allow 
disabling it if required. If you just have a CPE device with PPPoE 
running on it, which is common for WISPs,
there is no need for much fair queuing. That is a task for the 
access point. Another typical use for devices like the NanoStation, 
Rocket, Bullet etc. is simple point-to-point long-range links.
That is the main use for high-gain devices like these, I 
assume.
So we are not talking about a typical cool and fancy AP. We are talking about 
compatibility with low-end devices without running out of resources. I'm 
a typical programmer from the 80s: keep it as small, simple and resource 
efficient as possible. These coding standards should still be considered 
today, even if I don't write Tetris clones anymore that run in 512-byte 
boot sectors using the MS-DOS built-in debug assembler.
>
> I gave a good demo of why this is (was!), here: https://www.youtube.com/watch?v=Rb-UnHDw02o&t=1551s
>
> and there's more in the ending the anomaly paper. Perversely though,
> now that we can do 25x latency reductions and 2.5x more throughput,
> more memory is needed to achieve those goals in some cases, which
> is part of my concern about chopping things down to 256k here.
>>> The structure of the new fq_codel for wifi subsystem is "one in the
>>> hardware, one ready to go, and the rest accumulating". I
>>> typically see about 13-20 packets in an aggregate. 256k strikes me as
>>> a bit small.
>> The rule is that 256k is used for HT only; if VHT is involved,
>> the 4 MB limit is used.
>> But here comes the point: all 802.11ac platforms have 64 MB RAM or
>> more, but ath10k chipsets use
>> about 40 MB of shared memory. So, hmm, we are hitting the wall
>> again. Most 802.11ac routers have 128 MB, but some (notably
>> D-Link) have just 64 MB.
> Ugh.
>
> Is it just the mips boxes with so little ram? All the arm routers I have
> have at least 128, some as much as 512.
You got it: all the MIPS routers. Most problematic are the TP-Link WR841 (and 
similar series) and UBNT devices, of course.
These are 802.11 devices but come with just 32 MB RAM. There are others 
too, of course, and I love to maintain all the older devices
for the community. Newer ARM-based devices we really don't need to 
care about. Broadcom ARM CPUs come with chipsets which are either not 
supported by linux/mac80211 at all
or, for newer chipsets using brcmfmac, just badly supported (but the 
original Broadcom proprietary driver is unstable too, of course).
All other models based on QCA IPQ8064 etc. come with 256 MB 
and more, and we really only need to take care of ath9k and ath10k 
(soon maybe ath11k).
Everything else doesn't matter. The Linksys wrtXXXX series has a mac80211 
driver, but Marvell stopped maintaining it at a point where it was still 
shit and unstable, and it's mainly based on a binary firmware blob.

>
> Yes, having a wifi chip that can theoretically have 4MB in transit
> with so little ram is problematic.
>
> Dear dlink: don't do that. It hurts when you do that.
>
I talked a lot with D-Link about this issue, but D-Link's solution was just 
switching to a cheaper MediaTek MIPS-based platform. Now we have more 
RAM, but a featureless chipset.
Same for TP-Link.
>>> I haven't checked, but does this patch still exist in openwrt/dd-wrt?
>>> It had helped a lot when under memory pressure from
>>> a lot of small packets.
>>>
>>> https://github.com/dtaht/cerowrt-3.10/blob/master/target/linux/generic/patches-3.10/657-qdisc_reduce_truesize.patch
>>>
>>> Arguably this could be made more aggressive, but it massively reduced
>>> memory burdens at the time I did it when
>>> flooding the device, or having lots of acks, and while it cost cpu it
>>> saved on ooming.
>> mmh let me check -> nope its at least not in my tree. but will be soon :-)
> Well, I sent along a mildly improved version of the idea.
>
> I can really see some sort of "test my qos" script that attempts
> to flood every queue on the system. And wider adoption of
> cake which is lighter weight than the alternatives.
>
> one idea that's in cake was that: we'd hoped to capture the most typical
> qos setups with it with "models". It's very easy to add a new model
> (besteffort, diffserv3, diffserv4) (it's a lookup table and bandwidth
> allocation call), but lacking feedback on more typical QoS constructs
> from the field, that's where it ended. When we started the project,
> I figured we'd end up with 20+ models before the end.
>
> It would be good to get a tc class dump or output from more typical
> QoS Setups.
>
> In sqm and cake...
> we have a terrible tendency to tell people "no, just use the defaults!
> they work! trust us!"...
Yeah, I know that feeling, but I can never trust the users. They always do 
what they think is good for them,
and everyone thinks he knows better since he read something on 
Google / Reddit.
>
> who generally don't believe us and want to keep doing things the
> way they always have.
>
> In more than a few circumstances they are right, but we don't understand
> what they are trying to do.
>
> As one case that cake doesn't handle, at least some iptv setups are
> visible as a strict priority queue over everything else, below which you
> do everything else, so the tv stream never, ever, drops a packet.
As I mentioned before, my solution for IPTV is layer 2 tunneling to get 
rid of multicast issues; it also converts everything to a single 
connection.
I use an RFC-compliant Ethernet-over-IP tunnel for this, which is not 
upstream in Linux but is in FreeBSD. There was a driver for kernel 2.4 
around many years ago, and I have maintained it up
to the latest kernel. It's robust, handles fragmentation and has just 12 
bytes of overhead.
>
> We didn't do that, but could *easily* add an iptv model to shape
> inbound better - if we knew more about how free, FT etc, construct
> their packets.

Inbound, they are marked with TOS. Typical internet traffic has 0, of course; IPTV 
has X and voice has Y (don't ask me for the numbers, I don't have them in 
mind right now).
But for DHCP leases you need to mark your own packets with another DSCP; 
otherwise the ISP returns no IP. I don't know why it was done this way, but 
it has to be handled.
Normally Orange ships black boxes as routers, and to get it working with 
free systems, some people reverse engineered that shit. My conclusion is that 
it's some sort of
obfuscation to avoid third-party hardware, since the EU regulated the 
ISPs in a way that forces them to allow third-party products, which 
they still try to avoid (refusing support for internet problems etc.).

> Similarly some folk in this world want strict priority for EF.
>
>>> There's two other dubious things in the fq_codel for wifi stack
>>> presently. Right now the codel target is set too high for p2p use
>>> (20ms, where 6ms seems more right), and it also flips up to a really
>>> high target and interval AND turns off ecn when there's more than a
>>> few stations available (rather than "active") - it's an overly
>>> conservative figure we used back when we had major issues with
>>> powersave
>>> and multicast that I'd hoped we could cut back to normal after we got
>>> another round of research funding and feedback from the field (which
>>> didn't happen, and we never got around to making it configurable, and
>>> being 25x better than it was before seemed "enough")
>>>
>>> I was puzzled at battlemesh as to why I had dropping at about 50ms
>>> delay rather than ecn, and thought it was something
>>> else, and this morning I'm thinking that folk have been reducing the
>>> memlimit to 256k rather....
>>>