From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from mail.taht.net (mail.taht.net [IPv6:2a01:7e00::f03c:91ff:feae:7028])
    (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
    (No client certificate requested)
    by lists.bufferbloat.net (Postfix) with ESMTPS id 00EF93B29E;
    Thu, 22 Aug 2019 19:39:22 -0400 (EDT)
Received: from dancer.taht.net (unknown [IPv6:2603:3024:1536:86f0:eea8:6bff:fefe:9a2])
    (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
    (No client certificate requested)
    by mail.taht.net (Postfix) with ESMTPSA id 8F90821425;
    Thu, 22 Aug 2019 23:39:20 +0000 (UTC)
From: Dave Taht
To: Sebastian Gottschall
Cc: Dave Taht, Toke Høiland-Jørgensen, Cake List,
    Battle of the Mesh Mailing List, Make-Wifi-fast
References: <54438C64-C613-438E-9CB9-6C6D0C5EAFA0@gmail.com>
    <87sgpvflo4.fsf@taht.net> <87wof6rf7t.fsf@toke.dk>
    <7656FCDE-C590-4B0C-B191-B9FAC928A762@gmail.com>
    <5eb4c395-c718-2d28-65a7-9762cf8d5bea@newmedia-net.de>
    <47AD5102-B66F-44A5-AADE-D167ECB94A61@gmx.de>
    <1d772664-b6cc-a528-9725-96a431032875@newmedia-net.de>
    <87v9uqea3x.fsf@taht.net> <87tvaap57q.fsf@toke.dk>
    <5bbd2b81-9846-3a7a-130c-0f59e04fd2d1@newmedia-net.de>
    <87ftltdter.fsf@taht.net> <87pnkxnjo4.fsf@toke.dk>
    <981dd67a-7fb8-1e6a-3e50-6f63a414f1a1@newmedia-net.de>
Date: Thu, 22 Aug 2019 16:39:08 -0700
In-Reply-To: <981dd67a-7fb8-1e6a-3e50-6f63a414f1a1@newmedia-net.de>
    (Sebastian Gottschall's message of "Thu, 22 Aug 2019 22:30:45 +0200")
Message-ID: <877e74epnn.fsf@taht.net>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/24.5 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
Subject: Re: [Cake] Wifi Memory limits in small platforms
X-BeenThere: cake@lists.bufferbloat.net
X-Mailman-Version: 2.1.20
Precedence: list
List-Id: Cake - FQ_codel the next generation
X-List-Received-Date: Thu, 22 Aug 2019
23:39:23 -0000

Sebastian Gottschall writes:

>>>> but with current mac80211 versions (current means last 2-3 years). they
>>>> are just unstable and running out of memory after a while
>>>> the only thing which helped was cutting the memory limit of fq_codel
>>>> inside mac80211
>>>> i also have another fancy testunit which is a linksys wrt400 with 32 mb
>>>> ram and 2 ath9k based wifi chipsets. no hope here for running stable
>>>> for only 5 minutes even with a single connection under load (my crashing
>>>> test is running a hdtv iptv stream converted to unicast using a
>>>> stateless eoip tunnel)
>>>>
>>>>> I try to encourage folk to run the rtt_fair tests in flent when
>>>>> twiddling with wifi. Those really show how bad things are when you
>>>>> don't have ATF + FQ + Per station aggregation and lots of
>>>>> clients. Single threaded tests are misleading.
>>>> i know but even single threaded tests aren't working well on such
>>>> devices. so there is no need to talk about the benefits of atf, fq_codel etc.
>>>> but there is need to talk about configurable use of it which also allows
>>>> to disable it if required.
>> I 110% agree that a system that can stay up for years is much better
>> than one that is fast for 5 minutes!
>>
>> However I'd like a chance, in collaborating with you and your upcoming
>> patches - to try and narrow down crash bugs to various subsystems and
>> be able to get some benchmarks done that I simply couldn't do anymore
>> at the financial conclusion of the make-wifi-fast and cake projects.
>>
>> I think I have a lot of gear that is dd-wrt compatible - apu2,
>> wndr3700s, 3800s....
> if its v4, these are having 128 mb (i have them too).

These are from the cerowrt era, so, 32 or 64MB of ram.

> and apu2 has 2
> gb. so its getting real interesting
> if you choose such a bad one with 32 mb ram which are still commonly
> used by "freifunk"

One thing we can start doing more 'round here is to boot the x86 boxes
with mem=32MB or something similar (40% larger due to 64 bits? no idea,
maybe look at free mem on a similar config) to see what shows up.

For example, one of my APU2s has dual ath9k/ath10k cards which is a
reasonable sim of one of your configs.

>> The reduce truesize patch had helped a lot at the time (2012). There
>> were all kinds of flaky bugs that disappeared.
> i tested and it helped to make ethernet unavailable. it worked for

thx for making me chortle in sad empathy.

> wifi interfaces. but the eth0 and eth1 on my ipq8064 based
> testboard did not work anymore. no dhcp lease, no ping. but i was able
> to capture inbound packets. (qos was not even enabled while testing,
> so no cake, fq_codel etc. just standard sfq scheduler)
> so i reverted and all worked again

OK. Thx for trying. There have been so many bugs in gso/gro and hardware
offloads that I figure that's why the patch was dropped over time.

Is cake's gso-splitting working on that same hardware? I'm not sure to
what extent that reduces packet size or not these days.

I'll try that again on x86, maybe it needed to pullskb....

>>
>> the new drop monitor patchset looks WONDERFUL for seeing more about
>> packet drop behavior in the stack, but
>> it's a 5.3(?) feature only.
> i love backporting :-)

I used to but these days I'm content to work out of net-next x.y.0-rc4
or later. I get more sleep that way.

Oh, wait, it just hit that....

>>
>> I note that I run 18.06.1 on my 32MB pico and nanostations on the
>> lupin campus, but I run no gui, few additional applications at all
>> (except babel, snmpd, netperf, and the other core needed daemons). My
>> uptimes are principally governed by power failures. I can't remember
>> the last "crash, crash" I had, and I do track memory leaks (none).
>> That said, I'm painfully aware that I should probably give dd-wrt and
>> openwrt 19.x some testing just to make sure there's no regressions,
>> but have been reluctant to get involved again without more partners in
>> crime, because the scars from deploying 18.x widely are only beginning
>> to heal... and only last week did the needed babel 1.9 upgrade arrive
>> so I can finally redeploy ipv6 universally. I fear my current
>> reliability metrics are so good because I took down ipv6 last year....
> my workaround with memory problems is also disabling http normally. i
> have some of these nanostations in the field
>
> just running hostapd, snmp, syslog. but anything else is disabled due
> to the oom problems. it never was a real crash.
>
> but oom. but i never played with babel. ospf etc. all working out of
> the box based on quagga on low end devices and frr on bigger ones.
>
>>
>> Pico:
>>
>> root@pool2:~# free
>>                         total       used       free     shared    buffers
>> Mem:                    28480      23796       4684         92       1868
>> -/+ buffers:                       21928       6552
>> Swap:                       0          0          0
>>
>> root@pool2:~# uptime
>>  11:38:09 up 43 days, 21:37,  load average: 0.04, 0.03, 0.04
>>
>> Same workload over here, on a wndr3800, almost exactly the same config
>>
>> root@couch:~# free
>>                         total       used       free     shared    buffers     cached
>> Mem:                    60320      22872      37448         68       1960       6120
>> -/+ buffers/cache:                 14792      45528
>> Swap:                       0          0          0
>
> NS2
>
> root@TRO1:~# free
>               total        used        free      shared  buff/cache   available
> Mem:          29124       19228        3552           0        6344        7752
> Swap:             0           0           0

It looks like you are running even less stuff than I am. And this machine
is running with 256k bufs?

> wndr3700v4
>
> root@DD-WRT:~# free
>               total        used        free      shared  buff/cache   available
> Mem:         125884       23048       92940           0        9896       99824
> Swap:             0           0           0
> root@DD-WRT:~#
>
>
>>
>>> Disabling the fq part won't actually gain you much in terms of memory
>>> usage, though, as most of it is packet memory which is already
>>> configurable.
>>>
>>> The one exception to this is the static overhead of 'struct fq_flow', of
>>> which mac80211 currently allocates 4k. That's 300k of memory which is
>>> currently not configurable. But that could be fixed :)
>>>
>>> -Toke
>> --
>>
>> Dave Täht
>> CTO, TekLibre, LLC
>> http://www.teklibre.com
>> Tel: 1-831-205-9740
>>
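
Toke's fq_flow figure quoted above can be sanity-checked with a little
arithmetic. The sketch below is only illustrative. It assumes "4k" means
4096 flows, "300k" means 300 KiB, and that the totals printed by free are
in KiB; none of those units are spelled out in the thread. The per-flow
size it derives is implied by the quoted numbers, not measured from a
kernel build, and the function names are made up for this example.

```python
# Back-of-the-envelope check of the static fq_flow overhead quoted in the thread.
# Assumptions (not stated in the thread): "4k" flows = 4096, "300k" = 300 KiB,
# and the "total" column of free is in KiB.

FLOWS = 4096        # assumed meaning of "4k" flows
STATIC_KIB = 300    # Toke's figure for the static overhead


def per_flow_bytes(static_kib: float, flows: int) -> float:
    """Bytes of static overhead per flow implied by the totals."""
    return static_kib * 1024 / flows


def share_of_ram(static_kib: float, total_kib: float) -> float:
    """Static overhead as a percentage of total RAM."""
    return 100.0 * static_kib / total_kib


# "total" values copied from the free outputs in this thread.
BOARDS = {
    "pico (pool2)": 28480,
    "NS2 (TRO1)": 29124,
    "wndr3800 (couch)": 60320,
    "wndr3700v4 (DD-WRT)": 125884,
}

if __name__ == "__main__":
    print(f"implied size per flow: {per_flow_bytes(STATIC_KIB, FLOWS):.0f} bytes")
    for name, total_kib in BOARDS.items():
        pct = share_of_ram(STATIC_KIB, total_kib)
        print(f"{name:22s} {total_kib:7d} KiB total, static fq overhead {pct:.2f}%")
```

Under these assumptions the implied cost is 75 bytes per flow, and on the
two 32MB-class boards the static table comes to roughly 1% of total RAM.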