From mboxrd@z Thu Jan 1 00:00:00 1970
From: Sebastian Moeller
Date: Fri, 23 Aug 2019 08:48:20 +0200
To: Dave Taht
Cc: Sebastian Gottschall, Cake List, Make-Wifi-fast
Subject: Re: [Cake] Wifi Memory limits in small platforms
Message-Id: <3A2A8915-5A5E-42E8-BA51-50D44AD852B0@gmx.de>
In-Reply-To: <877e74epnn.fsf@taht.net>
List-Id: Cake - FQ_codel the next generation

> On Aug 23, 2019, at 01:39, Dave Taht wrote:
>
> Sebastian Gottschall writes:
>
>>>>> but with current mac80211 versions (current means last 2-3 years).
>>>>> they are just unstable and running out of memory after a while
>>>>> the only thing which helped was cutting of the memory limit of fq_codel
>>>>> inside mac80211
>>>>> i also have another fancy testunit which is a linksys wrt400 with 32 mb
>>>>> ram and 2 ath9k based wifi chipsets. no hope here for running stable
>>>>> for only 5 minutes even with a single connection under load (my crashing
>>>>> test is running a hdtv iptv stream converted to unicast using a
>>>>> stateless eoip tunnel)
>>>>>
>>>>>> I try to encourage folk to run the rtt_fair tests in flent when
>>>>>> twiddling with wifi. Those really shows how bad things are when you
>>>>>> don't have ATF + FQ + Per station aggregation and lots of
>>>>>> clients. Single threaded tests are misleading.
>>>>> i know but even single threaded tests arent working good on such
>>>>> devices. so there is no need to talk about the benefits of atf,fq_codel etc.
>>>>> but there is need to talk about configurable use of it which also allows
>>>>> to disable it if required.
>>> I 110% agree that a system that can stay up for years is much better
>>> than one that is fast for 5 minutes!
>>>
>>> However I'd like a chance, in collaborating with you and your upcoming
>>> patches - to try and narrow
>>> down crash bugs to various subsystems and be able to get some
>>> benchmarks done that I simply
>>> couldn't do anymore at the financial conclusion of the make-wifi-fast
>>> and cake projects.
>>>
>>> I think I have a lot of gear that is dd-wrt compatible - apu2,
>>> wndr3700s, 3800s....
>> if its v4, these are having 128 mb (i have them too).
>
> These are from the cerowrt era, so, 32 or 64MB of ram.

	I believe we only used wndr3700v2 (64MB) and wndr3800 (128MB); at
least those were the recommended ones. I also remember that initially it
was quite easy to make these OOM with a simple UDP flood with randomized
port addresses.
That is, until we used fq_codel's limit keyword to restrict the maximum
number of queued packets. This experience also carried over into cake and
culminated in the memlimit keyword. It seems I completely missed the
addition of the "memory_limit BYTES" keyword to fq_codel, which seems a
better fit for our needs than the "limit 1001" we currently use. (Why 1001
instead of 1000? Simply to be able to see quickly whether this is our limit
or something the user set; people tend to leave the last digit alone when
playing with these parameters ;))
	I guess I have not bothered to repeat that test since fq_codel became
the default qdisc in OpenWrt...

Best Regards
	Sebastian

>
>> and apu2 has 2
>> gb. so its getting real interesting
>> if you choose such a bad one with 32 mb ram which are still commonly
>> used by "freifunk"
>
> One thing we can start doing more 'round here is to boot the x86 boxes
> with mem=32MB or something similar (40% larger due to 64 bits? no idea,
> maybe look at free mem on a similar config) to see what shows up.
>
> For example, one of my APU2s has dual ath9/ath10k cards which is
> a reasonable sim of one of your configs.
>
>>> The reduce truesize patch had helped a lot at the time (2012). There
>>> were all kinds of flaky bugs that disappeared.
>> i tested and it helped to make ethernet unavailable. it worked for
>
> thx for making me chortle in sad empathy.
>
>> wifi interfaces. but the eth0 and eth1 on my ipq8064 based
>> testboard did not work anymore. no dhcp lease, no ping. but i was able
>> to capture inbound packets. (qos was not even enabled while testing,
>> so no cake, fq_codel etc. just standard sfq scheduler)
>> so i reverted and all worked again
>
> OK. Thx for trying. there have been so many bugs in gso/gro and hardware
> offloads that I figure that that's why the patch was dropped over time.
>
> is cake's gso-splitting working on that same hardware?
> I'm not sure to what extent that reduces packet size or not these days.
>
> I'll try that again on x86, maybe it needed to pullskb....
>
>>>
>>> the new drop monitor patchset looks WONDERFUL for seeing more about
>>> packet drop behavior in the stack, but
>>> it's a 5.3(?) feature only.
>> i love backporting :-)
>
> I used to but these days I'm content to work out of net-next x.y.0-rc4
> or later. I get more sleep that way. Oh, wait, it just hit that....
>
>>>
>>> I note that I run 18.06.1 on my 32MB pico and nanostations on the
>>> lupin campus, but I run no gui, few additional applications at all
>>> (except babel, snmpd, netperf, and the other core needed daemons). My
>>> uptimes are principally governed by power failures. I can't remember
>>> the last "crash, crash" I had, and I do track memory leaks (none).
>>> That said, I'm painfully aware that I should probably give dd-wrt and
>>> openwrt 19.x some testing just to make sure there's no regressions,
>>> but have been reluctant to get involved again without more partners in
>>> crime, because the scars from deploying 18.x widely are only beginning
>>> to heal... and only last week did the needed babel 1.9 upgrade arrive
>>> so I can finally redeploy ipv6 universally. I fear my current
>>> reliability metrics are so good because I took down ipv6 last year....
>> my workaround with memory problems is also disabling http normally. i
>> have some of these nanostations in the field
>>
>> just running hostapd, snmp, syslog. but anything else is disabled due
>> the oom problematics. it never was a real crash.
>>
>> but oom. but i never played with babel. ospf etc. all working out of
>> the box based on quagga on low end devices and frr on bigger ones.
>>
>>>
>>> Pico:
>>>
>>> root@pool2:~# free
>>>              total       used       free     shared    buffers
>>> Mem:         28480      23796       4684         92       1868
>>> -/+ buffers:            21928       6552
>>> Swap:            0          0          0
>>>
>>> root@pool2:~# uptime
>>>  11:38:09 up 43 days, 21:37,  load average: 0.04, 0.03, 0.04
>>>
>>> Same workload over here, on a wndr3800, almost exactly the same config
>>>
>>> root@couch:~# free
>>>                    total       used       free     shared    buffers     cached
>>> Mem:               60320      22872      37448         68       1960       6120
>>> -/+ buffers/cache:            14792      45528
>>> Swap:                  0          0          0
>>
>> NS2
>>
>> root@TRO1:~# free
>>
>>               total        used        free      shared  buff/cache   available
>> Mem:          29124       19228        3552           0        6344        7752
>> Swap:             0           0           0
>
> It looks like you are running even less stuff than I am. And this
> machine is running with 256k bufs?
>
>> wndr3700v4
>>
>> root@DD-WRT:~# free
>>               total        used        free      shared  buff/cache   available
>> Mem:         125884       23048       92940           0        9896       99824
>> Swap:             0           0           0
>> root@DD-WRT:~#
>>
>>
>>>
>>>> Disabling the fq part won't actually gain you much in terms of memory
>>>> usage, though, as most of it is packet memory which is already
>>>> configurable.
>>>>
>>>> The one exception to this is the static overhead of 'struct fq_flow', of
>>>> which mac80211 currently allocates 4k. That's 300k of memory which is
>>>> currently not configurable. But that could be fixed :)
>>>>
>>>> -Toke
>>> --
>>>
>>> Dave Täht
>>> CTO, TekLibre, LLC
>>> http://www.teklibre.com
>>> Tel: 1-831-205-9740
>>>
> _______________________________________________
> Cake mailing list
> Cake@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/cake
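
The memory figures discussed in this thread can be sanity-checked with a little arithmetic. The following is only a rough sketch in Python under stated assumptions: "4k" flows is read as 4096 and "300k" as 300 KiB, the function names are made up for illustration, and the per-packet truesize values are guesses, since real skb truesize depends on the driver and the packet size.

```python
# Back-of-the-envelope memory arithmetic for the numbers quoted in this thread.
# Every constant marked ASSUMED is an illustrative guess, not a measured value.

FLOWS = 4096               # "4k" flows, read as 4096 (ASSUMED reading)
STATIC_BYTES = 300 * 1024  # "300k" of memory, read as 300 KiB (ASSUMED reading)


def implied_bytes_per_flow(static_bytes: int = STATIC_BYTES,
                           flows: int = FLOWS) -> float:
    """Per-flow bookkeeping size implied by the quoted totals."""
    return static_bytes / flows


def worst_case_queue_bytes(packet_limit: int, truesize: int) -> int:
    """Upper bound on queued packet memory when only a packet count is capped."""
    return packet_limit * truesize


def packets_within_budget(memory_limit: int, truesize: int) -> int:
    """Number of packets of a given truesize that fit under a byte cap."""
    return memory_limit // truesize


if __name__ == "__main__":
    print(f"implied bytes per flow: {implied_bytes_per_flow():.0f}")
    for truesize in (1024, 2304, 4096):  # ASSUMED skb truesize values
        mib = worst_case_queue_bytes(1001, truesize) / 2**20
        print(f"limit 1001 at truesize {truesize}: up to {mib:.2f} MiB")
    budget = 4 * 2**20  # hypothetical 4 MiB byte cap
    print(f"{packets_within_budget(budget, 2304)} packets fit in 4 MiB")
```

The point of the comparison: a packet-count limit such as "limit 1001" bounds memory only indirectly, through whatever truesize the driver produces, whereas a byte cap in the style of fq_codel's memory_limit or cake's memlimit bounds it directly.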