From: Sebastian Moeller
Date: Wed, 15 Aug 2012 23:09:47 -0700
To: Dave Taht
Cc: cerowrt-devel@lists.bufferbloat.net
Subject: Re: [Cerowrt-devel] cerowrt 3.3.8-17 is released
References: <36D61FDC-9AA9-46CC-ACBB-2D28B250C660@gmx.de> <1345071222.04317697@apps.rackspace.com>
List-Id: Development issues regarding the cerowrt test router project

Hi Dave,

marvelous.

On Aug 15, 2012, at 9:58 PM, Dave Taht wrote:

> Firstly fq_codel will always stay very flat relative to your workload
> for sparse streams such as a ping or voip dns or gaming...
>
> It's good stuff.
>
> And, I think the source of your 2.8 second thing is fq_codel's current
> reaction time, the non-responsiveness of the udp flooding netalyzr uses,
> and the huge default queue depth in openwrt's qos scripts.
>
> Try this:
>
> cero1@snapon:~/src/Cerowrt-3.3.8/package/qos-scripts/files/usr/lib/qos$ git diff tcrules.awk
> diff --git a/package/qos-scripts/files/usr/lib/qos/tcrules.awk b/package/qos-scripts/files/usr/lib/qos/tcrules
> index a19b651..f3e0d3f 100644
> --- a/package/qos-scripts/files/usr/lib/qos/tcrules.awk
> +++ b/package/qos-scripts/files/usr/lib/qos/tcrules.awk
> @@ -79,7 +79,7 @@ END {
>         # leaf qdisc
>         avpkt = 1200
>         for (i = 1; i <= n; i++) {
> -               print "tc qdisc add dev "device" parent 1:"class[i]"0 handle "class[i]"00: fq_codel"
> +               print "tc qdisc add dev "device" parent 1:"class[i]"0 handle "class[i]"00: fq_codel limit 1200"
>         }
>
>         # filter rule
>

So openwrt's qos is still at the 10k packet limit for fq_codel? That means a worst-case queue of 14.3 MB (at 1500-byte packets) and a best case of 0.6103515625 MB (64-byte packets). The worst case would take around 3 seconds to drain; maybe that is my issue. I will immediately try your patch.

Done. Now netalyzr reports 1100ms buffering, down from 2800ms (and no "ath: skbuff alloc of size 1926 failed" messages in dmesg, but these did not show up during netalyzr runs). The other UDP stress test now works much better (reporting around 1200ms uplink buffering) and produces no ath allocation failures.
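For anyone who wants to redo the arithmetic above, here is a minimal Python sketch. The helper names `queue_bytes` and `drain_seconds` are made up for illustration, and the 40 Mbit/s link rate is purely an assumption (no link rate is stated in this mail); it ignores per-packet kernel overhead and only counts payload bytes.

```python
# Back-of-the-envelope sizing of a queue whose limit is given in packets.
MIB = 1024 * 1024


def queue_bytes(limit_packets, packet_bytes):
    """Bytes queued when `limit_packets` packets of `packet_bytes` each are waiting."""
    return limit_packets * packet_bytes


def drain_seconds(total_bytes, link_bits_per_second):
    """Time needed to transmit `total_bytes` at the given link rate."""
    return total_bytes * 8 / link_bits_per_second


# Assumed link rate, for illustration only; not a measured value.
ASSUMED_LINK_BPS = 40_000_000

if __name__ == "__main__":
    for limit in (10_000, 1_200):
        worst = queue_bytes(limit, 1500)  # full-size packets
        best = queue_bytes(limit, 64)     # minimum-size packets
        drain = drain_seconds(worst, ASSUMED_LINK_BPS)
        print(f"limit {limit}: worst {worst / MIB:.2f} MiB, "
              f"best {best / MIB:.4f} MiB, worst-case drain {drain:.2f} s")
```

The drain time scales inversely with the link rate, so on a slower uplink the same packet limit corresponds to proportionally more buffering delay.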
Switching to the hifgr downlink version of the test gave me:

[75755.714843] hostapd: page allocation failure: order:0, mode:0x4020
[75755.714843] Call Trace:
[75755.714843] [<80287200>] dump_stack+0x8/0x34
[75755.714843] [<800b4e28>] warn_alloc_failed+0xe8/0x10c
[75755.714843] [<800b712c>] __alloc_pages_nodemask+0x5a0/0x600
[75755.714843] [<800da950>] new_slab+0xa8/0x280
[75755.714843] [<80288c74>] __slab_alloc.isra.60.constprop.63+0x25c/0x2fc
[75755.714843] [<800db4f8>] kmem_cache_alloc+0x38/0xe0
[75755.714843] [<801d1b68>] ag71xx_fill_rx_buf+0x34/0xd8
[75755.714843] [<801d2458>] ag71xx_poll+0x464/0x5f4
[75755.714843] [<801ea3d0>] net_rx_action+0x88/0x1c8
[75755.714843] [<80077458>] __do_softirq+0xa0/0x154
[75755.714843] [<80077668>] do_softirq+0x48/0x68
[75755.714843] [<8007789c>] irq_exit+0x4c/0xb4
[75755.714843] [<80062f8c>] ret_from_irq+0x0/0x4
[75755.714843] [<801757a8>] lzma_main+0x9ec/0xbec
[75755.714843] [<80175ef4>] xz_dec_lzma2_run+0x54c/0x824
[75755.714843] [<801744bc>] xz_dec_run+0x31c/0x8f4
[75755.714843] [<80132e74>] squashfs_xz_uncompress+0x164/0x274
[75755.714843] [<8012f368>] squashfs_read_data+0x4a8/0x660
[75755.714843] [<8012f6f4>] squashfs_cache_get+0x1d4/0x30c
[75755.714843] [<80130be8>] squashfs_readpage+0x56c/0x804
[75755.714843] [<800ba130>] __do_page_cache_readahead+0x1b0/0x22c
[75755.714843] [<800ba4b4>] ra_submit+0x28/0x34
[75755.714843] [<800b2e68>] filemap_fault+0x184/0x3cc
[75755.714843] [<800c7fd4>] __do_fault+0xcc/0x450
[75755.714843] [<800cad5c>] handle_pte_fault+0x330/0x6d4
[75755.714843] [<800cb1b4>] handle_mm_fault+0xb4/0xe0
[75755.714843] [<8006c210>] do_page_fault+0x110/0x350
[75755.714843] [<80062f80>] ret_from_exception+0x0/0xc
[75755.714843]
[75755.714843] Mem-Info:
[75755.714843] Normal per-cpu:
[75755.714843] CPU 0: hi: 18, btch: 3 usd: 5
[75755.714843] active_anon:1493 inactive_anon:2534 isolated_anon:0
[75755.714843] active_file:1623 inactive_file:1944 isolated_file:0
[75755.714843] unevictable:0 dirty:0 writeback:16 unstable:0
[75755.714843] free:95 slab_reclaimable:589 slab_unreclaimable:4876
[75755.714843] mapped:1030 shmem:25 pagetables:163 bounce:0
[75755.714843] Normal free:380kB min:1016kB low:1268kB high:1524kB active_anon:5972kB inactive_anon:10136kB active_file:6492kB inactive_file:7776kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:65024kB mlocked:0kB dirty:0kB writeback:64kB mapped:4120kB shmem:100kB slab_reclaimable:2356kB slab_unreclaimable:19504kB kernel_stack:552kB pagetables:652kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[75755.714843] lowmem_reserve[]: 0 0
[75755.714843] Normal: 57*4kB 19*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 380kB
[75755.714843] 4204 total pagecache pages
[75755.714843] 611 pages in swap cache
[75755.714843] Swap cache stats: add 1899, delete 1288, find 802/926
[75755.714843] Free swap = 973548kB
[75755.714843] Total swap = 976560kB
[75755.714843] 16384 pages RAM
[75755.714843] 973 pages reserved
[75755.714843] 4143 pages shared
[75755.714843] 13118 pages non-shared
[75755.714843] SLUB: Unable to allocate memory on node -1 (gfp=0x20)
[75755.714843] cache: kmalloc-2048, object size: 2048, buffer size: 2048, default order: 2, min order: 0
[75755.714843] node 0: slabs: 0, objs: 0, free: 0
[75755.718750] ge00: out of memory

(I would have loved to try again, but that specific application restricts me to 2 or 3 invocations per 24-hour period, which I already used up; I really need to find another stress tester one of these days.) But bind survived intact.

So thanks for the quick surgery on QOS that surely improved things by a lot. Shall I try to request this change in openWRT proper?
I think that for most home routers, allowing queues of >14MB to build up in the device can surely wreak havoc on stability. (I shudder while thinking about routers with 32 or even 16MB of RAM, and even these could/should profit from codel; so my take is that the limit needs to be scaled with available memory, with a potential ceiling at 10k :) )

Thanks again & best regards
	Sebastian

>
> --
> Dave Täht
> http://www.bufferbloat.net/projects/cerowrt/wiki - "3.3.8-17 is out
> with fq_codel!"
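To make the "scale the limit with available memory, ceiling at 10k" idea from the reply above concrete, here is a rough Python sketch. Everything in it is hypothetical: the function name `scaled_limit`, the 5 percent memory budget, and the floor of 100 packets are illustrative assumptions, not values from the qos scripts or from fq_codel itself. It also budgets for a single qdisc only, whereas the patched awk loop creates one leaf qdisc per class.

```python
MIB = 1024 * 1024
CEILING_PACKETS = 10_000  # the "10k" ceiling suggested in the mail


def scaled_limit(ram_bytes, max_percent=5, packet_bytes=1500,
                 ceiling=CEILING_PACKETS, floor=100):
    """Hypothetical heuristic: packet limit whose worst-case queue
    (all packets at `packet_bytes`) stays within `max_percent` of RAM."""
    budget_bytes = ram_bytes * max_percent // 100  # integer math, no rounding surprises
    limit = budget_bytes // packet_bytes
    # Clamp between a small floor and the 10k ceiling.
    return max(floor, min(ceiling, limit))


if __name__ == "__main__":
    for ram_mib in (16, 32, 64, 1024):
        print(f"{ram_mib} MiB RAM -> limit {scaled_limit(ram_mib * MIB)} packets")
```

With these assumed numbers a 64 MiB router would end up well below the 10k default, and only machines with far more memory would reach the ceiling; the right budget fraction would need measuring on real hardware.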