Date: Mon, 2 May 2016 22:21:34 -0700
From: Dave Taht
To: Jonathan Morton
Cc: Eric Dumazet, make-wifi-fast@lists.bufferbloat.net, "codel@lists.bufferbloat.net", ath10k
Subject: Re: [Make-wifi-fast] [Codel] fq_codel_drop vs a udp flood
References: <1462125592.5535.194.camel@edumazet-glaptop3.roam.corp.google.com> <865DA393-262D-40B6-A9D3-1B978CD5F6C6@gmail.com>

On Mon, May 2, 2016 at 7:26 PM, Dave Taht wrote:
> On Sun, May 1, 2016 at 11:20 AM, Jonathan Morton wrote:
>>
>>> On 1 May, 2016, at 20:59, Eric Dumazet wrote:
>>>
>>> fq_codel_drop() could drop _all_ packets of the fat flow, instead of a
>>> single one.
>>
>> Unfortunately, that could have bad consequences if the “fat flow”
>> happens to be a TCP in slow-start on a long-RTT path. Such a flow is
>> responsive, but on an order-magnitude longer timescale than may have
>> been configured as optimum.
>>
>> The real problem is that fq_codel_drop() performs the same (excessive)
>> amount of work to cope with a single unresponsive flow as it would for
>> a true DDoS. Optimising the search function is sufficient.
>
> Don't think so.
>
> I did some tests today, (not the fq_codel batch drop patch yet)
>
> When hit with a 900mbit flood, cake shaping down to 250mbit, results
> in nearly 100% cpu use in the ksoftirq1 thread on the apu2, and
> 150mbits of actual throughput (as measured by iperf3, which is now a
> measurement I don't trust)
>
> cake *does* hold the packet count down a lot better than fq_codel does.
>
> fq_codel (pre eric's patch) basically goes to the configured limit and
> stays there.
>
> In both cases I will eventually get an error like this (in my babel
> routed environment) that suggests that we're also not delivering
> packets from other flows (arp?) with either fq_codel or cake in these
> extreme conditions.
>
> iperf3 -c 172.26.64.200 -u -b900Mbit -t 600
>
> [  4]  47.00-48.00  sec  107 MBytes  895 Mbits/sec  13659
> iperf3: error - unable to write to stream socket: No route to host
>
> ...
>
> The results I get from iperf are a bit puzzling over the interval it
> samples at - this is from a 100Mbit test (downshifting from 900mbit)
>
> [ 15]  25.00-26.00  sec   152 KBytes  1.25 Mbits/sec  0.998 ms  29673/29692 (1e+02%)
> [ 15]  26.00-27.00  sec   232 KBytes  1.90 Mbits/sec  1.207 ms  10235/10264 (1e+02%)
> [ 15]  27.00-28.00  sec  72.0 KBytes   590 Kbits/sec  1.098 ms  19035/19044 (1e+02%)
> [ 15]  28.00-29.00  sec  0.00 Bytes   0.00 bits/sec   1.098 ms  0/0 (-nan%)
> [ 15]  29.00-30.00  sec  72.0 KBytes   590 Kbits/sec  1.044 ms  22468/22477 (1e+02%)
> [ 15]  30.00-31.00  sec  64.0 KBytes   524 Kbits/sec  1.060 ms  13078/13086 (1e+02%)
> [ 15]  31.00-32.00  sec  0.00 Bytes   0.00 bits/sec   1.060 ms  0/0 (-nan%)
> ^C[ 15]  32.00-32.66  sec  64.0 KBytes   797 Kbits/sec  1.050 ms  25420/25428 (1e+02%)

OK, the above weirdness in calculating a "rate" is due to me sending
8k fragmented packets.

-l1470 fixed that.

> Not that I care all that much about how iperf is interpreting its drop
> rate (I guess pulling apart the actual caps is in order).
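
A side note on the loss column in the iperf output quoted above: the lost/total counters can be turned back into a percentage by hand. The sketch below is only an illustration. The helper name loss_percent is made up here, and the remark about two-significant-digit formatting is a guess at why "1e+02%" appears, not something checked against the iperf3 source.

```python
from typing import Optional


def loss_percent(lost: int, total: int) -> Optional[float]:
    """Return datagram loss as a percentage, or None if nothing was counted."""
    if total == 0:
        # Guard the 0/0 intervals, which the report above prints as "-nan%".
        return None
    return 100.0 * lost / total


# (lost, total) pairs copied from the interval report quoted above.
samples = [(29673, 29692), (10235, 10264), (19035, 19044), (0, 0)]

for lost, total in samples:
    pct = loss_percent(lost, total)
    if pct is None:
        print(f"{lost}/{total}: no datagrams counted in this interval")
    else:
        # ".2g" keeps two significant digits, which is one way a value
        # just under 100 ends up displayed as 1e+02 (assumption, see above).
        print(f"{lost}/{total}: {pct:.2f}% (two significant digits: {pct:.2g}%)")
```

For the non-empty intervals this works out to between roughly 99.7% and 99.9% loss, so the rounded "1e+02%" hides very little; the 0/0 intervals are simply seconds in which no datagram was counted at all.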
>
> As for cake struggling to cope:
>
> root@apu2:/home/d/git/tc-adv/tc# ./tc -s qdisc show dev enp2s0
>
> qdisc cake 8018: root refcnt 9 bandwidth 100Mbit diffserv4 flows rtt 100.0ms raw
>  Sent 219736818 bytes 157121 pkt (dropped 989289, overlimits 1152272 requeues 0)
>  backlog 449646b 319p requeues 0
>  memory used: 2658432b of 5000000b
>  capacity estimate: 100Mbit
>                   Bulk Best Effort       Video       Voice
>  thresh        100Mbit   93750Kbit      75Mbit      25Mbit
>  target          5.0ms       5.0ms       5.0ms       5.0ms
>  interval      100.0ms     100.0ms     100.0ms     100.0ms
>  pk_delay          0us       5.2ms        92us        48us
>  av_delay          0us       5.1ms         4us         2us
>  sp_delay          0us       5.0ms         4us         2us
>  pkts                0     1146649          31          49
>  bytes               0  1607004053        2258        8779
>  way_inds            0           0           0           0
>  way_miss            0          15           2           1
>  way_cols            0           0           0           0
>  drops               0      989289           0           0
>  marks               0           0           0           0
>  sp_flows            0           0           0           0
>  bk_flows            0           1           0           0
>  last_len            0        1514          66         138
>  max_len             0        1514         110         487
>
> ...
>
> But I am very puzzled as to why flow isolation would fail in the face
> of this overload.

And to simplify matters I got rid of the advanced qdiscs entirely,
switched back to htb+pfifo and get the same ultimate result of the
test aborting...

Joy.

OK,

ethtool -s enp2s0 advertise 0x008 # 100mbit

Feeding packets in at 900mbit into a 1000 packet fifo queue at 100Mbit
is predictably horrific... other flows get starved entirely, you
can't even type on the thing, and still eventually

[ 28]  28.00-29.00   sec  11.4 MBytes  95.7 Mbits/sec  0.120 ms  72598/80726 (90%)
[ 28]  29.00-30.00   sec  11.4 MBytes  95.7 Mbits/sec  0.119 ms  46187/54314 (85%)
[ 28] 189.00-190.00  sec  8.73 MBytes  73.2 Mbits/sec  0.162 ms  55276/61493 (90%)
[ 28] 190.00-191.00  sec  0.00 Bytes   0.00 bits/sec   0.162 ms  0/0 (-nan%)

vs:

[  4] 188.00-189.00  sec  105 MBytes  879 Mbits/sec  74614
iperf3: error - unable to write to stream socket: No route to host

Yea! More people should do that to themselves. System is bloody
useless with a 1000 packet full queue and way more useful with
fq_codel in this scenario...
but still this ping should be surviving with fq_codel going and one
full rate udp flood, if it wasn't for all the cpu being used up
throwing away packets. I think.

64 bytes from 172.26.64.200: icmp_seq=50 ttl=63 time=6.92 ms
64 bytes from 172.26.64.200: icmp_seq=52 ttl=63 time=7.15 ms
64 bytes from 172.26.64.200: icmp_seq=53 ttl=63 time=7.11 ms
64 bytes from 172.26.64.200: icmp_seq=55 ttl=63 time=6.68 ms
ping: sendmsg: No route to host
ping: sendmsg: No route to host
ping: sendmsg: No route to host

...

OK, tomorrow, eric's new patch! A new, brighter day now that I've
burned this one melting 3 boxes into the ground.

and perf.

-- 
Dave Täht
Let's go make home routers and wifi faster! With better software!
http://blog.cerowrt.org