[Codel] fq_codel_drop vs a udp flood

Agarwal, Anil Anil.Agarwal at viasat.com
Tue May 3 08:50:23 EDT 2016


I should be more precise in my statement about the inaccuracy of the algorithm.
Given that we dequeue packets in a round-robin manner, the maxqidx value may, on occasion, point to a queue
that is smaller than the largest queue by up to one MTU.

Anil

-----Original Message-----
From: Codel [mailto:codel-bounces at lists.bufferbloat.net] On Behalf Of Agarwal, Anil
Sent: Tuesday, May 03, 2016 8:40 AM
To: Dave Taht; Jonathan Morton
Cc: make-wifi-fast at lists.bufferbloat.net; codel at lists.bufferbloat.net; ath10k
Subject: Re: [Codel] fq_codel_drop vs a udp flood

Dave et al,



Here is another possible approach to improving the code performance when dropping packets.



Keep track of the queue with the largest number of packets, as you go, using an efficient algorithm.

Consequently, a search is not required when the occasion arises. 

There is a small amount of overhead for every packet enqueue and dequeue operation.
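For contrast, the search this approach avoids is a linear scan over every flow's backlog each time a packet has to be dropped. A minimal sketch in C, assuming a flat array of per-flow byte backlogs; `find_fattest_flow` is a hypothetical name, and 1024 is used only because it is fq_codel's default number of flows:

```c
#include <stddef.h>

/* fq_codel's default flow count; any size works for the sketch. */
#define FQ_CODEL_DEFAULT_FLOWS 1024

/* Hypothetical helper: return the index of the flow with the largest
 * backlog by scanning every flow. The cost is O(nflows) per call, i.e.
 * per drop, however many flows are actually misbehaving. Returns 0 if
 * all backlogs are zero. */
static size_t find_fattest_flow(const unsigned int *backlogs, size_t nflows)
{
    size_t idx = 0;
    unsigned int maxbacklog = 0;

    for (size_t i = 0; i < nflows; i++) {
        if (backlogs[i] > maxbacklog) {
            maxbacklog = backlogs[i];
            idx = i;
        }
    }
    return idx;
}
```

With 1024 flows every drop costs 1024 comparisons, whether one flow or all of them are misbehaving, which is the cost Jonathan describes as excessive further down the thread.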

Here is some pseudo-code -



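// Hypothetical tracker state; only these two fields are needed.
// Queue length may be packets or bytes, whichever the caller tracks.
struct maxq {
    unsigned int maxqlen;   // length of the queue believed to be largest
    int maxqidx;            // index of that queue
};

// Called after enqueuing a packet with updated queue length
static inline void
maxq_update_enq(struct maxq *q, int idx, unsigned int qlen)
{
    if (qlen > q->maxqlen) {
        q->maxqlen = qlen;
        q->maxqidx = idx;
    }
}

// Called after dequeuing (or dropping) a packet with updated queue length.
// Only the tracked queue's length is refreshed, so another queue may now
// be slightly larger: this is the inaccuracy discussed below.
static inline void
maxq_update_deq(struct maxq *q, int idx, unsigned int qlen)
{
    if (idx == q->maxqidx) {
        q->maxqlen = qlen;
    }
}

// Returns idx of the (approximately) largest queue in O(1)
static inline int
maxq_get_idx(const struct maxq *q)
{
    return q->maxqidx;
}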



Given that we dequeue packets in a round-robin manner, the maxqidx value may sometimes be slightly inaccurate, on occasion pointing to the second-largest queue.

Since each operation is O(1) regardless of the number of queues, the code will scale gracefully to a larger number of queues and to multiple unresponsive flows.
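To make the approximation concrete, the sketch below applies the enqueue and dequeue update rules from the pseudo-code above to a toy scheduler. The names `struct sched`, `enqueue`, `dequeue` and `exact_max` are hypothetical, queue lengths are assumed to be in bytes, and `exact_max` is the full scan the tracker is meant to replace, included only for comparison:

```c
#define NQ 3

/* Hypothetical toy scheduler: per-queue byte counts plus the tracker. */
struct sched {
    unsigned int qlen[NQ];
    unsigned int maxqlen;
    int maxqidx;
};

static void enqueue(struct sched *s, int idx, unsigned int len)
{
    s->qlen[idx] += len;
    if (s->qlen[idx] > s->maxqlen) {   /* maxq_update_enq rule */
        s->maxqlen = s->qlen[idx];
        s->maxqidx = idx;
    }
}

static void dequeue(struct sched *s, int idx, unsigned int len)
{
    s->qlen[idx] -= len;
    if (idx == s->maxqidx)             /* maxq_update_deq rule */
        s->maxqlen = s->qlen[idx];
}

/* Exact answer by full scan, for comparison only (ties keep the lower index). */
static int exact_max(const struct sched *s)
{
    int best = 0;
    for (int i = 1; i < NQ; i++)
        if (s->qlen[i] > s->qlen[best])
            best = i;
    return best;
}
```

If queues 0 and 1 each fill to 3000 bytes, the tracker keeps queue 0 because a tie does not displace it. After one 1500-byte dequeue from queue 0 in round-robin order, the tracker still reports queue 0 while queue 1 is larger by exactly one packet; once queue 1 takes its turn, the two answers agree again. This matches the "up to one MTU" figure for this particular sequence, without proving it in general.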



Please see if this makes sense. I have not gone through the fq_codel code in detail.

I had sent a similar suggestion to Rong Pan of the PIE group a few months ago; I am not sure if they ever got to it.



Regards,

Anil



-----Original Message-----
From: Codel [mailto:codel-bounces at lists.bufferbloat.net] On Behalf Of Dave Taht
Sent: Tuesday, May 03, 2016 1:22 AM
To: Jonathan Morton
Cc: make-wifi-fast at lists.bufferbloat.net; codel at lists.bufferbloat.net; ath10k
Subject: Re: [Codel] fq_codel_drop vs a udp flood



On Mon, May 2, 2016 at 7:26 PM, Dave Taht <dave.taht at gmail.com> wrote:

> On Sun, May 1, 2016 at 11:20 AM, Jonathan Morton <chromatix99 at gmail.com> wrote:
>>
>>> On 1 May, 2016, at 20:59, Eric Dumazet <eric.dumazet at gmail.com> wrote:
>>>
>>> fq_codel_drop() could drop _all_ packets of the fat flow, instead of
>>> a single one.
>>
>> Unfortunately, that could have bad consequences if the “fat flow” happens to be a TCP in slow-start on a long-RTT path.  Such a flow is responsive, but on an order-of-magnitude longer timescale than may have been configured as optimum.
>>
>> The real problem is that fq_codel_drop() performs the same (excessive) amount of work to cope with a single unresponsive flow as it would for a true DDoS.  Optimising the search function is sufficient.
>
> Don't think so.
>
> I did some tests today (not the fq_codel batch drop patch yet).
>
> When hit with a 900mbit flood, cake shaping down to 250mbit results
> in nearly 100% CPU use in the ksoftirq1 thread on the apu2, and
> 150mbit of actual throughput (as measured by iperf3, which is now a
> measurement I don't trust).
>
> cake *does* hold the packet count down a lot better than fq_codel does.
>
> fq_codel (pre-Eric's patch) basically goes to the configured limit and
> stays there.
>
> In both cases I will eventually get an error like this (in my babel
> routed environment) that suggests that we're also not delivering
> packets from other flows (ARP?) with either fq_codel or cake in these
> extreme conditions.
>
> iperf3 -c 172.26.64.200 -u -b900Mbit -t 600
>
> [  4]  47.00-48.00  sec   107 MBytes   895 Mbits/sec  13659
> iperf3: error - unable to write to stream socket: No route to host
>
> ...
>
> The results I get from iperf are a bit puzzling over the interval it
> samples at - this is from a 100Mbit test (downshifting from 900mbit)
>
> [ 15]  25.00-26.00  sec   152 KBytes  1.25 Mbits/sec  0.998 ms  29673/29692 (1e+02%)
> [ 15]  26.00-27.00  sec   232 KBytes  1.90 Mbits/sec  1.207 ms  10235/10264 (1e+02%)
> [ 15]  27.00-28.00  sec  72.0 KBytes   590 Kbits/sec  1.098 ms  19035/19044 (1e+02%)
> [ 15]  28.00-29.00  sec  0.00 Bytes  0.00 bits/sec  1.098 ms  0/0 (-nan%)
> [ 15]  29.00-30.00  sec  72.0 KBytes   590 Kbits/sec  1.044 ms  22468/22477 (1e+02%)
> [ 15]  30.00-31.00  sec  64.0 KBytes   524 Kbits/sec  1.060 ms  13078/13086 (1e+02%)
> [ 15]  31.00-32.00  sec  0.00 Bytes  0.00 bits/sec  1.060 ms  0/0 (-nan%)
> ^C[ 15]  32.00-32.66  sec  64.0 KBytes   797 Kbits/sec  1.050 ms  25420/25428 (1e+02%)



OK, the above weirdness in calculating a "rate" is due to me sending 8k fragmented packets.



-l1470 fixed that.
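The odd rates come from fragmentation: iperf3 counts whole UDP datagrams, and an 8k datagram (assumed here to be 8192 bytes of payload) does not fit in a 1500-byte Ethernet MTU, so IPv4 splits it into several fragments and the datagram is lost if any one of them is dropped. A minimal sketch of the arithmetic, assuming a 20-byte IPv4 header with no options and an 8-byte UDP header; `ipv4_fragments` is a hypothetical helper:

```c
/* Number of IPv4 fragments needed for one UDP datagram, assuming a
 * 20-byte IPv4 header (no options) and an 8-byte UDP header. Every
 * fragment except the last must carry a multiple of 8 payload bytes. */
static unsigned int ipv4_fragments(unsigned int udp_payload, unsigned int mtu)
{
    unsigned int ip_payload = udp_payload + 8;      /* add UDP header */
    unsigned int per_frag = ((mtu - 20) / 8) * 8;   /* round down to 8 */

    return (ip_payload + per_frag - 1) / per_frag;  /* ceiling division */
}
```

This gives 6 fragments per 8192-byte datagram, but a single packet for `-l1470` (1470 + 8 + 20 = 1498 bytes), which is consistent with that option fixing the reported numbers.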



> Not that I care all that much about how iperf is interpreting its drop
> rate (I guess pulling apart the actual caps is in order).
>
> As for cake struggling to cope:
>
> root at apu2:/home/d/git/tc-adv/tc# ./tc -s qdisc show dev enp2s0
>
> qdisc cake 8018: root refcnt 9 bandwidth 100Mbit diffserv4 flows rtt 100.0ms raw
>  Sent 219736818 bytes 157121 pkt (dropped 989289, overlimits 1152272 requeues 0)
>  backlog 449646b 319p requeues 0
>  memory used: 2658432b of 5000000b
>  capacity estimate: 100Mbit
>              Bulk    Best Effort     Video       Voice
>   thresh       100Mbit   93750Kbit      75Mbit      25Mbit
>   target         5.0ms       5.0ms       5.0ms       5.0ms
>   interval     100.0ms     100.0ms     100.0ms     100.0ms
>   pk_delay         0us       5.2ms        92us        48us
>   av_delay         0us       5.1ms         4us         2us
>   sp_delay         0us       5.0ms         4us         2us
>   pkts               0     1146649          31          49
>   bytes              0  1607004053        2258        8779
>   way_inds           0           0           0           0
>   way_miss           0          15           2           1
>   way_cols           0           0           0           0
>   drops              0      989289           0           0
>   marks              0           0           0           0
>   sp_flows           0           0           0           0
>   bk_flows           0           1           0           0
>   last_len           0        1514          66         138
>   max_len            0        1514         110         487
>
> ...
>
> But I am very puzzled as to why flow isolation would fail in the face
> of this overload.
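The cake counters above can be cross-checked with a little arithmetic. A minimal sketch, assuming that every packet entering a tin is either sent, dropped, or still queued, and that the whole 319-packet backlog belongs to Best Effort since the other tins are nearly idle; `tin_sent` and `drop_permille` are hypothetical helpers:

```c
/* Packets a tin actually sent: everything it received, minus drops,
 * minus what is still queued (assumes the tin's backlog is known). */
static unsigned long long tin_sent(unsigned long long pkts,
                                   unsigned long long drops,
                                   unsigned long long backlog)
{
    return pkts - drops - backlog;
}

/* Drop rate in tenths of a percent (permille), truncated. */
static unsigned long long drop_permille(unsigned long long drops,
                                        unsigned long long pkts)
{
    return drops * 1000ULL / pkts;
}
```

Best Effort then sent 1146649 - 989289 - 319 = 157041 packets; adding the 31 Video and 49 Voice packets gives exactly the 157121 on the `Sent` line, and about 86% of the Best Effort packets were dropped.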



And to simplify matters, I got rid of the advanced qdiscs entirely, switched back to htb+pfifo, and got the same ultimate result: the test aborting...



Joy.



OK,



ethtool -s enp2s0 advertise 0x008 # 100mbit
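The `advertise 0x008` value selects a single legacy link mode. As a minimal sketch (the table lists only the six low bits of ethtool's legacy advertise mask as documented in ethtool(8); `advertise_name` is a hypothetical helper), decoding the mask shows why it pins the link at 100 Mbit/s full duplex:

```c
#include <stddef.h>

/* Low bits of ethtool's legacy "advertise" mask, per ethtool(8). */
static const struct {
    unsigned int bit;
    const char *name;
} link_modes[] = {
    { 0x001, "10baseT Half" },
    { 0x002, "10baseT Full" },
    { 0x004, "100baseT Half" },
    { 0x008, "100baseT Full" },
    { 0x010, "1000baseT Half" },
    { 0x020, "1000baseT Full" },
};

/* Hypothetical helper: the mode name if the mask is exactly one of the
 * bits above, otherwise NULL. */
static const char *advertise_name(unsigned int mask)
{
    for (size_t i = 0; i < sizeof(link_modes) / sizeof(link_modes[0]); i++)
        if (mask == link_modes[i].bit)
            return link_modes[i].name;
    return NULL;
}
```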



Feeding packets in at 900mbit into a 1000-packet FIFO queue draining at 100Mbit is predictably horrific... other flows get starved entirely, you can't even type on the thing, and still, eventually:



[ 28]  28.00-29.00  sec  11.4 MBytes  95.7 Mbits/sec  0.120 ms  72598/80726 (90%)
[ 28]  29.00-30.00  sec  11.4 MBytes  95.7 Mbits/sec  0.119 ms  46187/54314 (85%)
[ 28] 189.00-190.00 sec  8.73 MBytes  73.2 Mbits/sec  0.162 ms  55276/61493 (90%)
[ 28] 190.00-191.00 sec  0.00 Bytes  0.00 bits/sec  0.162 ms  0/0 (-nan%)

vs:

[  4] 188.00-189.00 sec   105 MBytes   879 Mbits/sec  74614
iperf3: error - unable to write to stream socket: No route to host



Yea! More people should do that to themselves. The system is bloody useless with a full 1000-packet queue, and way more useful with fq_codel in this scenario...
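Some back-of-the-envelope numbers show why the full FIFO hurts so much. A minimal sketch, assuming full-size 1514-byte frames (the max_len cake reported earlier) and ignoring Ethernet preamble and inter-frame gap; `fifo_drain_us` and `excess_permille` are hypothetical helpers:

```c
/* Time to drain a full FIFO at the given link rate, in microseconds.
 * Counts frame bytes only (no preamble or inter-frame gap). */
static unsigned long long fifo_drain_us(unsigned long long pkts,
                                        unsigned long long frame_bytes,
                                        unsigned long long rate_bps)
{
    return pkts * frame_bytes * 8ULL * 1000000ULL / rate_bps;
}

/* Share of the offered load (permille, truncated) that must be dropped
 * when offered_bps exceeds the link rate; 0 if it fits. */
static unsigned long long excess_permille(unsigned long long offered_bps,
                                          unsigned long long rate_bps)
{
    if (offered_bps <= rate_bps)
        return 0;
    return (offered_bps - rate_bps) * 1000ULL / offered_bps;
}
```

A full 1000-packet queue at 100 Mbit/s adds about 121 ms of delay for every packet of every flow, and a 900 Mbit/s flood into a 100 Mbit/s link forces at least 8/9 (about 89%) of it to be dropped somewhere, roughly in line with the 85-90% loss iperf3 reported.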



But this ping should still be surviving with fq_codel running and one full-rate UDP flood, if it weren't for all the CPU being used up throwing away packets. I think.



64 bytes from 172.26.64.200: icmp_seq=50 ttl=63 time=6.92 ms
64 bytes from 172.26.64.200: icmp_seq=52 ttl=63 time=7.15 ms
64 bytes from 172.26.64.200: icmp_seq=53 ttl=63 time=7.11 ms
64 bytes from 172.26.64.200: icmp_seq=55 ttl=63 time=6.68 ms
ping: sendmsg: No route to host
ping: sendmsg: No route to host
ping: sendmsg: No route to host



...



OK, tomorrow: Eric's new patch! A new, brighter day, now that I've burned this one melting 3 boxes into the ground. And perf.
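Eric's suggestion at the top of the quoted thread was to have fq_codel_drop() drop more than one packet from the fat flow per call, while Jonathan worried about wiping out a responsive flow entirely. One possible middle ground is to pay for the scan once and then drop a bounded batch. This is a hypothetical sketch, not Eric's patch (whose details are not in this thread): `struct flow`, `batch_drop`, the batch limit and the stop-at-half-backlog rule are all assumptions, and every packet in a flow is given the same size to keep it short:

```c
#include <stddef.h>

/* Hypothetical flow: byte backlog, packet count, fixed packet size. */
struct flow {
    unsigned int backlog;   /* bytes queued */
    unsigned int pkts;      /* packets queued */
    unsigned int pkt_len;   /* bytes per packet in this toy model */
};

/* Scan once for the fattest flow (nflows must be >= 1), then drop up to
 * max_batch packets from it, stopping once its backlog has fallen to
 * half of what it was. Returns the number of packets dropped and stores
 * the victim's index in *victim. */
static unsigned int batch_drop(struct flow *flows, size_t nflows,
                               unsigned int max_batch, size_t *victim)
{
    size_t idx = 0;

    for (size_t i = 1; i < nflows; i++)      /* O(nflows), once per batch */
        if (flows[i].backlog > flows[idx].backlog)
            idx = i;

    struct flow *f = &flows[idx];
    unsigned int threshold = f->backlog / 2;
    unsigned int dropped = 0;

    while (dropped < max_batch && f->pkts > 0 && f->backlog > threshold) {
        f->backlog -= f->pkt_len;            /* free one packet */
        f->pkts--;
        dropped++;
    }
    *victim = idx;
    return dropped;
}
```

With a 64-packet batch, the scan cost is spread over up to 64 drops instead of one, and the half-backlog stop means a fat but responsive flow loses roughly half its queue rather than all of it.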









--

Dave Täht

Let's go make home routers and wifi faster! With better software!

https://urldefense.proofpoint.com/v2/url?u=http-3A__blog.cerowrt.org&d=CwIGaQ&c=jcv3orpCsv7C4ly8-ubDob57ycZ4jvhoYZNDBA06fPk&r=FyvaklKYrHaSCPjbBTdviWIW9uSbnxdNSheSGz1Jvq4&m=WA7M8kzfWtPc1BoysOKxcUO1fsm9bQlu_S3Voky3Hi0&s=xsNjZNPfz4WmfJZ4sP7jMTVJe140RgNczcwj6g5rU1g&e=

_______________________________________________
Codel mailing list
Codel at lists.bufferbloat.net
https://urldefense.proofpoint.com/v2/url?u=https-3A__lists.bufferbloat.net_listinfo_codel&d=CwIGaQ&c=jcv3orpCsv7C4ly8-ubDob57ycZ4jvhoYZNDBA06fPk&r=FyvaklKYrHaSCPjbBTdviWIW9uSbnxdNSheSGz1Jvq4&m=WA7M8kzfWtPc1BoysOKxcUO1fsm9bQlu_S3Voky3Hi0&s=NTTN7_n6PYwoH6-tlPNWQ2qpYPCsFYiW8VWm3Ih1u5g&e= 
