Cake - FQ_codel the next generation
 help / color / mirror / Atom feed
* [Cake] BUG_ON vs WARN_ON
@ 2016-10-05 15:24 Kevin Darbyshire-Bryant
  2016-10-05 15:42 ` Jonathan Morton
  0 siblings, 1 reply; 15+ messages in thread
From: Kevin Darbyshire-Bryant @ 2016-10-05 15:24 UTC (permalink / raw)
  To: cake

Hi Jonathan,

How amenable are you to changing all 4 BUG_ON instances in cake to WARN_ON?

Linus isn't a complete fan and I'm thinking that producing a stack trace 
and trying to carry on is more helpful to a remote accessed, no serial 
interface type device than just killing the kernel dead.

Quite possibly other bad things(tm) will happen shortly after...or maybe 
there will be enough time look at dmesg for that stack trace.

Thoughts?

Kevin

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Cake] BUG_ON vs WARN_ON
  2016-10-05 15:24 [Cake] BUG_ON vs WARN_ON Kevin Darbyshire-Bryant
@ 2016-10-05 15:42 ` Jonathan Morton
  2016-10-05 15:45   ` Kevin Darbyshire-Bryant
  0 siblings, 1 reply; 15+ messages in thread
From: Jonathan Morton @ 2016-10-05 15:42 UTC (permalink / raw)
  To: Kevin Darbyshire-Bryant; +Cc: cake


> On 5 Oct, 2016, at 18:24, Kevin Darbyshire-Bryant <kevin@darbyshire-bryant.me.uk> wrote:
> 
> How amenable are you to changing all 4 BUG_ON instances in cake to WARN_ON?
> 
> Linus isn't a complete fan and I'm thinking that producing a stack trace and trying to carry on is more helpful to a remote accessed, no serial interface type device than just killing the kernel dead.
> 
> Quite possibly other bad things(tm) will happen shortly after...or maybe there will be enough time look at dmesg for that stack trace.

The two in cake_heap_swap() can probably go away completely - they were there to make sure the heap algorithms were working correctly.  They never did trigger, as it happens.

The other two are genuine serious bugs (array overflow) if they ever trigger.  It’s safer to leave them as-is.

 - Jonathan Morton


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Cake] BUG_ON vs WARN_ON
  2016-10-05 15:42 ` Jonathan Morton
@ 2016-10-05 15:45   ` Kevin Darbyshire-Bryant
  2016-10-05 15:53     ` Jonathan Morton
  2016-10-05 15:53     ` Dave Taht
  0 siblings, 2 replies; 15+ messages in thread
From: Kevin Darbyshire-Bryant @ 2016-10-05 15:45 UTC (permalink / raw)
  To: Jonathan Morton; +Cc: cake



On 05/10/16 16:42, Jonathan Morton wrote:
>
>> On 5 Oct, 2016, at 18:24, Kevin Darbyshire-Bryant
>> <kevin@darbyshire-bryant.me.uk> wrote:
>>
>> How amenable are you to changing all 4 BUG_ON instances in cake to
>> WARN_ON?
>>
>> Linus isn't a complete fan and I'm thinking that producing a stack
>> trace and trying to carry on is more helpful to a remote accessed,
>> no serial interface type device than just killing the kernel dead.
>>
>> Quite possibly other bad things(tm) will happen shortly after...or
>> maybe there will be enough time look at dmesg for that stack
>> trace.
>
> The two in cake_heap_swap() can probably go away completely - they
> were there to make sure the heap algorithms were working correctly.
> They never did trigger, as it happens.
>
> The other two are genuine serious bugs (array overflow) if they ever
> trigger.  It’s safer to leave them as-is.

Fair enough :-)  I wonder what it was that caused yesterday's issues?  I 
really must try again when I've more time to get proper access.

Kevin

>
> - Jonathan Morton
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Cake] BUG_ON vs WARN_ON
  2016-10-05 15:45   ` Kevin Darbyshire-Bryant
@ 2016-10-05 15:53     ` Jonathan Morton
  2016-10-05 18:53       ` Jonathan Morton
  2016-10-05 15:53     ` Dave Taht
  1 sibling, 1 reply; 15+ messages in thread
From: Jonathan Morton @ 2016-10-05 15:53 UTC (permalink / raw)
  To: Kevin Darbyshire-Bryant; +Cc: cake


> On 5 Oct, 2016, at 18:45, Kevin Darbyshire-Bryant <kevin@darbyshire-bryant.me.uk> wrote:
> 
> I wonder what it was that caused yesterday's issues?  I really must try again when I've more time to get proper access.

I’m having trouble reproducing it here.  I know one of my boxes froze the very first time I loaded it, but it’s been running fine ever since.  Another machine is currently refusing to insert the module, claiming a wrong exec format.  It’s all a bit bizarre.

I do have a few more avenues of enquiry to explore, though.

 - Jonathan Morton


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Cake] BUG_ON vs WARN_ON
  2016-10-05 15:45   ` Kevin Darbyshire-Bryant
  2016-10-05 15:53     ` Jonathan Morton
@ 2016-10-05 15:53     ` Dave Taht
  2016-10-05 15:55       ` Kevin Darbyshire-Bryant
  1 sibling, 1 reply; 15+ messages in thread
From: Dave Taht @ 2016-10-05 15:53 UTC (permalink / raw)
  To: Kevin Darbyshire-Bryant; +Cc: Jonathan Morton, cake

I did test this version of cake yesterday, had no major problems, aside from:

1) it seeming not to register drops under some circumstances in the
statistics. (could be flent)

2) switching stuff like this

tc qdisc add dev eth0 root cake bandwidth 700mbit
tc qdisc replace dev eth0 root cake bandwidth 1100mbit # out of cpu,
basically, at 700 mbit
tc qdisc replace dev eth0 root cake unlimited # still stuck at 700mbit

kept the thing at 700mbits (out of cpu at that point).

tc qdisc del dev eth0 root cake
tc qdisc add dev eth0 root cake unlimited # native rate of 1gbit achieved


On Wed, Oct 5, 2016 at 8:45 AM, Kevin Darbyshire-Bryant
<kevin@darbyshire-bryant.me.uk> wrote:
>
>
> On 05/10/16 16:42, Jonathan Morton wrote:
>>
>>
>>> On 5 Oct, 2016, at 18:24, Kevin Darbyshire-Bryant
>>> <kevin@darbyshire-bryant.me.uk> wrote:
>>>
>>> How amenable are you to changing all 4 BUG_ON instances in cake to
>>> WARN_ON?
>>>
>>> Linus isn't a complete fan and I'm thinking that producing a stack
>>> trace and trying to carry on is more helpful to a remote accessed,
>>> no serial interface type device than just killing the kernel dead.
>>>
>>> Quite possibly other bad things(tm) will happen shortly after...or
>>> maybe there will be enough time look at dmesg for that stack
>>> trace.
>>
>>
>> The two in cake_heap_swap() can probably go away completely - they
>> were there to make sure the heap algorithms were working correctly.
>> They never did trigger, as it happens.
>>
>> The other two are genuine serious bugs (array overflow) if they ever
>> trigger.  It’s safer to leave them as-is.
>
>
> Fair enough :-)  I wonder what it was that caused yesterday's issues?  I
> really must try again when I've more time to get proper access.
>
> Kevin
>
>
>>
>> - Jonathan Morton
>>
> _______________________________________________
> Cake mailing list
> Cake@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/cake



-- 
Dave Täht
Let's go make home routers and wifi faster! With better software!
http://blog.cerowrt.org

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Cake] BUG_ON vs WARN_ON
  2016-10-05 15:53     ` Dave Taht
@ 2016-10-05 15:55       ` Kevin Darbyshire-Bryant
  2016-10-05 15:57         ` Jonathan Morton
  0 siblings, 1 reply; 15+ messages in thread
From: Kevin Darbyshire-Bryant @ 2016-10-05 15:55 UTC (permalink / raw)
  To: cake



On 05/10/16 16:53, Dave Taht wrote:
> I did test this version of cake yesterday, had no major problems, aside from:
>
> 1) it seeming not to register drops under some circumstances in the
> statistics. (could be flent)
>
> 2) switching stuff like this
>
> tc qdisc add dev eth0 root cake bandwidth 700mbit
> tc qdisc replace dev eth0 root cake bandwidth 1100mbit # out of cpu,
> basically, at 700 mbit
> tc qdisc replace dev eth0 root cake unlimited # still stuck at 700mbit
>
> kept the thing at 700mbits (out of cpu at that point).
>
> tc qdisc del dev eth0 root cake
> tc qdisc add dev eth0 root cake unlimited # native rate of 1gbit achieved
>

I don't trust tc replace and I'm pretty sure we've been here before.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Cake] BUG_ON vs WARN_ON
  2016-10-05 15:55       ` Kevin Darbyshire-Bryant
@ 2016-10-05 15:57         ` Jonathan Morton
  2016-10-05 16:38           ` Dave Taht
  0 siblings, 1 reply; 15+ messages in thread
From: Jonathan Morton @ 2016-10-05 15:57 UTC (permalink / raw)
  To: Kevin Darbyshire-Bryant; +Cc: cake, Dave Taht


> On 5 Oct, 2016, at 18:55, Kevin Darbyshire-Bryant <kevin@darbyshire-bryant.me.uk> wrote:
> 
> I don't trust tc replace and I'm pretty sure we've been here before.

Does tc change (with the same arguments otherwise) behave any differently?

 - Jonathan Morton


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Cake] BUG_ON vs WARN_ON
  2016-10-05 15:57         ` Jonathan Morton
@ 2016-10-05 16:38           ` Dave Taht
  2016-10-06  3:59             ` Kevin Darbyshire-Bryant
  0 siblings, 1 reply; 15+ messages in thread
From: Dave Taht @ 2016-10-05 16:38 UTC (permalink / raw)
  To: Jonathan Morton; +Cc: Kevin Darbyshire-Bryant, cake

I cannot repeat that result this morning, with either replace or
change. I *was* running far more extensive tests between changing
things that way than I just did, but a string of quick tests, changing
the bandwidth, changing it to unlimited, etc got the correct behaviors
throughout for both replace and change. But, changing it to 50mbit
shows the capacity estimate here to still be 700Mbit.

d@apu2:~$ tc -s qdisc show dev enp2s0

qdisc cake 800f: root refcnt 9 bandwidth 50Mbit diffserv4 flows rtt 100.0ms raw

 Sent 7573781243 bytes 5002790 pkt (dropped 6, overlimits 1270728
requeues 19055)

 backlog 0b 0p requeues 19055

 memory used: 729504b of 4Mb

 capacity estimate: 700Mbit

                 Bulk   Best Effort      Video       Voice

  thresh      3125Kbit      50Mbit      25Mbit   12500Kbit

  target         5.8ms       5.0ms       5.0ms       5.0ms

  interval     100.8ms     100.0ms     100.0ms     100.0ms

  pk_delay         0us       4.6ms         1us        69us

  av_delay         0us       3.2ms         0us         4us

  sp_delay         0us         5us         0us         3us

  pkts               0     2991920           1          89

  bytes              0  4529561380          90       26451

  way_inds           0           0           0           0

  way_miss           0          38           1           2

  way_cols           0           0           0           0

  drops              0           6           0           0

  marks              0           0           0           0

  sp_flows           0           0           0           0

  bk_flows           0           0           0           1

  un_flows           0           0           0           0

  max_len            0       19682          90         927


I am still not seeing the number of drops I would expect in the cake
statistics. 4 with 4 flows going at 50mbit on a 30 second test doesn't
strike me as enough. I will do a capture later to see if they add up.

That said, I did have to patch out on this particular kernel, this:

Linux apu2 4.5.0-fqmac3.5-wt-ath # yes, I'm running 3 kernels behind mainline.


diff --git a/sch_cake.c b/sch_cake.c

index f613ab4..92c5078 100644

--- a/sch_cake.c

+++ b/sch_cake.c

@@ -64,7 +64,7 @@

 #endif



 #if (KERNEL_VERSION(4,4,11) > LINUX_VERSION_CODE) ||
((KERNEL_VERSION(4,5,0) <= LINUX_VERSION_CODE) &&
(KERNEL_VERSION(4,5,5) > LINUX_VERSION_CODE))

-#define qdisc_tree_reduce_backlog(_a,_b,_c) qdisc_tree_decrease_qlen(_a,_b)

+// #define qdisc_tree_reduce_backlog(_a,_b,_c) qdisc_tree_decrease_qlen(_a,_b)

 #endif

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Cake] BUG_ON vs WARN_ON
  2016-10-05 15:53     ` Jonathan Morton
@ 2016-10-05 18:53       ` Jonathan Morton
  2016-10-07 13:27         ` Kevin Darbyshire-Bryant
  0 siblings, 1 reply; 15+ messages in thread
From: Jonathan Morton @ 2016-10-05 18:53 UTC (permalink / raw)
  To: Kevin Darbyshire-Bryant; +Cc: cake

>> I wonder what it was that caused yesterday's issues?  I really must try again when I've more time to get proper access.
> 
> I’m having trouble reproducing it here.  I know one of my boxes froze the very first time I loaded it, but it’s been running fine ever since.  Another machine is currently refusing to insert the module, claiming a wrong exec format.  It’s all a bit bizarre.
> 
> I do have a few more avenues of enquiry to explore, though.

Aha - I managed to capture a kernel panic, which appears to trace to the lookup in the accelerator array.  It’s a read-only access, so it only panics if it hits unpaged memory, rather than corrupting anything.  Of course, if it reads outside the array, it’ll increment the deficit by a random value, but that usually won’t prevent traffic flowing.

The lookup is indexed on the host refcnt, which I’m using as the count of flows attached to that host.  It seems likely that it isn’t being maintained correctly in all cases, so it can wrap around past zero very soon after being attached, without needing much traffic.

I’ll try to fix that, and put a sanity check in as well to be certain.

 - Jonathan Morton


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Cake] BUG_ON vs WARN_ON
  2016-10-05 16:38           ` Dave Taht
@ 2016-10-06  3:59             ` Kevin Darbyshire-Bryant
  2016-10-06  4:19               ` Dave Taht
  0 siblings, 1 reply; 15+ messages in thread
From: Kevin Darbyshire-Bryant @ 2016-10-06  3:59 UTC (permalink / raw)
  To: Dave Taht, Jonathan Morton; +Cc: cake



On 05/10/16 17:38, Dave Taht wrote:
> I cannot repeat that result this morning, with either replace or
> change.

Out of interest Dave,  which branch are you building/testing?  The 
'master' or 'cobalt'?

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Cake] BUG_ON vs WARN_ON
  2016-10-06  3:59             ` Kevin Darbyshire-Bryant
@ 2016-10-06  4:19               ` Dave Taht
  2016-10-06  8:12                 ` Jonathan Morton
  0 siblings, 1 reply; 15+ messages in thread
From: Dave Taht @ 2016-10-06  4:19 UTC (permalink / raw)
  To: Kevin Darbyshire-Bryant; +Cc: Jonathan Morton, cake

master

On Wed, Oct 5, 2016 at 8:59 PM, Kevin Darbyshire-Bryant
<kevin@darbyshire-bryant.me.uk> wrote:
>
>
> On 05/10/16 17:38, Dave Taht wrote:
>>
>> I cannot repeat that result this morning, with either replace or
>> change.
>
>
> Out of interest Dave,  which branch are you building/testing?  The 'master'
> or 'cobalt'?



-- 
Dave Täht
Let's go make home routers and wifi faster! With better software!
http://blog.cerowrt.org

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Cake] BUG_ON vs WARN_ON
  2016-10-06  4:19               ` Dave Taht
@ 2016-10-06  8:12                 ` Jonathan Morton
  0 siblings, 0 replies; 15+ messages in thread
From: Jonathan Morton @ 2016-10-06  8:12 UTC (permalink / raw)
  To: Dave Taht; +Cc: Kevin Darbyshire-Bryant, cake


> On 6 Oct, 2016, at 07:19, Dave Taht <dave.taht@gmail.com> wrote:
> 
> master

Right - that’s stable code.  I’m doing experimental stuff in the cobalt branch.

 - Jonathan Morton


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Cake] BUG_ON vs WARN_ON
  2016-10-05 18:53       ` Jonathan Morton
@ 2016-10-07 13:27         ` Kevin Darbyshire-Bryant
  2016-10-07 13:35           ` Jonathan Morton
  0 siblings, 1 reply; 15+ messages in thread
From: Kevin Darbyshire-Bryant @ 2016-10-07 13:27 UTC (permalink / raw)
  To: Jonathan Morton; +Cc: cake



On 05/10/16 19:53, Jonathan Morton wrote:
>>> I wonder what it was that caused yesterday's issues?  I really
>>> must try again when I've more time to get proper access.
>>
>> I’m having trouble reproducing it here.  I know one of my boxes
>> froze the very first time I loaded it, but it’s been running fine
>> ever since.  Another machine is currently refusing to insert the
>> module, claiming a wrong exec format.  It’s all a bit bizarre.
>>
>> I do have a few more avenues of enquiry to explore, though.
>
> Aha - I managed to capture a kernel panic, which appears to trace to
> the lookup in the accelerator array.  It’s a read-only access, so it
> only panics if it hits unpaged memory, rather than corrupting
> anything.  Of course, if it reads outside the array, it’ll increment
> the deficit by a random value, but that usually won’t prevent traffic
> flowing.
>
> The lookup is indexed on the host refcnt, which I’m using as the
> count of flows attached to that host.  It seems likely that it isn’t
> being maintained correctly in all cases, so it can wrap around past
> zero very soon after being attached, without needing much traffic.
>
> I’ll try to fix that, and put a sanity check in as well to be
> certain.

It's now ok...so far :-)


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Cake] BUG_ON vs WARN_ON
  2016-10-07 13:27         ` Kevin Darbyshire-Bryant
@ 2016-10-07 13:35           ` Jonathan Morton
  2016-10-07 13:43             ` Kevin Darbyshire-Bryant
  0 siblings, 1 reply; 15+ messages in thread
From: Jonathan Morton @ 2016-10-07 13:35 UTC (permalink / raw)
  To: Kevin Darbyshire-Bryant; +Cc: cake


> On 7 Oct, 2016, at 16:27, Kevin Darbyshire-Bryant <kevin@darbyshire-bryant.me.uk> wrote:
> 
> It's now ok...so far :-)

Okay.  I think I’ve found a couple of other things to improve, so stand by…

 - Jonathan Morton


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Cake] BUG_ON vs WARN_ON
  2016-10-07 13:35           ` Jonathan Morton
@ 2016-10-07 13:43             ` Kevin Darbyshire-Bryant
  0 siblings, 0 replies; 15+ messages in thread
From: Kevin Darbyshire-Bryant @ 2016-10-07 13:43 UTC (permalink / raw)
  To: Jonathan Morton; +Cc: cake



On 07/10/16 14:35, Jonathan Morton wrote:
>
>> On 7 Oct, 2016, at 16:27, Kevin Darbyshire-Bryant <kevin@darbyshire-bryant.me.uk> wrote:
>>
>> It's now ok...so far :-)
>
> Okay.  I think I’ve found a couple of other things to improve, so stand by…

I'm not sure I can take all this excitement you know :)

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2016-10-07 13:43 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-10-05 15:24 [Cake] BUG_ON vs WARN_ON Kevin Darbyshire-Bryant
2016-10-05 15:42 ` Jonathan Morton
2016-10-05 15:45   ` Kevin Darbyshire-Bryant
2016-10-05 15:53     ` Jonathan Morton
2016-10-05 18:53       ` Jonathan Morton
2016-10-07 13:27         ` Kevin Darbyshire-Bryant
2016-10-07 13:35           ` Jonathan Morton
2016-10-07 13:43             ` Kevin Darbyshire-Bryant
2016-10-05 15:53     ` Dave Taht
2016-10-05 15:55       ` Kevin Darbyshire-Bryant
2016-10-05 15:57         ` Jonathan Morton
2016-10-05 16:38           ` Dave Taht
2016-10-06  3:59             ` Kevin Darbyshire-Bryant
2016-10-06  4:19               ` Dave Taht
2016-10-06  8:12                 ` Jonathan Morton

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox