* [Cake] BUG_ON vs WARN_ON @ 2016-10-05 15:24 Kevin Darbyshire-Bryant 2016-10-05 15:42 ` Jonathan Morton 0 siblings, 1 reply; 15+ messages in thread From: Kevin Darbyshire-Bryant @ 2016-10-05 15:24 UTC (permalink / raw) To: cake Hi Jonathan, How amenable are you to changing all 4 BUG_ON instances in cake to WARN_ON? Linus isn't a complete fan and I'm thinking that producing a stack trace and trying to carry on is more helpful to a remote accessed, no serial interface type device than just killing the kernel dead. Quite possibly other bad things(tm) will happen shortly after...or maybe there will be enough time look at dmesg for that stack trace. Thoughts? Kevin ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Cake] BUG_ON vs WARN_ON 2016-10-05 15:24 [Cake] BUG_ON vs WARN_ON Kevin Darbyshire-Bryant @ 2016-10-05 15:42 ` Jonathan Morton 2016-10-05 15:45 ` Kevin Darbyshire-Bryant 0 siblings, 1 reply; 15+ messages in thread From: Jonathan Morton @ 2016-10-05 15:42 UTC (permalink / raw) To: Kevin Darbyshire-Bryant; +Cc: cake > On 5 Oct, 2016, at 18:24, Kevin Darbyshire-Bryant <kevin@darbyshire-bryant.me.uk> wrote: > > How amenable are you to changing all 4 BUG_ON instances in cake to WARN_ON? > > Linus isn't a complete fan and I'm thinking that producing a stack trace and trying to carry on is more helpful to a remote accessed, no serial interface type device than just killing the kernel dead. > > Quite possibly other bad things(tm) will happen shortly after...or maybe there will be enough time look at dmesg for that stack trace. The two in cake_heap_swap() can probably go away completely - they were there to make sure the heap algorithms were working correctly. They never did trigger, as it happens. The other two are genuine serious bugs (array overflow) if they ever trigger. It’s safer to leave them as-is. - Jonathan Morton ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Cake] BUG_ON vs WARN_ON 2016-10-05 15:42 ` Jonathan Morton @ 2016-10-05 15:45 ` Kevin Darbyshire-Bryant 2016-10-05 15:53 ` Jonathan Morton 2016-10-05 15:53 ` Dave Taht 0 siblings, 2 replies; 15+ messages in thread From: Kevin Darbyshire-Bryant @ 2016-10-05 15:45 UTC (permalink / raw) To: Jonathan Morton; +Cc: cake On 05/10/16 16:42, Jonathan Morton wrote: > >> On 5 Oct, 2016, at 18:24, Kevin Darbyshire-Bryant >> <kevin@darbyshire-bryant.me.uk> wrote: >> >> How amenable are you to changing all 4 BUG_ON instances in cake to >> WARN_ON? >> >> Linus isn't a complete fan and I'm thinking that producing a stack >> trace and trying to carry on is more helpful to a remote accessed, >> no serial interface type device than just killing the kernel dead. >> >> Quite possibly other bad things(tm) will happen shortly after...or >> maybe there will be enough time look at dmesg for that stack >> trace. > > The two in cake_heap_swap() can probably go away completely - they > were there to make sure the heap algorithms were working correctly. > They never did trigger, as it happens. > > The other two are genuine serious bugs (array overflow) if they ever > trigger. It’s safer to leave them as-is. Fair enough :-) I wonder what it was that caused yesterday's issues? I really must try again when I've more time to get proper access. Kevin > > - Jonathan Morton > ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Cake] BUG_ON vs WARN_ON 2016-10-05 15:45 ` Kevin Darbyshire-Bryant @ 2016-10-05 15:53 ` Jonathan Morton 2016-10-05 18:53 ` Jonathan Morton 2016-10-05 15:53 ` Dave Taht 1 sibling, 1 reply; 15+ messages in thread From: Jonathan Morton @ 2016-10-05 15:53 UTC (permalink / raw) To: Kevin Darbyshire-Bryant; +Cc: cake > On 5 Oct, 2016, at 18:45, Kevin Darbyshire-Bryant <kevin@darbyshire-bryant.me.uk> wrote: > > I wonder what it was that caused yesterday's issues? I really must try again when I've more time to get proper access. I’m having trouble reproducing it here. I know one of my boxes froze the very first time I loaded it, but it’s been running fine ever since. Another machine is currently refusing to insert the module, claiming a wrong exec format. It’s all a bit bizarre. I do have a few more avenues of enquiry to explore, though. - Jonathan Morton ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Cake] BUG_ON vs WARN_ON 2016-10-05 15:53 ` Jonathan Morton @ 2016-10-05 18:53 ` Jonathan Morton 2016-10-07 13:27 ` Kevin Darbyshire-Bryant 0 siblings, 1 reply; 15+ messages in thread From: Jonathan Morton @ 2016-10-05 18:53 UTC (permalink / raw) To: Kevin Darbyshire-Bryant; +Cc: cake >> I wonder what it was that caused yesterday's issues? I really must try again when I've more time to get proper access. > > I’m having trouble reproducing it here. I know one of my boxes froze the very first time I loaded it, but it’s been running fine ever since. Another machine is currently refusing to insert the module, claiming a wrong exec format. It’s all a bit bizarre. > > I do have a few more avenues of enquiry to explore, though. Aha - I managed to capture a kernel panic, which appears to trace to the lookup in the accelerator array. It’s a read-only access, so it only panics if it hits unpaged memory, rather than corrupting anything. Of course, if it reads outside the array, it’ll increment the deficit by a random value, but that usually won’t prevent traffic flowing. The lookup is indexed on the host refcnt, which I’m using as the count of flows attached to that host. It seems likely that it isn’t being maintained correctly in all cases, so it can wrap around past zero very soon after being attached, without needing much traffic. I’ll try to fix that, and put a sanity check in as well to be certain. - Jonathan Morton ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Cake] BUG_ON vs WARN_ON 2016-10-05 18:53 ` Jonathan Morton @ 2016-10-07 13:27 ` Kevin Darbyshire-Bryant 2016-10-07 13:35 ` Jonathan Morton 0 siblings, 1 reply; 15+ messages in thread From: Kevin Darbyshire-Bryant @ 2016-10-07 13:27 UTC (permalink / raw) To: Jonathan Morton; +Cc: cake On 05/10/16 19:53, Jonathan Morton wrote: >>> I wonder what it was that caused yesterday's issues? I really >>> must try again when I've more time to get proper access. >> >> I’m having trouble reproducing it here. I know one of my boxes >> froze the very first time I loaded it, but it’s been running fine >> ever since. Another machine is currently refusing to insert the >> module, claiming a wrong exec format. It’s all a bit bizarre. >> >> I do have a few more avenues of enquiry to explore, though. > > Aha - I managed to capture a kernel panic, which appears to trace to > the lookup in the accelerator array. It’s a read-only access, so it > only panics if it hits unpaged memory, rather than corrupting > anything. Of course, if it reads outside the array, it’ll increment > the deficit by a random value, but that usually won’t prevent traffic > flowing. > > The lookup is indexed on the host refcnt, which I’m using as the > count of flows attached to that host. It seems likely that it isn’t > being maintained correctly in all cases, so it can wrap around past > zero very soon after being attached, without needing much traffic. > > I’ll try to fix that, and put a sanity check in as well to be > certain. It's now ok...so far :-) ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Cake] BUG_ON vs WARN_ON 2016-10-07 13:27 ` Kevin Darbyshire-Bryant @ 2016-10-07 13:35 ` Jonathan Morton 2016-10-07 13:43 ` Kevin Darbyshire-Bryant 0 siblings, 1 reply; 15+ messages in thread From: Jonathan Morton @ 2016-10-07 13:35 UTC (permalink / raw) To: Kevin Darbyshire-Bryant; +Cc: cake > On 7 Oct, 2016, at 16:27, Kevin Darbyshire-Bryant <kevin@darbyshire-bryant.me.uk> wrote: > > It's now ok...so far :-) Okay. I think I’ve found a couple of other things to improve, so stand by… - Jonathan Morton ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Cake] BUG_ON vs WARN_ON 2016-10-07 13:35 ` Jonathan Morton @ 2016-10-07 13:43 ` Kevin Darbyshire-Bryant 0 siblings, 0 replies; 15+ messages in thread From: Kevin Darbyshire-Bryant @ 2016-10-07 13:43 UTC (permalink / raw) To: Jonathan Morton; +Cc: cake On 07/10/16 14:35, Jonathan Morton wrote: > >> On 7 Oct, 2016, at 16:27, Kevin Darbyshire-Bryant <kevin@darbyshire-bryant.me.uk> wrote: >> >> It's now ok...so far :-) > > Okay. I think I’ve found a couple of other things to improve, so stand by… I'm not sure I can take all this excitement you know :) ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Cake] BUG_ON vs WARN_ON 2016-10-05 15:45 ` Kevin Darbyshire-Bryant 2016-10-05 15:53 ` Jonathan Morton @ 2016-10-05 15:53 ` Dave Taht 2016-10-05 15:55 ` Kevin Darbyshire-Bryant 1 sibling, 1 reply; 15+ messages in thread From: Dave Taht @ 2016-10-05 15:53 UTC (permalink / raw) To: Kevin Darbyshire-Bryant; +Cc: Jonathan Morton, cake I did test this version of cake yesterday, had no major problems, aside from: 1) it seeming not to register drops under some circumstances in the statistics. (could be flent) 2) switching stuff like this tc qdisc add dev eth0 root cake bandwidth 700mbit tc qdisc replace dev eth0 root cake bandwidth 1100mbit # out of cpu, basically, at 700 mbit tc qdisc replace dev eth0 root cake unlimited # still stuck at 700mbit kept the thing at 700mbits (out of cpu at that point). tc qdisc del dev eth0 root cake tc qdisc add dev eth0 root cake unlimited # native rate of 1gbit achieved On Wed, Oct 5, 2016 at 8:45 AM, Kevin Darbyshire-Bryant <kevin@darbyshire-bryant.me.uk> wrote: > > > On 05/10/16 16:42, Jonathan Morton wrote: >> >> >>> On 5 Oct, 2016, at 18:24, Kevin Darbyshire-Bryant >>> <kevin@darbyshire-bryant.me.uk> wrote: >>> >>> How amenable are you to changing all 4 BUG_ON instances in cake to >>> WARN_ON? >>> >>> Linus isn't a complete fan and I'm thinking that producing a stack >>> trace and trying to carry on is more helpful to a remote accessed, >>> no serial interface type device than just killing the kernel dead. >>> >>> Quite possibly other bad things(tm) will happen shortly after...or >>> maybe there will be enough time look at dmesg for that stack >>> trace. >> >> >> The two in cake_heap_swap() can probably go away completely - they >> were there to make sure the heap algorithms were working correctly. >> They never did trigger, as it happens. >> >> The other two are genuine serious bugs (array overflow) if they ever >> trigger. It’s safer to leave them as-is. > > > Fair enough :-) I wonder what it was that caused yesterday's issues? I > really must try again when I've more time to get proper access. > > Kevin > > >> >> - Jonathan Morton >> > _______________________________________________ > Cake mailing list > Cake@lists.bufferbloat.net > https://lists.bufferbloat.net/listinfo/cake -- Dave Täht Let's go make home routers and wifi faster! With better software! http://blog.cerowrt.org ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Cake] BUG_ON vs WARN_ON 2016-10-05 15:53 ` Dave Taht @ 2016-10-05 15:55 ` Kevin Darbyshire-Bryant 2016-10-05 15:57 ` Jonathan Morton 0 siblings, 1 reply; 15+ messages in thread From: Kevin Darbyshire-Bryant @ 2016-10-05 15:55 UTC (permalink / raw) To: cake On 05/10/16 16:53, Dave Taht wrote: > I did test this version of cake yesterday, had no major problems, aside from: > > 1) it seeming not to register drops under some circumstances in the > statistics. (could be flent) > > 2) switching stuff like this > > tc qdisc add dev eth0 root cake bandwidth 700mbit > tc qdisc replace dev eth0 root cake bandwidth 1100mbit # out of cpu, > basically, at 700 mbit > tc qdisc replace dev eth0 root cake unlimited # still stuck at 700mbit > > kept the thing at 700mbits (out of cpu at that point). > > tc qdisc del dev eth0 root cake > tc qdisc add dev eth0 root cake unlimited # native rate of 1gbit achieved > I don't trust tc replace and I'm pretty sure we've been here before. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Cake] BUG_ON vs WARN_ON 2016-10-05 15:55 ` Kevin Darbyshire-Bryant @ 2016-10-05 15:57 ` Jonathan Morton 2016-10-05 16:38 ` Dave Taht 0 siblings, 1 reply; 15+ messages in thread From: Jonathan Morton @ 2016-10-05 15:57 UTC (permalink / raw) To: Kevin Darbyshire-Bryant; +Cc: cake, Dave Taht > On 5 Oct, 2016, at 18:55, Kevin Darbyshire-Bryant <kevin@darbyshire-bryant.me.uk> wrote: > > I don't trust tc replace and I'm pretty sure we've been here before. Does tc change (with the same arguments otherwise) behave any differently? - Jonathan Morton ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Cake] BUG_ON vs WARN_ON 2016-10-05 15:57 ` Jonathan Morton @ 2016-10-05 16:38 ` Dave Taht 2016-10-06 3:59 ` Kevin Darbyshire-Bryant 0 siblings, 1 reply; 15+ messages in thread From: Dave Taht @ 2016-10-05 16:38 UTC (permalink / raw) To: Jonathan Morton; +Cc: Kevin Darbyshire-Bryant, cake I cannot repeat that result this morning, with either replace or change. I *was* running far more extensive tests between changing things that way than I just did, but a string of quick tests, changing the bandwidth, changing it to unlimited, etc got the correct behaviors throughout for both replace and change. But, changing it to 50mbit shows the capacity estimate here to still be 700Mbit. d@apu2:~$ tc -s qdisc show dev enp2s0 qdisc cake 800f: root refcnt 9 bandwidth 50Mbit diffserv4 flows rtt 100.0ms raw Sent 7573781243 bytes 5002790 pkt (dropped 6, overlimits 1270728 requeues 19055) backlog 0b 0p requeues 19055 memory used: 729504b of 4Mb capacity estimate: 700Mbit Bulk Best Effort Video Voice thresh 3125Kbit 50Mbit 25Mbit 12500Kbit target 5.8ms 5.0ms 5.0ms 5.0ms interval 100.8ms 100.0ms 100.0ms 100.0ms pk_delay 0us 4.6ms 1us 69us av_delay 0us 3.2ms 0us 4us sp_delay 0us 5us 0us 3us pkts 0 2991920 1 89 bytes 0 4529561380 90 26451 way_inds 0 0 0 0 way_miss 0 38 1 2 way_cols 0 0 0 0 drops 0 6 0 0 marks 0 0 0 0 sp_flows 0 0 0 0 bk_flows 0 0 0 1 un_flows 0 0 0 0 max_len 0 19682 90 927 I am still not seeing the number of drops I would expect in the cake statistics. 4 with 4 flows going at 50mbit on a 30 second test doesn't strike me as enough. I will do a capture later to see if they add up. That said, I did have to patch out on this particular kernel, this: Linux apu2 4.5.0-fqmac3.5-wt-ath # yes, I'm running 3 kernels behind mainline. diff --git a/sch_cake.c b/sch_cake.c index f613ab4..92c5078 100644 --- a/sch_cake.c +++ b/sch_cake.c @@ -64,7 +64,7 @@ #endif #if (KERNEL_VERSION(4,4,11) > LINUX_VERSION_CODE) || ((KERNEL_VERSION(4,5,0) <= LINUX_VERSION_CODE) && (KERNEL_VERSION(4,5,5) > LINUX_VERSION_CODE)) -#define qdisc_tree_reduce_backlog(_a,_b,_c) qdisc_tree_decrease_qlen(_a,_b) +// #define qdisc_tree_reduce_backlog(_a,_b,_c) qdisc_tree_decrease_qlen(_a,_b) #endif ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Cake] BUG_ON vs WARN_ON 2016-10-05 16:38 ` Dave Taht @ 2016-10-06 3:59 ` Kevin Darbyshire-Bryant 2016-10-06 4:19 ` Dave Taht 0 siblings, 1 reply; 15+ messages in thread From: Kevin Darbyshire-Bryant @ 2016-10-06 3:59 UTC (permalink / raw) To: Dave Taht, Jonathan Morton; +Cc: cake On 05/10/16 17:38, Dave Taht wrote: > I cannot repeat that result this morning, with either replace or > change. Out of interest Dave, which branch are you building/testing? The 'master' or 'cobalt'? ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Cake] BUG_ON vs WARN_ON 2016-10-06 3:59 ` Kevin Darbyshire-Bryant @ 2016-10-06 4:19 ` Dave Taht 2016-10-06 8:12 ` Jonathan Morton 0 siblings, 1 reply; 15+ messages in thread From: Dave Taht @ 2016-10-06 4:19 UTC (permalink / raw) To: Kevin Darbyshire-Bryant; +Cc: Jonathan Morton, cake master On Wed, Oct 5, 2016 at 8:59 PM, Kevin Darbyshire-Bryant <kevin@darbyshire-bryant.me.uk> wrote: > > > On 05/10/16 17:38, Dave Taht wrote: >> >> I cannot repeat that result this morning, with either replace or >> change. > > > Out of interest Dave, which branch are you building/testing? The 'master' > or 'cobalt'? -- Dave Täht Let's go make home routers and wifi faster! With better software! http://blog.cerowrt.org ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Cake] BUG_ON vs WARN_ON 2016-10-06 4:19 ` Dave Taht @ 2016-10-06 8:12 ` Jonathan Morton 0 siblings, 0 replies; 15+ messages in thread From: Jonathan Morton @ 2016-10-06 8:12 UTC (permalink / raw) To: Dave Taht; +Cc: Kevin Darbyshire-Bryant, cake > On 6 Oct, 2016, at 07:19, Dave Taht <dave.taht@gmail.com> wrote: > > master Right - that’s stable code. I’m doing experimental stuff in the cobalt branch. - Jonathan Morton ^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2016-10-07 13:43 UTC | newest] Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2016-10-05 15:24 [Cake] BUG_ON vs WARN_ON Kevin Darbyshire-Bryant 2016-10-05 15:42 ` Jonathan Morton 2016-10-05 15:45 ` Kevin Darbyshire-Bryant 2016-10-05 15:53 ` Jonathan Morton 2016-10-05 18:53 ` Jonathan Morton 2016-10-07 13:27 ` Kevin Darbyshire-Bryant 2016-10-07 13:35 ` Jonathan Morton 2016-10-07 13:43 ` Kevin Darbyshire-Bryant 2016-10-05 15:53 ` Dave Taht 2016-10-05 15:55 ` Kevin Darbyshire-Bryant 2016-10-05 15:57 ` Jonathan Morton 2016-10-05 16:38 ` Dave Taht 2016-10-06 3:59 ` Kevin Darbyshire-Bryant 2016-10-06 4:19 ` Dave Taht 2016-10-06 8:12 ` Jonathan Morton
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox