* [Cake] Fighting bloat in the face of uncertainty
@ 2019-09-07 22:42 Justin Kilpatrick
2019-09-07 23:09 ` Jonathan Morton
0 siblings, 1 reply; 36+ messages in thread
From: Justin Kilpatrick @ 2019-09-07 22:42 UTC (permalink / raw)
To: cake
I'm using Cake on embedded OpenWRT devices. You probably saw this video on the list a month or two ago.
https://www.youtube.com/watch?v=G4EKbgShyLw
Anyway, up until now I've left Cake totally untuned and had pretty great results. But we've finally encountered a scenario where untuned Cake allowed unacceptable bufferbloat on a link.
Hand configuration in accordance with the best practices provided in the RFC works out perfectly, but I need a set of settings I can ship with any device with the expectation that it will be used and abused in many non-standard situations. Producing non-optimal outcomes is fine, producing dramatically degraded outcomes is unacceptable.
Which leads to a few questions
1) What happens if the target is dramatically too low?
Most of our links can expect latency between 1 and 10ms, but it may occasionally be much higher than that. What are the consequences of having a 100ms link configured with a target of 10ms?
2) If the interval is dramatically unpredictable, is it best to err on the side of underestimating or overestimating?
The user may select a VPN/exit server of their own choosing, the path to it over the network may change, or the exit may be much further away. Both 10ms and 80ms would be sane choices of target depending on factors that may change on the fly.
Thanks for the feedback!
--
Justin Kilpatrick
justin@althea.net
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [Cake] Fighting bloat in the face of uncertainty
2019-09-07 22:42 [Cake] Fighting bloat in the face of uncertainty Justin Kilpatrick
@ 2019-09-07 23:09 ` Jonathan Morton
2019-09-07 23:31 ` Justin Kilpatrick
0 siblings, 1 reply; 36+ messages in thread
From: Jonathan Morton @ 2019-09-07 23:09 UTC (permalink / raw)
To: Justin Kilpatrick; +Cc: cake
> On 8 Sep, 2019, at 1:42 am, Justin Kilpatrick <justin@althea.net> wrote:
>
> I'm using Cake on embedded OpenWRT devices. You probably saw this video on the list a month or two ago.
>
> https://www.youtube.com/watch?v=G4EKbgShyLw
I haven't actually watched that one yet...
> Anyways up until now I've left cake totally untuned and had pretty great results. But we've finally encountered a scenario where untuned Cake allowed for unacceptable bufferbloat on a link.
>
> Hand configuration in accordance with the best practices provided in the RFC works out perfectly, but I need a set of settings I can ship with any device with the expectation that it will be used and abused in many non-standard situations. Producing non-optimal outcomes is fine, producing dramatically degraded outcomes is unacceptable.
What was the scenario that gave you trouble?
I note that Cake is not defined in an RFC. Were you referring to a Codel RFC? Cake is a bit more sophisticated, with the aim of making it easier to configure.
> Which leads to a few questions
>
> 1) What happens if the target is dramatically too low?
>
> Most of our links can expect latency between 1-10ms, but they may occasionally go much longer than that. What are the consequences of having a 100ms link configured with a target of 10ms?
The default 'target' parameter is normally 5ms, which goes with a default 'rtt' and 'interval' parameter of 100ms.
You shouldn't normally need to set 'target' and 'interval' manually, only 'rtt', and there are various keywords to assist with choosing an appropriate 'rtt'. The default of 100ms is provided by the 'internet' keyword, and this should be able to cope reasonably well with paths down to 10ms. You could also try "regional" which gives you tuning for 30ms, or "metro" which gives you 10ms, with good behaviour on paths within about an order of magnitude of that.
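For concreteness, the presets behind those keywords keep a fixed 20:1 ratio between 'interval' and 'target' (e.g. 'metro' is a 500µs target with a 10ms interval). A small sketch using the preset numbers from tc's q_cake.c; the selection helper itself is hypothetical, just to show how the keywords relate to path RTT:

```python
# Cake 'rtt' keyword presets in microseconds: (target, interval).
# Values mirror the presets table in tc's q_cake.c; target = interval / 20.
PRESETS_US = {
    "metro":    (500,    10_000),
    "regional": (1_500,  30_000),
    "internet": (5_000, 100_000),
}

def preset_for_path_rtt(rtt_us):
    """Hypothetical helper: the smallest preset whose interval covers the path RTT."""
    for name, (_target, interval) in sorted(PRESETS_US.items(),
                                            key=lambda kv: kv[1][1]):
        if rtt_us <= interval:
            return name
    return "internet"  # fall back to the generic Internet tuning

print(preset_for_path_rtt(6_000))   # a ~6ms path fits 'metro'
print(preset_for_path_rtt(80_000))  # an 80ms VPN path wants 'internet'
```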
Remember, it's the path RTT that matters for this, not the link itself.
Should the bandwidth setting correspond to a serialisation delay per packet that approaches the 'target' implied by the above, 'target' will automatically be tuned to avoid the nasty effects that might cause - *unless* you manually override it. So don't do that.
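As a rough illustration of why that automatic tuning is needed (the 1.5-packet factor here is an assumption for the sketch, not Cake's exact rule): at low shaped rates, serializing a single MTU-sized packet can take longer than the default 5ms target.

```python
MTU_BYTES = 1500

def serialization_delay_ms(rate_mbps, size_bytes=MTU_BYTES):
    """Time to put one packet on the wire at the shaped rate."""
    return size_bytes * 8 * 1000 / (rate_mbps * 1e6)

def effective_target_ms(rate_mbps, configured_target_ms=5.0):
    """Sketch: keep 'target' at or above ~1.5 packets' worth of
    serialization time, so a single MTU can't blow the AQM's budget."""
    return max(configured_target_ms, 1.5 * serialization_delay_ms(rate_mbps))

print(serialization_delay_ms(1))   # 12.0 ms per MTU at 1 Mbps
print(effective_target_ms(1))      # 18.0 ms: the 5 ms default must rise
print(effective_target_ms(100))    # 5.0 ms: default is fine at 100 Mbps
```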
ECN enabled flows should not easily notice an 'rtt' setting that's too small. RFC-3168 compliant transports only care about how many RTTs contain at least one CE mark. Non-ECN flows may see elevated packet loss, however, and thus more retransmissions, but the same congestion control behaviour. Cake protects these flows from experiencing "tail loss" which could lead to an RTO that the end-user would notice.
> 2) If interval is dramatically unpredictable is it best to err on the side of under or over estimating?
>
> The user may select an VPN/exit server of their own choosing, the path to it over the network may change or the exit may be much further away. Both 10ms and 80ms would be sane choices of target depending on factors that may change on the fly.
Generally the default 'rtt' of 100ms is suitable for generic Internet paths, including nearby 10ms hops and 500ms satellite-linked islands. The default 5ms target actually puts a floor on the minimum effective RTT that the marking schedule has to cope with. There's also a good chance that the "hystart" algorithm in CUBIC will drop it out of slow-start on very short-RTT paths before the AQM triggers.
- Jonathan Morton
* Re: [Cake] Fighting bloat in the face of uncertainty
2019-09-07 23:09 ` Jonathan Morton
@ 2019-09-07 23:31 ` Justin Kilpatrick
2019-09-07 23:42 ` Jonathan Morton
0 siblings, 1 reply; 36+ messages in thread
From: Justin Kilpatrick @ 2019-09-07 23:31 UTC (permalink / raw)
To: Jonathan Morton; +Cc: cake
> What was the scenario that gave you trouble?
We had a 1ms link bloating on a ~6ms path using the default 'internet' profile. The link got very bloated when loaded (~100ms), but only in the download direction.
All traffic in this system has Cake applied on egress, so the sending node was using Cake to place packets onto the 100mbps line, and the receiving node (which was not applying Cake on input) got them up to 300ms later.
On the other hand, during an upload test (which seems like a trivial reversal of the same situation) things were flawless, even with the 'internet' rtt value of 100ms.
Setting the throughput to the link capacity or using the 'metro'/'lan' profiles worked, but was only required on the upstream node to resolve download bloat.
I think this is an artifact of the traffic in question? Bloat only occurred in the download direction, where there were many user streams all happy to munch away at more than the link capacity if you let them. Upload was much less contentious.
> I note that Cake is not defined in an RFC. Were you referring to a
> Codel RFC? Cake is a bit more sophisticated, with the aim of making it
> easier to configure.
Yes, at least for definitions of target/interval etc and some tuning guidelines. Although apparently I still get them confused.
> Should the bandwidth setting correspond to a serialisation delay per
> packet that approaches the 'target' implied by the above, 'target' will
> automatically be tuned to avoid the nasty effects that might cause -
> *unless* you manually override it. So don't do that.
I can estimate link throughput more reliably than RTT, but not by much; it's easy to be off by 50%. Since setting throughput reduces capacity if I'm wrong, I've tried to stay away from it.
If I set a throughput that's 50% too high, should it still help? In my testing it didn't seem to. But I was still using the 'internet' keyword otherwise, so maybe I was just shooting myself in the foot some other way at the same time.
--
Justin Kilpatrick
justin@althea.net
* Re: [Cake] Fighting bloat in the face of uncertainty
2019-09-07 23:31 ` Justin Kilpatrick
@ 2019-09-07 23:42 ` Jonathan Morton
2019-09-08 0:03 ` Justin Kilpatrick
0 siblings, 1 reply; 36+ messages in thread
From: Jonathan Morton @ 2019-09-07 23:42 UTC (permalink / raw)
To: Justin Kilpatrick; +Cc: cake
> On 8 Sep, 2019, at 2:31 am, Justin Kilpatrick <justin@althea.net> wrote:
>
> If I set a throughput that's 50% too high should it still help? In my testing it didn't seem to.
In that case you would be relying on backpressure from the network interface to cause queuing to actually occur in Cake rather than in the driver or hardware (which would almost certainly be a dumb FIFO). If the driver doesn't implement BQL, that would easily explain 300ms of bloat.
In fact I'm unsure as to why changing the AQM parameters would cure it. You may have benefited from an unintentional second-order effect which we normally try to eliminate, when the 'target' parameter gets too close to the CPU scheduling latency of the kernel.
I generally find it's better to *underestimate* the bandwidth parameter by 50% than the reverse, simply to keep the queue out of the dumb hardware. But if you want to try implementing BQL in the relevant drivers, go ahead.
- Jonathan Morton
* Re: [Cake] Fighting bloat in the face of uncertainty
2019-09-07 23:42 ` Jonathan Morton
@ 2019-09-08 0:03 ` Justin Kilpatrick
2019-09-08 0:59 ` Jonathan Morton
0 siblings, 1 reply; 36+ messages in thread
From: Justin Kilpatrick @ 2019-09-08 0:03 UTC (permalink / raw)
To: Jonathan Morton; +Cc: cake
Sadly this isn't a driver we control; it's a point-to-point wireless device that often seems designed to introduce bloat. I could probably ssh in and configure it to behave properly, but that's not very scalable.
Underestimating link capacity dramatically isn't an option: no matter how buttery smooth the experience, people still crave those high speedtest numbers.
> In fact I'm unsure as to why changing the AQM parameters would cure it.
> You may have benefited from an unintentional second-order effect which
> we normally try to eliminate, when the 'target' parameter gets too
> close to the CPU scheduling latency of the kernel.
So you believe that setting the target RTT closer to the path latency was not the main contributor to reducing bloat? Is there a configuration I could use to demonstrate that one way or the other?
--
Justin Kilpatrick
justin@althea.net
* Re: [Cake] Fighting bloat in the face of uncertainty
2019-09-08 0:03 ` Justin Kilpatrick
@ 2019-09-08 0:59 ` Jonathan Morton
2019-09-08 14:29 ` Justin Kilpatrick
0 siblings, 1 reply; 36+ messages in thread
From: Jonathan Morton @ 2019-09-08 0:59 UTC (permalink / raw)
To: Justin Kilpatrick; +Cc: cake
> On 8 Sep, 2019, at 3:03 am, Justin Kilpatrick <justin@althea.net> wrote:
>
> So you believe that setting the target RTT closer to the path latency was not the main contributor to reducing bloat? Is there a configuration I could use to demonstrate that one way or the other?
The second-order effect I mentioned is related to the 'target' parameter. Checking the code, I am reminded that while Cake itself can have 'target' set from userspace, there actually isn't a parameter to the tc module which allows setting it independently of 'rtt'. But there *is* a table in q_cake.c (in tc) which you can temporarily extend with the following entries for experimentation:
static struct cake_preset presets[] = {
	{"datacentre",     5,        100},
	{"lan",            50,       1000},
	{"metro",          500,      10000},
	{"regional",       1500,     30000},
	{"internet",       5000,     100000},
	{"oceanic",        15000,    300000},
	{"satellite",      50000,    1000000},
	{"interplanetary", 50000000, 1000000000},
+
+	{"metro-loose",    5000,     10000},
+	{"internet-tight", 500,      100000},
};
If the effect is genuinely due to marking rate, then 'metro-loose' should behave like 'metro' and 'internet-tight' should behave like 'internet', to a first-order approximation. If, on the other hand, it's due to the second-order interaction with CPU scheduling latency, the reverse may be true. The latter is not something you should be counting on, as it will insert random AQM marking even when the link is not actually saturated.
You could also set it back to 'internet' and progressively reduce the bandwidth parameter, making the Cake shaper into the actual bottleneck. This is the correct fix for the problem, and you should notice an instant improvement as soon as the bandwidth parameter is correct.
- Jonathan Morton
* Re: [Cake] Fighting bloat in the face of uncertainty
2019-09-08 0:59 ` Jonathan Morton
@ 2019-09-08 14:29 ` Justin Kilpatrick
2019-09-08 17:27 ` Jonathan Morton
0 siblings, 1 reply; 36+ messages in thread
From: Justin Kilpatrick @ 2019-09-08 14:29 UTC (permalink / raw)
To: Jonathan Morton; +Cc: cake
> The second-order effect I mentioned is related to the 'target'
> parameter. Checking the code, I am reminded that while Cake itself can
> have 'target' set from userspace, there actually isn't a parameter to
> the tc module which allows setting it independently of 'rtt'. But
> there *is* a table in q_cake.c (in tc) which you can temporarily extend
> with the following entries for experimentation:
You are correct. My sampling was flawed, and the 'metro' profile is not actually making any difference.
The main contributor to bloat reduction was a bandwidth parameter left over from too much use of `tc qdisc change` rather than add/del.
> You could also set it back to 'internet' and progressively reduce the
> bandwidth parameter, making the Cake shaper into the actual bottleneck.
> This is the correct fix for the problem, and you should notice an
> instant improvement as soon as the bandwidth parameter is correct.
Hand tuning this one link is not a problem. I'm searching for a set of settings that will provide generally good performance across a wide range of devices, links, and situations.
From what you've indicated so far there's nothing as effective as a correct bandwidth estimation if we consider the antenna (link) a black box. Expecting the user to input expected throughput for every link and then managing that information is essentially a non-starter.
Radio tuning provides some improvement, but until Ubiquiti starts shipping with Codel on non-router devices I don't think there's a good solution here.
Any way to have the receiving device detect bloat and insert ECN marks? I don't think the time spent in the intermediate device is detectable at the kernel level, but we keep track of latency for routing decisions and could detect bloat with some accuracy; the problem is how to respond.
--
Justin Kilpatrick
justin@althea.net
* Re: [Cake] Fighting bloat in the face of uncertainty
2019-09-08 14:29 ` Justin Kilpatrick
@ 2019-09-08 17:27 ` Jonathan Morton
2019-09-16 10:21 ` [Cake] cake memory consumption Sebastian Gottschall
2019-10-03 17:52 ` [Cake] Fighting bloat in the face of uncertainty Justin Kilpatrick
0 siblings, 2 replies; 36+ messages in thread
From: Jonathan Morton @ 2019-09-08 17:27 UTC (permalink / raw)
To: Justin Kilpatrick; +Cc: cake
>> You could also set it back to 'internet' and progressively reduce the
>> bandwidth parameter, making the Cake shaper into the actual bottleneck.
>> This is the correct fix for the problem, and you should notice an
>> instant improvement as soon as the bandwidth parameter is correct.
>
> Hand tuning this one link is not a problem. I'm searching for a set of settings that will provide generally good performance across a wide range of devices, links, and situations.
>
> From what you've indicated so far there's nothing as effective as a correct bandwidth estimation if we consider the antenna (link) a black box. Expecting the user to input expected throughput for every link and then managing that information is essentially a non-starter.
>
> Radio tuning provides some improvement, but until ubiquiti starts shipping with Codel on non-router devices I don't think there's a good solution here.
>
> Any way to have the receiving device detect bloat and insert an ECN?
That's what the qdisc itself is supposed to do.
> I don't think the time spent in the intermediate device is detectable at the kernel level but we keep track of latency for routing decisions and could detect bloat with some accuracy, the problem is how to respond.
As long as you can detect which link the bloat is on (and in which direction), you can respond by reducing the bandwidth parameter on that half-link by a small amount. Since you have a cooperating network, maintaining a time standard on each node sufficient to observe one-way delays seems feasible, as is establishing a normal baseline latency for each link.
The characteristics of the bandwidth parameter being too high are easy to observe. Not only will the one-way delay go up, but the received throughput in the same direction at the same time will be lower than configured. You might use the latter as a hint as to how far you need to reduce the shaped bandwidth.
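That feedback rule can be sketched as a tiny controller. This is hypothetical: the 5ms delay threshold, the 95% throughput test, and the 0.9 backoff factor are illustrative assumptions, not anything Cake or the thread specifies:

```python
def adjust_shaper_mbps(shaper, owd_ms, baseline_owd_ms, achieved):
    """Back the Cake bandwidth parameter off when one-way delay rises
    above baseline while delivered throughput falls short of the shaper."""
    bloated = owd_ms > baseline_owd_ms + 5.0   # delay well above baseline
    starved = achieved < 0.95 * shaper         # not reaching configured rate
    if bloated and starved:
        # Step down toward the rate the link actually delivered.
        return max(achieved, 0.9 * shaper)
    return shaper

print(adjust_shaper_mbps(100.0, 40.0, 6.0, 80.0))  # bloat seen: backs off toward 90
print(adjust_shaper_mbps(100.0, 6.5, 6.0, 99.0))   # healthy link: stays at 100.0
```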
Deciding when and by how much to *increase* bandwidth, which is presumably desirable when link conditions improve, is a more difficult problem when the link hardware doesn't cooperate by informing you of its status. (This is something you could reasonably ask Ubiquiti to address.)
I would assume that link characteristics will change slowly, and run an occasional explicit bandwidth probe to see if spare bandwidth is available. If that probe comes through without exhibiting bloat, *and* the link is otherwise loaded to capacity, then increase the shaper by an amount within the probe's capacity of measurement - and schedule a repeat.
A suitable probe might be 100x 1500b packets paced out over a second, bypassing the shaper. This will occupy just over 1Mbps of bandwidth, and can be expected to induce 10ms of delay if injected into a saturated 100Mbps link. Observe the delay experienced by each packet *and* the quantity of other traffic that appears between them. Only if both are favourable can you safely open the shaper, by 1Mbps.
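The arithmetic behind that probe is easy to check; a back-of-envelope sketch from the numbers above (it lands at 12ms on a saturated 100Mbps link, roughly the 10ms quoted):

```python
PKTS, PKT_BYTES = 100, 1500  # the probe: 100 x 1500-byte packets

def probe_rate_mbps(duration_s=1.0):
    """Bandwidth the probe consumes when paced over `duration_s`."""
    return PKTS * PKT_BYTES * 8 / duration_s / 1e6

def induced_delay_ms(link_mbps):
    """Extra queue delay the probe adds to an already-saturated link."""
    return PKTS * PKT_BYTES * 8 * 1000 / (link_mbps * 1e6)

print(probe_rate_mbps())      # 1.2 Mbps: "just over 1Mbps"
print(induced_delay_ms(100))  # 12.0 ms on a saturated 100 Mbps link
```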
Since wireless links can be expected to change their capacity over time, due to e.g. weather and tree growth, this seems more generally useful than a static guess. You could deploy a new link with a conservative "guess" of say 10Mbps, and just probe from there.
- Jonathan Morton
* [Cake] cake memory consumption
2019-09-08 17:27 ` Jonathan Morton
@ 2019-09-16 10:21 ` Sebastian Gottschall
2019-09-16 12:00 ` Dave Taht
2019-09-16 12:08 ` Toke Høiland-Jørgensen
2019-10-03 17:52 ` [Cake] Fighting bloat in the face of uncertainty Justin Kilpatrick
1 sibling, 2 replies; 36+ messages in thread
From: Sebastian Gottschall @ 2019-09-16 10:21 UTC (permalink / raw)
To: cake
After we found serious out-of-memory issues on smaller embedded devices (128 MB RAM), we made some benchmarks with different schedulers, with the result that Cake takes a serious amount of memory. We use the out-of-tree Cake module, and we use it class-based, since we have complex methods of doing QoS per interface, per MAC address, or even per IP/network; it's not just a simple Cake-on-a-single-interface solution. Does anybody have a solution for making this better?
HTB/FQ_CODEL ------- 62M
HTB/SFQ ------- 62M
HTB/PIE ------- 62M
HTB/FQ_CODEL_FAST ------- 67M
HTB/CAKE -------111M
HFSC/FQ_CODEL_FAST ------- 47M
HTB/PIE ------- 49M
HTB/SFQ ------- 50M
HFSC /FQ_CODEL ------- 52M
HFSC/CAKE -------109M
Consider that the benchmark doesn't show the real values: it's system-wide and does not count memory taken by the wireless driver, which is about 45 MB of RAM for ath10k.
That makes it all even worse, unfortunately, since there is not much RAM left for Cake; just about 70 MB, maybe.
* Re: [Cake] cake memory consumption
2019-09-16 10:21 ` [Cake] cake memory consumption Sebastian Gottschall
@ 2019-09-16 12:00 ` Dave Taht
2019-09-16 12:51 ` Dave Taht
2019-09-16 13:22 ` Sebastian Gottschall
2019-09-16 12:08 ` Toke Høiland-Jørgensen
1 sibling, 2 replies; 36+ messages in thread
From: Dave Taht @ 2019-09-16 12:00 UTC (permalink / raw)
To: Sebastian Gottschall; +Cc: Cake List
I am puzzled as to why fq_codel_fast would use more RAM than fq_codel would; was SCE (GSO splitting) enabled?
Similarly, the differences between HFSC and HTB are interesting. I don't get that either.
How many cake instances are being created?
And for the sake of discussion, what does cake standalone consume?
On Mon, Sep 16, 2019 at 11:22 AM Sebastian Gottschall
<s.gottschall@newmedia-net.de> wrote:
>
> after we found out serious out of memory issues on smaller embedded devices (128 mb ram) we made some benchmarks with different schedulers
> with the result that cake takes a serious amount of memory. we use the out of tree cake module and we use it class based since we have complex methods of doing qos per interface, per mac addresse or even per
I note that I have often thought per-MAC-address functionality might be a valuable mode for cake.
>ip/network. so its not just simple cake on a single interface solution. we made some benchmarks with different schedulers. does anybody have a solution for making that better?
With such complexity required I'd stick to hfsc + fq_X rather than
layer in cake.
Understanding the model (sh -x the tc commands for, say, hfsc + something and htb + something) your users require, though, would be helpful. We tried to design cake to fold in a jillion optimizations, such as ACK prioritization and per-network FQ (instead of per-flow/per-host), but we couldn't possibly cover all use cases in it without more feedback from the field.
Still... such a big difference in memory use doesn't add up. Cake has
a larger fixed memory allocation
than fq_codel, but the rest is just packets which come from global memory.
Can you point to a build and a couple of targets we could try? I am
presently travelling (in Portugal) and won't
be back online until later this week.
>
> HTB/FQ_CODEL ------- 62M
> HTB/SFQ ------- 62M
> HTB/PIE ------- 62M
> HTB/FQ_CODEL_FAST ------- 67M
> HTB/CAKE -------111M
>
> HFSC/FQ_CODEL_FAST ------- 47M
> HTB/PIE ------- 49M
> HTB/SFQ ------- 50M
> HFSC /FQ_CODEL ------- 52M
> HFSC/CAKE -------109M
>
>
> consider that the benchmark doesn't show the real values. it's system-overall and does not consider memory taken by the wireless driver, for instance, which is about 45 MB of RAM for ath10k,
> so this makes it all even worse, unfortunately, since there is not that much RAM left for cake: just about 70 MB, maybe.
> On 08.09.2019 at 19:27, Jonathan Morton wrote:
>
> You could also set it back to 'internet' and progressively reduce the
> bandwidth parameter, making the Cake shaper into the actual bottleneck.
> This is the correct fix for the problem, and you should notice an
> instant improvement as soon as the bandwidth parameter is correct.
>
> Hand tuning this one link is not a problem. I'm searching for a set of settings that will provide generally good performance across a wide range of devices, links, and situations.
>
> From what you've indicated so far there's nothing as effective as a correct bandwidth estimation if we consider the antenna (link) a black box. Expecting the user to input expected throughput for every link and then managing that information is essentially a non-starter.
>
> Radio tuning provides some improvement, but until ubiquiti starts shipping with Codel on non-router devices I don't think there's a good solution here.
>
> Any way to have the receiving device detect bloat and insert an ECN?
>
> That's what the qdisc itself is supposed to do.
>
> I don't think the time spent in the intermediate device is detectable at the kernel level but we keep track of latency for routing decisions and could detect bloat with some accuracy, the problem is how to respond.
>
> As long as you can detect which link the bloat is on (and in which direction), you can respond by reducing the bandwidth parameter on that half-link by a small amount. Since you have a cooperating network, maintaining a time standard on each node sufficient to observe one-way delays seems feasible, as is establishing a normal baseline latency for each link.
>
> The characteristics of the bandwidth parameter being too high are easy to observe. Not only will the one-way delay go up, but the received throughput in the same direction at the same time will be lower than configured. You might use the latter as a hint as to how far you need to reduce the shaped bandwidth.
>
> Deciding when and by how much to *increase* bandwidth, which is presumably desirable when link conditions improve, is a more difficult problem when the link hardware doesn't cooperate by informing you of its status. (This is something you could reasonably ask Ubiquiti to address.)
>
> I would assume that link characteristics will change slowly, and run an occasional explicit bandwidth probe to see if spare bandwidth is available. If that probe comes through without exhibiting bloat, *and* the link is otherwise loaded to capacity, then increase the shaper by an amount within the probe's capacity of measurement - and schedule a repeat.
>
> A suitable probe might be 100x 1500b packets paced out over a second, bypassing the shaper. This will occupy just over 1Mbps of bandwidth, and can be expected to induce 10ms of delay if injected into a saturated 100Mbps link. Observe the delay experienced by each packet *and* the quantity of other traffic that appears between them. Only if both are favourable can you safely open the shaper, by 1Mbps.
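The arithmetic of that probe can be sanity-checked, and a crude userspace stand-in is possible with nothing more than ping (a sketch only; the target address is a placeholder, and sub-200 ms ping intervals typically require root):

```shell
#!/bin/sh
# Sanity-check the probe arithmetic: 100 packets x 1500 bytes x 8 bits
# = 1.2 Mbit injected over one second; a saturated 100 Mbit/s link
# drains that in ~12 ms, in line with the ~10 ms figure quoted above.
probe_pkts=100
pkt_bytes=1500
link_mbps=100
delay_ms=$(( probe_pkts * pkt_bytes * 8 / (link_mbps * 1000) ))
echo "expected induced delay: ${delay_ms} ms"

# A crude approximation of the probe itself: 100 full-size ICMP packets
# paced 10 ms apart, reporting per-packet RTT (uncomment to run;
# 192.0.2.1 is a placeholder address):
# ping -i 0.01 -s 1472 -c 100 192.0.2.1
```

This only measures round-trip rather than one-way delay, so it is weaker than the cooperating-timestamps approach described above, but it needs no support from the far end beyond answering pings.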
>
> Since wireless links can be expected to change their capacity over time, due to eg. weather and tree growth, this seems to be more generally useful than a static guess. You could deploy a new link with a conservative "guess" of say 10Mbps, and just probe from there.
>
> - Jonathan Morton
> _______________________________________________
> Cake mailing list
> Cake@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/cake
>
> _______________________________________________
> Cake mailing list
> Cake@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/cake
--
Dave Täht
CTO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-831-205-9740
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [Cake] cake memory consumption
2019-09-16 10:21 ` [Cake] cake memory consumption Sebastian Gottschall
2019-09-16 12:00 ` Dave Taht
@ 2019-09-16 12:08 ` Toke Høiland-Jørgensen
2019-09-16 13:25 ` Sebastian Gottschall
1 sibling, 1 reply; 36+ messages in thread
From: Toke Høiland-Jørgensen @ 2019-09-16 12:08 UTC (permalink / raw)
To: Sebastian Gottschall, cake
Sebastian Gottschall <s.gottschall@newmedia-net.de> writes:
> after we found out serious out of memory issues on smaller embedded
> devices (128 mb ram) we made some benchmarks with different schedulers
> with the result that cake takes a serious amount of memory. we use the
> out of tree cake module and we use it class based since we have complex
> methods of doing qos per interface, per mac addresse or even per
> ip/network. so its not just simple cake on a single interface solution.
> we made some benchmarks with different schedulers. does anybody have a
> solution for making that better?
>
> HTB/FQ_CODEL ------- 62M
> HTB/SFQ ------- 62M
> HTB/PIE ------- 62M
> HTB/FQ_CODEL_FAST ------- 67M
> HTB/CAKE -------111M
>
> HFSC/FQ_CODEL_FAST ------- 47M
> HTB/PIE ------- 49M
> HTB/SFQ ------- 50M
> HFSC /FQ_CODEL ------- 52M
> HFSC/CAKE -------109M
How are you measuring the memory usage, and what is your full config for
each setup? :)
-Toke
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [Cake] cake memory consumption
2019-09-16 12:00 ` Dave Taht
@ 2019-09-16 12:51 ` Dave Taht
2019-09-16 13:31 ` Sebastian Gottschall
2019-09-16 13:22 ` Sebastian Gottschall
1 sibling, 1 reply; 36+ messages in thread
From: Dave Taht @ 2019-09-16 12:51 UTC (permalink / raw)
To: Sebastian Gottschall; +Cc: Cake List
Perhaps the differences in memory use are a memory leak of some kind?
If you could run the same number of packets through each configuration
and look at memory use, that might point somewhere.
cake - with gso-splitting - should fragment memory more than the other
alternatives, as will fq_codel_fast with sce enabled.
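One rough way to make that per-configuration comparison reproducible (a sketch under assumptions: the netperf server address is a placeholder, and the tc stats line presumes a cake/fq_codel build that reports its buffer accounting):

```shell
#!/bin/sh
IF=eth0                                  # illustrative interface name

free -m > /tmp/mem-before.txt
# Push a fixed, comparable traffic load through the qdisc under test:
netperf -H 192.0.2.1 -t TCP_STREAM -l 60
free -m > /tmp/mem-after.txt

# cake prints "memory used: X of Y" per instance; fq_codel variants
# report their memory_limit and related drops:
tc -s qdisc show dev "$IF" | grep -iE 'qdisc|memory'

diff /tmp/mem-before.txt /tmp/mem-after.txt
```

Running the identical load against each scheduler combination and diffing afterwards would separate a genuine leak (usage grows with packet count) from a large fixed allocation (usage stable after setup).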
On Mon, Sep 16, 2019 at 1:00 PM Dave Taht <dave.taht@gmail.com> wrote:
>
> I am puzzled as to why fq_codel_fast would use more ram than fq_codel
> would, was sce (gso-splotting) enabled?
>
> similarly, the differences between hfsc and htb are interesting. I
> don't get that either.
>
> How many cake instances are being created?
>
> And for the sake of discussion, what does cake standalone consume?
>
> On Mon, Sep 16, 2019 at 11:22 AM Sebastian Gottschall
> <s.gottschall@newmedia-net.de> wrote:
> >
> > after we found out serious out of memory issues on smaller embedded devices (128 mb ram) we made some benchmarks with different schedulers
> > with the result that cake takes a serious amount of memory. we use the out of tree cake module and we use it class based since we have complex methods of doing qos per interface, per mac addresse or even per
>
> I note that I often thought about having mac address functionality
> might be a valuable mode for cake.
>
> >ip/network. so its not just simple cake on a single interface solution. we made some benchmarks with different schedulers. does anybody have a solution for making that better?
>
> With such complexity required I'd stick to hfsc + fq_X rather than
> layer in cake.
>
> Understanding the model (sh -x the tc commands for, say, hfsc +
> something and htb + something ) your users require, though, would be
> helpful. We tried to design cake so that a jillion optimizations such
> as ack prioritization, per network fq (instead per flow/per host) -
> but we couldn't possibly cover all use cases in it with out more
> feedback from the field.
>
> Still... such a big difference in memory use doesn't add up. Cake has
> a larger fixed memory allocation
> than fq_codel, but the rest is just packets which come from global memory.
>
> Can you point to a build and a couple targets we could try? I am
> presently travelling (in portugal) and won't
> be back online until later this week.
> >
> > HTB/FQ_CODEL ------- 62M
> > HTB/SFQ ------- 62M
> > HTB/PIE ------- 62M
> > HTB/FQ_CODEL_FAST ------- 67M
> > HTB/CAKE -------111M
> >
> > HFSC/FQ_CODEL_FAST ------- 47M
> > HTB/PIE ------- 49M
> > HTB/SFQ ------- 50M
> > HFSC /FQ_CODEL ------- 52M
> > HFSC/CAKE -------109M
> >
> >
> > consider that the benchmark doesnt show the real values. its system overall and does not consider memory taken by the wireless driver for instance which is about 45 mb of ram for ath10k
> > so this makes all even more worse unfortunatly since there is not that many ram left for cake. just about 70mb maybe.
> > Am 08.09.2019 um 19:27 schrieb Jonathan Morton:
> >
> > You could also set it back to 'internet' and progressively reduce the
> > bandwidth parameter, making the Cake shaper into the actual bottleneck.
> > This is the correct fix for the problem, and you should notice an
> > instant improvement as soon as the bandwidth parameter is correct.
> >
> > Hand tuning this one link is not a problem. I'm searching for a set of settings that will provide generally good performance across a wide range of devices, links, and situations.
> >
> > From what you've indicated so far there's nothing as effective as a correct bandwidth estimation if we consider the antenna (link) a black box. Expecting the user to input expected throughput for every link and then managing that information is essentially a non-starter.
> >
> > Radio tuning provides some improvement, but until ubiquiti starts shipping with Codel on non-router devices I don't think there's a good solution here.
> >
> > Any way to have the receiving device detect bloat and insert an ECN?
> >
> > That's what the qdisc itself is supposed to do.
> >
> > I don't think the time spent in the intermediate device is detectable at the kernel level but we keep track of latency for routing decisions and could detect bloat with some accuracy, the problem is how to respond.
> >
> > As long as you can detect which link the bloat is on (and in which direction), you can respond by reducing the bandwidth parameter on that half-link by a small amount. Since you have a cooperating network, maintaining a time standard on each node sufficient to observe one-way delays seems feasible, as is establishing a normal baseline latency for each link.
> >
> > The characteristics of the bandwidth parameter being too high are easy to observe. Not only will the one-way delay go up, but the received throughput in the same direction at the same time will be lower than configured. You might use the latter as a hint as to how far you need to reduce the shaped bandwidth.
> >
> > Deciding when and by how much to *increase* bandwidth, which is presumably desirable when link conditions improve, is a more difficult problem when the link hardware doesn't cooperate by informing you of its status. (This is something you could reasonably ask Ubiquiti to address.)
> >
> > I would assume that link characteristics will change slowly, and run an occasional explicit bandwidth probe to see if spare bandwidth is available. If that probe comes through without exhibiting bloat, *and* the link is otherwise loaded to capacity, then increase the shaper by an amount within the probe's capacity of measurement - and schedule a repeat.
> >
> > A suitable probe might be 100x 1500b packets paced out over a second, bypassing the shaper. This will occupy just over 1Mbps of bandwidth, and can be expected to induce 10ms of delay if injected into a saturated 100Mbps link. Observe the delay experienced by each packet *and* the quantity of other traffic that appears between them. Only if both are favourable can you safely open the shaper, by 1Mbps.
> >
> > Since wireless links can be expected to change their capacity over time, due to eg. weather and tree growth, this seems to be more generally useful than a static guess. You could deploy a new link with a conservative "guess" of say 10Mbps, and just probe from there.
> >
> > - Jonathan Morton
> > _______________________________________________
> > Cake mailing list
> > Cake@lists.bufferbloat.net
> > https://lists.bufferbloat.net/listinfo/cake
> >
> > _______________________________________________
> > Cake mailing list
> > Cake@lists.bufferbloat.net
> > https://lists.bufferbloat.net/listinfo/cake
>
>
>
> --
>
> Dave Täht
> CTO, TekLibre, LLC
> http://www.teklibre.com
> Tel: 1-831-205-9740
--
Dave Täht
CTO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-831-205-9740
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [Cake] cake memory consumption
2019-09-16 12:00 ` Dave Taht
2019-09-16 12:51 ` Dave Taht
@ 2019-09-16 13:22 ` Sebastian Gottschall
2019-09-16 13:28 ` Justin Kilpatrick
1 sibling, 1 reply; 36+ messages in thread
From: Sebastian Gottschall @ 2019-09-16 13:22 UTC (permalink / raw)
To: Dave Taht; +Cc: Cake List
On 16.09.2019 at 14:00, Dave Taht wrote:
> I am puzzled as to why fq_codel_fast would use more ram than fq_codel
> would, was sce (gso-splotting) enabled?
that may be within typical error tolerance. he just used "free" for the comparison
>
> similarly, the differences between hfsc and htb are interesting. I
> don't get that either.
>
> How many cake instances are being created?
according to his config, I assume 7
>
> And for the sake of discussion, what does cake standalone consume?
that's a rare configuration for my testers. this is something for PCs,
but not for routers :-)
this is something I need to find out for myself on my routers
>
> On Mon, Sep 16, 2019 at 11:22 AM Sebastian Gottschall
> <s.gottschall@newmedia-net.de> wrote:
>> after we found out serious out of memory issues on smaller embedded devices (128 mb ram) we made some benchmarks with different schedulers
>> with the result that cake takes a serious amount of memory. we use the out of tree cake module and we use it class based since we have complex methods of doing qos per interface, per mac addresse or even per
> I note that I often thought about having mac address functionality
> might be a valuable mode for cake.
that wouldn't help. there are many variations, with multiple different
settings for different MAC addresses. as far as I have seen, cake is not
designed to work like this, which is why we
have to use a class/qdisc tree in my case
>
>> ip/network. so its not just simple cake on a single interface solution. we made some benchmarks with different schedulers. does anybody have a solution for making that better?
> With such complexity required I'd stick to hfsc + fq_X rather than
> layer in cake.
yeah, I said that too. but people complain that cake runs soooooooo much
better, or at least a little bit. hard to get around this argument
>
> Understanding the model (sh -x the tc commands for, say, hfsc +
> something and htb + something ) your users require, though, would be
> helpful. We tried to design cake so that a jillion optimizations such
> as ack prioritization, per network fq (instead per flow/per host) -
> but we couldn't possibly cover all use cases in it with out more
> feedback from the field.
>
> Still... such a big difference in memory use doesn't add up. Cake has
> a larger fixed memory allocation
4 MB max, as far as I have seen, but with 7 instances that comes to
28 MB, and I still see much more than that here. consider that I
implemented the same limitation in fq_codel and also fq_codel_fast
(model-specific; on bigger devices I don't restrict the memory to 4 MB)
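For reference, that kind of cap can also be expressed directly from tc on both qdiscs (a sketch; interface and bandwidth are illustrative, not Sebastian's configuration):

```shell
#!/bin/sh
# cake: cap its packet-buffer accounting at 4 MB via memlimit
tc qdisc add dev eth0 root cake bandwidth 50mbit memlimit 4mb

# fq_codel: the equivalent knob is memory_limit (here 4 MiB, in bytes)
tc qdisc add dev eth0 root fq_codel memory_limit 4194304
```

With matching caps in place, any remaining per-scheduler difference in system memory should come from fixed per-instance state rather than queued packets.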
> than fq_codel, but the rest is just packets which come from global memory.
>
> Can you point to a build and a couple targets we could try? I am
> presently travelling (in portugal) and won't
> be back online until later this week.
what do you mean by targets? the build for testing was always the
same. I requested doing the test just with multiple schedulers, which
are switchable in my GUI.
what I can do is a tree-like print to visualize how it's built
(or I can simply print out the qdisc/class/filters for you)
the test itself was made on a TP-Link Archer C7 v2.
>> HTB/FQ_CODEL ------- 62M
>> HTB/SFQ ------- 62M
>> HTB/PIE ------- 62M
>> HTB/FQ_CODEL_FAST ------- 67M
>> HTB/CAKE -------111M
>>
>> HFSC/FQ_CODEL_FAST ------- 47M
>> HTB/PIE ------- 49M
>> HTB/SFQ ------- 50M
>> HFSC /FQ_CODEL ------- 52M
>> HFSC/CAKE -------109M
>>
>>
>> consider that the benchmark doesnt show the real values. its system overall and does not consider memory taken by the wireless driver for instance which is about 45 mb of ram for ath10k
>> so this makes all even more worse unfortunatly since there is not that many ram left for cake. just about 70mb maybe.
>> Am 08.09.2019 um 19:27 schrieb Jonathan Morton:
>>
>> You could also set it back to 'internet' and progressively reduce the
>> bandwidth parameter, making the Cake shaper into the actual bottleneck.
>> This is the correct fix for the problem, and you should notice an
>> instant improvement as soon as the bandwidth parameter is correct.
>>
>> Hand tuning this one link is not a problem. I'm searching for a set of settings that will provide generally good performance across a wide range of devices, links, and situations.
>>
>> From what you've indicated so far there's nothing as effective as a correct bandwidth estimation if we consider the antenna (link) a black box. Expecting the user to input expected throughput for every link and then managing that information is essentially a non-starter.
>>
>> Radio tuning provides some improvement, but until ubiquiti starts shipping with Codel on non-router devices I don't think there's a good solution here.
>>
>> Any way to have the receiving device detect bloat and insert an ECN?
>>
>> That's what the qdisc itself is supposed to do.
>>
>> I don't think the time spent in the intermediate device is detectable at the kernel level but we keep track of latency for routing decisions and could detect bloat with some accuracy, the problem is how to respond.
>>
>> As long as you can detect which link the bloat is on (and in which direction), you can respond by reducing the bandwidth parameter on that half-link by a small amount. Since you have a cooperating network, maintaining a time standard on each node sufficient to observe one-way delays seems feasible, as is establishing a normal baseline latency for each link.
>>
>> The characteristics of the bandwidth parameter being too high are easy to observe. Not only will the one-way delay go up, but the received throughput in the same direction at the same time will be lower than configured. You might use the latter as a hint as to how far you need to reduce the shaped bandwidth.
>>
>> Deciding when and by how much to *increase* bandwidth, which is presumably desirable when link conditions improve, is a more difficult problem when the link hardware doesn't cooperate by informing you of its status. (This is something you could reasonably ask Ubiquiti to address.)
>>
>> I would assume that link characteristics will change slowly, and run an occasional explicit bandwidth probe to see if spare bandwidth is available. If that probe comes through without exhibiting bloat, *and* the link is otherwise loaded to capacity, then increase the shaper by an amount within the probe's capacity of measurement - and schedule a repeat.
>>
>> A suitable probe might be 100x 1500b packets paced out over a second, bypassing the shaper. This will occupy just over 1Mbps of bandwidth, and can be expected to induce 10ms of delay if injected into a saturated 100Mbps link. Observe the delay experienced by each packet *and* the quantity of other traffic that appears between them. Only if both are favourable can you safely open the shaper, by 1Mbps.
>>
>> Since wireless links can be expected to change their capacity over time, due to eg. weather and tree growth, this seems to be more generally useful than a static guess. You could deploy a new link with a conservative "guess" of say 10Mbps, and just probe from there.
>>
>> - Jonathan Morton
>> _______________________________________________
>> Cake mailing list
>> Cake@lists.bufferbloat.net
>> https://lists.bufferbloat.net/listinfo/cake
>>
>> _______________________________________________
>> Cake mailing list
>> Cake@lists.bufferbloat.net
>> https://lists.bufferbloat.net/listinfo/cake
>
>
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [Cake] cake memory consumption
2019-09-16 12:08 ` Toke Høiland-Jørgensen
@ 2019-09-16 13:25 ` Sebastian Gottschall
2019-09-16 14:01 ` Toke Høiland-Jørgensen
0 siblings, 1 reply; 36+ messages in thread
From: Sebastian Gottschall @ 2019-09-16 13:25 UTC (permalink / raw)
To: Toke Høiland-Jørgensen, cake
On 16.09.2019 at 14:08, Toke Høiland-Jørgensen wrote:
> Sebastian Gottschall <s.gottschall@newmedia-net.de> writes:
>
>> after we found out serious out of memory issues on smaller embedded
>> devices (128 mb ram) we made some benchmarks with different schedulers
>> with the result that cake takes a serious amount of memory. we use the
>> out of tree cake module and we use it class based since we have complex
>> methods of doing qos per interface, per mac addresse or even per
>> ip/network. so its not just simple cake on a single interface solution.
>> we made some benchmarks with different schedulers. does anybody have a
>> solution for making that better?
>>
>> HTB/FQ_CODEL ------- 62M
>> HTB/SFQ ------- 62M
>> HTB/PIE ------- 62M
>> HTB/FQ_CODEL_FAST ------- 67M
>> HTB/CAKE -------111M
>>
>> HFSC/FQ_CODEL_FAST ------- 47M
>> HTB/PIE ------- 49M
>> HTB/SFQ ------- 50M
>> HFSC /FQ_CODEL ------- 52M
>> HFSC/CAKE -------109M
> How are you measuring the memory usage, and what is your full config for
> each setup? :)
me? nothing. I requested this test from a reporter, and he just used
free / top, so there is some error tolerance.
but it shows a significant difference between cake and fq_codel etc.;
cake ends in an OOM.
for the full report, including config screenshots, see
https://svn.dd-wrt.com/ticket/6798#comment:14. it also shows the QoS
setup, which I can use to reproduce the problem and to
print out the full tc ruleset if required (which it surely is for you).
if you want, I will recreate this setup and send the tc rules to this list
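As a less noisy alternative to free/top for the re-test, the qdiscs' own accounting can be read directly (a sketch; the interface name is illustrative):

```shell
#!/bin/sh
# Per-qdisc view instead of system-wide: cake prints a
# "memory used: X of Y" line per instance, and the fq_codel family
# reports memory_limit plus memory-related drops:
tc -s qdisc show dev eth0

# System-wide view of where packet buffers actually live (needs root):
grep -E 'skbuff' /proc/slabinfo
```

Comparing those numbers against the "free" deltas would show whether the extra consumption is inside the qdiscs' own accounting or elsewhere in the kernel.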
>
> -Toke
>
>
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [Cake] cake memory consumption
2019-09-16 13:22 ` Sebastian Gottschall
@ 2019-09-16 13:28 ` Justin Kilpatrick
2019-09-16 13:39 ` Jonathan Morton
2019-09-16 13:47 ` Sebastian Gottschall
0 siblings, 2 replies; 36+ messages in thread
From: Justin Kilpatrick @ 2019-09-16 13:28 UTC (permalink / raw)
To: cake
I'm not seeing anything like the memory usage you describe in a similar situation.
OpenWrt 18.06.4 on a GL-B1300 with 10+ virtual interfaces running cake; total memory usage is 70 MB for everything.
--
Justin Kilpatrick
justin@althea.net
On Mon, Sep 16, 2019, at 9:22 AM, Sebastian Gottschall wrote:
>
> Am 16.09.2019 um 14:00 schrieb Dave Taht:
> > I am puzzled as to why fq_codel_fast would use more ram than fq_codel
> > would, was sce (gso-splotting) enabled?
> that can by typical error tollerance. he just used "free" for comparisation
> >
> > similarly, the differences between hfsc and htb are interesting. I
> > don't get that either.
> >
> > How many cake instances are being created?
> according to his config, i assume 7
> >
> > And for the sake of discussion, what does cake standalone consume?
> thats a rare condition for my testers. this is something for PC's but
> not for routers :-)
> this is something i need to find out for myself on my routers
> >
> > On Mon, Sep 16, 2019 at 11:22 AM Sebastian Gottschall
> > <s.gottschall@newmedia-net.de> wrote:
> >> after we found out serious out of memory issues on smaller embedded devices (128 mb ram) we made some benchmarks with different schedulers
> >> with the result that cake takes a serious amount of memory. we use the out of tree cake module and we use it class based since we have complex methods of doing qos per interface, per mac addresse or even per
> > I note that I often thought about having mac address functionality
> > might be a valuable mode for cake.
> that wouldnt help. there are many variations with multiple different
> settings for different mac addresses. as far as i have seen cake is not
> designed to work like this. this is why we
> have to use a class / qdisc tree in my case
> >
> >> ip/network. so its not just simple cake on a single interface solution. we made some benchmarks with different schedulers. does anybody have a solution for making that better?
> > With such complexity required I'd stick to hfsc + fq_X rather than
> > layer in cake.
> yea. i told that too. but people complain that cake runs soooooooo much
> better. or at least a little bit. hard to get around this argument
> >
> > Understanding the model (sh -x the tc commands for, say, hfsc +
> > something and htb + something ) your users require, though, would be
> > helpful. We tried to design cake so that a jillion optimizations such
> > as ack prioritization, per network fq (instead per flow/per host) -
> > but we couldn't possibly cover all use cases in it with out more
> > feedback from the field.
> >
> > Still... such a big difference in memory use doesn't add up. Cake has
> > a larger fixed memory allocation
> 4 mb max as i have seen. but by 7 its coming up to 28. but i still see
> much more here. consider that i implemented the same limitation to
> fq_codel and also fq_codel_fast
> (model specific. on bigger devices i dont restrict he memory to 4 mb)
> > than fq_codel, but the rest is just packets which come from global memory.
> >
> > Can you point to a build and a couple targets we could try? I am
> > presently travelling (in portugal) and won't
> > be back online until later this week.
> what do you mean with targets? the build for testing was always the
> same. i requested todo the test just with multiple schedulers which is
> switchable in my gui.
>
> what i can do is doing a tree like print to visualize how its builded
> (or i simple print you out the qdisc/class/filters)
>
> the test itself was made on a tplink archer c7 v2.
>
> >> HTB/FQ_CODEL ------- 62M
> >> HTB/SFQ ------- 62M
> >> HTB/PIE ------- 62M
> >> HTB/FQ_CODEL_FAST ------- 67M
> >> HTB/CAKE -------111M
> >>
> >> HFSC/FQ_CODEL_FAST ------- 47M
> >> HTB/PIE ------- 49M
> >> HTB/SFQ ------- 50M
> >> HFSC /FQ_CODEL ------- 52M
> >> HFSC/CAKE -------109M
> >>
> >>
> >> consider that the benchmark doesnt show the real values. its system overall and does not consider memory taken by the wireless driver for instance which is about 45 mb of ram for ath10k
> >> so this makes all even more worse unfortunatly since there is not that many ram left for cake. just about 70mb maybe.
> >> Am 08.09.2019 um 19:27 schrieb Jonathan Morton:
> >>
> >> You could also set it back to 'internet' and progressively reduce the
> >> bandwidth parameter, making the Cake shaper into the actual bottleneck.
> >> This is the correct fix for the problem, and you should notice an
> >> instant improvement as soon as the bandwidth parameter is correct.
> >>
> >> Hand tuning this one link is not a problem. I'm searching for a set of settings that will provide generally good performance across a wide range of devices, links, and situations.
> >>
> >> From what you've indicated so far there's nothing as effective as a correct bandwidth estimation if we consider the antenna (link) a black box. Expecting the user to input expected throughput for every link and then managing that information is essentially a non-starter.
> >>
> >> Radio tuning provides some improvement, but until ubiquiti starts shipping with Codel on non-router devices I don't think there's a good solution here.
> >>
> >> Any way to have the receiving device detect bloat and insert an ECN?
> >>
> >> That's what the qdisc itself is supposed to do.
> >>
> >> I don't think the time spent in the intermediate device is detectable at the kernel level but we keep track of latency for routing decisions and could detect bloat with some accuracy, the problem is how to respond.
> >>
> >> As long as you can detect which link the bloat is on (and in which direction), you can respond by reducing the bandwidth parameter on that half-link by a small amount. Since you have a cooperating network, maintaining a time standard on each node sufficient to observe one-way delays seems feasible, as is establishing a normal baseline latency for each link.
> >>
> >> The characteristics of the bandwidth parameter being too high are easy to observe. Not only will the one-way delay go up, but the received throughput in the same direction at the same time will be lower than configured. You might use the latter as a hint as to how far you need to reduce the shaped bandwidth.
> >>
> >> Deciding when and by how much to *increase* bandwidth, which is presumably desirable when link conditions improve, is a more difficult problem when the link hardware doesn't cooperate by informing you of its status. (This is something you could reasonably ask Ubiquiti to address.)
> >>
> >> I would assume that link characteristics will change slowly, and run an occasional explicit bandwidth probe to see if spare bandwidth is available. If that probe comes through without exhibiting bloat, *and* the link is otherwise loaded to capacity, then increase the shaper by an amount within the probe's capacity of measurement - and schedule a repeat.
> >>
> >> A suitable probe might be 100x 1500b packets paced out over a second, bypassing the shaper. This will occupy just over 1Mbps of bandwidth, and can be expected to induce 10ms of delay if injected into a saturated 100Mbps link. Observe the delay experienced by each packet *and* the quantity of other traffic that appears between them. Only if both are favourable can you safely open the shaper, by 1Mbps.
> >>
> >> Since wireless links can be expected to change their capacity over time, due to eg. weather and tree growth, this seems to be more generally useful than a static guess. You could deploy a new link with a conservative "guess" of say 10Mbps, and just probe from there.
> >>
> >> - Jonathan Morton
> >> _______________________________________________
> >> Cake mailing list
> >> Cake@lists.bufferbloat.net
> >> https://lists.bufferbloat.net/listinfo/cake
> >>
> >> _______________________________________________
> >> Cake mailing list
> >> Cake@lists.bufferbloat.net
> >> https://lists.bufferbloat.net/listinfo/cake
> >
> >
> _______________________________________________
> Cake mailing list
> Cake@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/cake
>
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [Cake] cake memory consumption
2019-09-16 12:51 ` Dave Taht
@ 2019-09-16 13:31 ` Sebastian Gottschall
0 siblings, 0 replies; 36+ messages in thread
From: Sebastian Gottschall @ 2019-09-16 13:31 UTC (permalink / raw)
To: Dave Taht; +Cc: Cake List
On 16.09.2019 at 14:51, Dave Taht wrote:
> Perhaps the differences in memory use are a memory leak of some kind?
then it would be a leak in cake :-)
> If you could run the same number of packets through each configuration
> and look at memory use, that might point somewhere.
>
> cake - with gso-splitting - should fragment memory more than the other
> alternatives, as will fq_codel_fast with sce enabled.
everything is used without mods. only the fq_codel_fast module is
restricted to 4 MB max on this specific model, like cake. fq_codel_fast
is used as a drop-in replacement
for fq_codel if selected, so ce_threshold is not used, nor is split_gso
used for cake
>
> On Mon, Sep 16, 2019 at 1:00 PM Dave Taht <dave.taht@gmail.com> wrote:
>> I am puzzled as to why fq_codel_fast would use more RAM than fq_codel
>> would. Was sce (gso-splitting) enabled?
>>
>> similarly, the differences between hfsc and htb are interesting. I
>> don't get that either.
>>
>> How many cake instances are being created?
>>
>> And for the sake of discussion, what does cake standalone consume?
>>
>> On Mon, Sep 16, 2019 at 11:22 AM Sebastian Gottschall
>> <s.gottschall@newmedia-net.de> wrote:
>>> after we found serious out-of-memory issues on smaller embedded devices (128 MB RAM) we made some benchmarks with different schedulers,
>>> with the result that cake takes a serious amount of memory. we use the out-of-tree cake module and we use it class-based since we have complex methods of doing QoS per interface, per MAC address or even per
>> I note that I have often thought mac address functionality
>> might be a valuable mode for cake.
>>
>>> ip/network, so it's not just a simple cake-on-a-single-interface solution. we made some benchmarks with different schedulers. does anybody have a solution for making that better?
>> With such complexity required I'd stick to hfsc + fq_X rather than
>> layering in cake.
>>
>> Understanding the model (sh -x the tc commands for, say, hfsc +
>> something and htb + something ) your users require, though, would be
>> helpful. We tried to design cake to cover a jillion optimizations, such
>> as ack prioritization and per-network fq (instead of per-flow/per-host),
>> but we couldn't possibly cover all use cases in it without more
>> feedback from the field.
>>
>> Still... such a big difference in memory use doesn't add up. Cake has
>> a larger fixed memory allocation
>> than fq_codel, but the rest is just packets which come from global memory.
>>
>> Can you point to a build and a couple targets we could try? I am
>> presently travelling (in portugal) and won't
>> be back online until later this week.
>>> HTB/FQ_CODEL ------- 62M
>>> HTB/SFQ ------- 62M
>>> HTB/PIE ------- 62M
>>> HTB/FQ_CODEL_FAST ------- 67M
>>> HTB/CAKE -------111M
>>>
>>> HFSC/FQ_CODEL_FAST ------- 47M
>>> HFSC/PIE ------- 49M
>>> HFSC/SFQ ------- 50M
>>> HFSC /FQ_CODEL ------- 52M
>>> HFSC/CAKE -------109M
>>>
>>>
>>> consider that the benchmark doesn't show the real values. it's system-wide and does not consider memory taken by the wireless driver, for instance, which is about 45 MB of RAM for ath10k.
>>> this makes it all even worse, unfortunately, since there is not much RAM left for cake. just about 70 MB maybe.
>>> Am 08.09.2019 um 19:27 schrieb Jonathan Morton:
>>>
>>> You could also set it back to 'internet' and progressively reduce the
>>> bandwidth parameter, making the Cake shaper into the actual bottleneck.
>>> This is the correct fix for the problem, and you should notice an
>>> instant improvement as soon as the bandwidth parameter is correct.
>>>
>>> Hand tuning this one link is not a problem. I'm searching for a set of settings that will provide generally good performance across a wide range of devices, links, and situations.
>>>
>>> From what you've indicated so far there's nothing as effective as a correct bandwidth estimation if we consider the antenna (link) a black box. Expecting the user to input expected throughput for every link and then managing that information is essentially a non-starter.
>>>
>>> Radio tuning provides some improvement, but until Ubiquiti starts shipping with Codel on non-router devices I don't think there's a good solution here.
>>>
>>> Any way to have the receiving device detect bloat and insert an ECN?
>>>
>>> That's what the qdisc itself is supposed to do.
>>>
>>> I don't think the time spent in the intermediate device is detectable at the kernel level, but we keep track of latency for routing decisions and could detect bloat with some accuracy; the problem is how to respond.
>>>
>>> As long as you can detect which link the bloat is on (and in which direction), you can respond by reducing the bandwidth parameter on that half-link by a small amount. Since you have a cooperating network, maintaining a time standard on each node sufficient to observe one-way delays seems feasible, as is establishing a normal baseline latency for each link.
>>>
>>> The characteristics of the bandwidth parameter being too high are easy to observe. Not only will the one-way delay go up, but the received throughput in the same direction at the same time will be lower than configured. You might use the latter as a hint as to how far you need to reduce the shaped bandwidth.
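>>> That rule can be sketched as a tiny control loop. This is a hypothetical illustration only (the function name, the 5% step and the 10 ms bloat threshold are all made up for the sketch, not anything cake or the routing daemon actually implements):
>>>
>>> ```python
>>> def adjust_shaper(shaped_bps, baseline_delay_s, observed_delay_s,
>>>                   observed_rx_bps, step=0.05, bloat_threshold_s=0.010):
>>>     """Reduce the shaper on a half-link that shows bloat.
>>>
>>>     If one-way delay rises well above baseline while received throughput
>>>     falls short of the configured rate, the shaper is set too high; the
>>>     throughput actually achieved hints at where capacity really lies.
>>>     """
>>>     bloated = observed_delay_s - baseline_delay_s > bloat_threshold_s
>>>     if bloated and observed_rx_bps < shaped_bps:
>>>         # Step toward the observed throughput rather than jumping to it,
>>>         # to avoid overreacting to a single transient measurement.
>>>         return max(observed_rx_bps, shaped_bps * (1 - step))
>>>     return shaped_bps
>>> ```
>>>
>>> Run periodically per half-link, this converges the shaped rate downward only while both symptoms (excess delay and a throughput deficit) are present.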
>>>
>>> Deciding when and by how much to *increase* bandwidth, which is presumably desirable when link conditions improve, is a more difficult problem when the link hardware doesn't cooperate by informing you of its status. (This is something you could reasonably ask Ubiquiti to address.)
>>>
>>> I would assume that link characteristics will change slowly, and run an occasional explicit bandwidth probe to see if spare bandwidth is available. If that probe comes through without exhibiting bloat, *and* the link is otherwise loaded to capacity, then increase the shaper by an amount within the probe's capacity of measurement - and schedule a repeat.
>>>
>>> A suitable probe might be 100x 1500b packets paced out over a second, bypassing the shaper. This will occupy just over 1Mbps of bandwidth, and can be expected to induce 10ms of delay if injected into a saturated 100Mbps link. Observe the delay experienced by each packet *and* the quantity of other traffic that appears between them. Only if both are favourable can you safely open the shaper, by 1Mbps.
>>>
>>> Since wireless links can be expected to change their capacity over time, due to eg. weather and tree growth, this seems to be more generally useful than a static guess. You could deploy a new link with a conservative "guess" of say 10Mbps, and just probe from there.
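>>> The probe arithmetic above is easy to check; this short Python sketch just restates the numbers from the paragraph (100 packets of 1500 bytes over one second, injected into a 100 Mbps link):
>>>
>>> ```python
>>> # Probe: 100 packets of 1500 bytes paced evenly over one second.
>>> PACKETS = 100
>>> PKT_BYTES = 1500
>>> DURATION_S = 1.0
>>>
>>> probe_bps = PACKETS * PKT_BYTES * 8 / DURATION_S
>>> print(f"probe bandwidth: {probe_bps / 1e6:.1f} Mbps")  # just over 1 Mbps
>>>
>>> # On a saturated link the probe's bytes must queue, so the extra delay
>>> # is roughly the probe volume divided by the link rate.
>>> LINK_BPS = 100e6
>>> induced_delay_s = PACKETS * PKT_BYTES * 8 / LINK_BPS
>>> print(f"induced delay: {induced_delay_s * 1e3:.0f} ms")  # ~12 ms, in the ~10 ms ballpark quoted
>>> ```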
>>>
>>> - Jonathan Morton
>>> _______________________________________________
>>> Cake mailing list
>>> Cake@lists.bufferbloat.net
>>> https://lists.bufferbloat.net/listinfo/cake
>>>
>>
>>
>> --
>>
>> Dave Täht
>> CTO, TekLibre, LLC
>> http://www.teklibre.com
>> Tel: 1-831-205-9740
>
>
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [Cake] cake memory consumption
2019-09-16 13:28 ` Justin Kilpatrick
@ 2019-09-16 13:39 ` Jonathan Morton
2019-09-16 13:54 ` Sebastian Gottschall
2019-09-16 13:47 ` Sebastian Gottschall
1 sibling, 1 reply; 36+ messages in thread
From: Jonathan Morton @ 2019-09-16 13:39 UTC (permalink / raw)
To: Justin Kilpatrick; +Cc: cake
> On 16 Sep, 2019, at 4:28 pm, Justin Kilpatrick <justin@althea.net> wrote:
>
> OpenWRT 18.06.4 on a glb1300 and 10+ virtual interfaces with cake. Total memory usage is 70MB for everything.
My IQrouter, which is Archer C7 hardware, is presently running with 73MB free out of 128MB, after nearly 43 days uptime with heavy usage. It has at least two Cake instances running, on a recent kernel.
I see from the forum logs that kernel 3.18.x is in use there. That's very old indeed, and I believe there were some fairly big differences in packet memory management since then. It would be entirely possible for some memory management bug to be introduced by a vendor patch, for example.
- Jonathan Morton
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [Cake] cake memory consumption
2019-09-16 13:28 ` Justin Kilpatrick
2019-09-16 13:39 ` Jonathan Morton
@ 2019-09-16 13:47 ` Sebastian Gottschall
1 sibling, 0 replies; 36+ messages in thread
From: Sebastian Gottschall @ 2019-09-16 13:47 UTC (permalink / raw)
To: cake
Am 16.09.2019 um 15:28 schrieb Justin Kilpatrick:
> I'm not seeing anything like the memory usage you describe in a similar situation.
>
> OpenWRT 18.06.4 on a glb1300 and 10+ virtual interfaces with cake. Total memory usage is 70MB for everything.
doesn't sound much different. consider the archer c7 has a wireless
ath10k-based card, and ath10k alone takes 40-45 MB of I/O memory from the
system itself.
Now 70 + 45 = 115 MB, plus some kernel memory and userspace overhead = OOM on
a 128 MB device.
i don't know if you enabled wireless on the glb1300, which uses ath10k too,
and i don't know if you are running separate cake instances on each of
these interfaces
>
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [Cake] cake memory consumption
2019-09-16 13:39 ` Jonathan Morton
@ 2019-09-16 13:54 ` Sebastian Gottschall
2019-09-16 14:06 ` Jonathan Morton
0 siblings, 1 reply; 36+ messages in thread
From: Sebastian Gottschall @ 2019-09-16 13:54 UTC (permalink / raw)
To: cake
Am 16.09.2019 um 15:39 schrieb Jonathan Morton:
>> On 16 Sep, 2019, at 4:28 pm, Justin Kilpatrick <justin@althea.net> wrote:
>>
>> OpenWRT 18.06.4 on a glb1300 and 10+ virtual interfaces with cake. Total memory usage is 70MB for everything.
> My IQrouter, which is Archer C7 hardware, is presently running with 73MB free out of 128MB, after nearly 43 days uptime with heavy usage. It has at least two Cake instances running, on a recent kernel.
>
> I see from the forum logs that kernel 3.18.x is in use there. That's very old indeed, and I believe there were some fairly big differences in packet memory management since then. It would be entirely possible for some memory management bug to be introduced by a vendor patch, for example.
i don't use vendor patches. it's an old kernel, i know, and i have some
backports on it. i avoided switching to newer kernels due to some serious
issues under specific conditions on these models (unrelated to QoS, but
to flash memory access). the drivers are basically the same as for openwrt
and it runs fairly well for all schedulers except for cake here.
usually that would point to a bug in the out-of-tree version of cake, but i
haven't found anything while reviewing
>
> - Jonathan Morton
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [Cake] cake memory consumption
2019-09-16 13:25 ` Sebastian Gottschall
@ 2019-09-16 14:01 ` Toke Høiland-Jørgensen
2019-09-17 5:06 ` Sebastian Gottschall
` (3 more replies)
0 siblings, 4 replies; 36+ messages in thread
From: Toke Høiland-Jørgensen @ 2019-09-16 14:01 UTC (permalink / raw)
To: Sebastian Gottschall, cake
Sebastian Gottschall <s.gottschall@newmedia-net.de> writes:
> Am 16.09.2019 um 14:08 schrieb Toke Høiland-Jørgensen:
>> Sebastian Gottschall <s.gottschall@newmedia-net.de> writes:
>>
>>> after we found out serious out of memory issues on smaller embedded
>>> devices (128 mb ram) we made some benchmarks with different schedulers
>>> with the result that cake takes a serious amount of memory. we use the
>>> out of tree cake module and we use it class based since we have complex
>>> methods of doing qos per interface, per mac addresse or even per
>>> ip/network. so its not just simple cake on a single interface solution.
>>> we made some benchmarks with different schedulers. does anybody have a
>>> solution for making that better?
>>>
>>> HTB/FQ_CODEL ------- 62M
>>> HTB/SFQ ------- 62M
>>> HTB/PIE ------- 62M
>>> HTB/FQ_CODEL_FAST ------- 67M
>>> HTB/CAKE -------111M
>>>
>>> HFSC/FQ_CODEL_FAST ------- 47M
>> HFSC/PIE ------- 49M
>> HFSC/SFQ ------- 50M
>>> HFSC /FQ_CODEL ------- 52M
>>> HFSC/CAKE -------109M
>> How are you measuring the memory usage, and what is your full config for
>> each setup? :)
> me? nothing. i requested this test from a reporter and he uses just free
> / top, so there is an error tolerance.
Ah, I see. So this is just total system memory as reported by top.
> but it shows a significant difference between cake and fq_codel etc.;
> cake is hitting OOM at the end
>
> for the full report including config screenshots see this
> https://svn.dd-wrt.com/ticket/6798#comment:14. it also shows the qos
> setup which i can use to reproduce and to
> print out the full tc ruleset if required (which it surely is for you).
> if you want i will recreate this setup and send the tc rules on this
> list
Yes, please do. The output of 'tc -s qdisc' would be useful as well to
see how much memory CAKE itself thinks it's using...
Are you setting the memory_limit in your config or relying on CAKE's
default?
-Toke
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [Cake] cake memory consumption
2019-09-16 13:54 ` Sebastian Gottschall
@ 2019-09-16 14:06 ` Jonathan Morton
2019-09-17 5:10 ` Sebastian Gottschall
0 siblings, 1 reply; 36+ messages in thread
From: Jonathan Morton @ 2019-09-16 14:06 UTC (permalink / raw)
To: Sebastian Gottschall; +Cc: cake
If you're able to log in as root, what does "tc -s qdisc | fgrep memory" tell you?
Cake actually does very little dynamic memory allocation. There's a small amount of memory used per queue and per tin, which should total less than 100KB in "besteffort" mode (which you should be using if you have manual traffic classification).
All other memory consumption is due to packets in the queue, which are allocated by the kernel when they are received, and deallocated when transmitted or dropped. Cake applies a limit to the memory used by queued packets, generally 4MB by default. The only way this can be exceeded by more than one packet (transiently, when a packet is enqueued and Cake has to drop other packets to make room) is if there's an unaccounted memory leak somewhere.
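Since each cake instance enforces its limit independently, a class-based setup with many instances multiplies the bound. A back-of-envelope sketch (the 4 MB default and the instance count of 25 are illustrative assumptions, not measurements from the report):

```python
def cake_buffer_bound_mb(n_instances, memlimit_mb=4):
    """Worst-case packet memory if every cake instance fills its limit."""
    return n_instances * memlimit_mb

# e.g. ~25 leaf classes, each with its own cake instance, could pin up to
# ~100 MB of packet memory in the worst case - enough to matter on a
# 128 MB router, even though each instance is individually bounded.
print(cake_buffer_bound_mb(25))
```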
If you can find such a leak in Cake, we'll fix it. But I think it is probably elsewhere.
- Jonathan Morton
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [Cake] cake memory consumption
2019-09-16 14:01 ` Toke Høiland-Jørgensen
@ 2019-09-17 5:06 ` Sebastian Gottschall
2019-09-17 5:21 ` Sebastian Gottschall
` (2 subsequent siblings)
3 siblings, 0 replies; 36+ messages in thread
From: Sebastian Gottschall @ 2019-09-17 5:06 UTC (permalink / raw)
To: Toke Høiland-Jørgensen, cake
Am 16.09.2019 um 16:01 schrieb Toke Høiland-Jørgensen:
> Sebastian Gottschall <s.gottschall@newmedia-net.de> writes:
>
>> Am 16.09.2019 um 14:08 schrieb Toke Høiland-Jørgensen:
>>> Sebastian Gottschall <s.gottschall@newmedia-net.de> writes:
>>>
>>>> after we found out serious out of memory issues on smaller embedded
>>>> devices (128 mb ram) we made some benchmarks with different schedulers
>>>> with the result that cake takes a serious amount of memory. we use the
>>>> out of tree cake module and we use it class based since we have complex
>>>> methods of doing qos per interface, per mac addresse or even per
>>>> ip/network. so its not just simple cake on a single interface solution.
>>>> we made some benchmarks with different schedulers. does anybody have a
>>>> solution for making that better?
>>>>
>>>> HTB/FQ_CODEL ------- 62M
>>>> HTB/SFQ ------- 62M
>>>> HTB/PIE ------- 62M
>>>> HTB/FQ_CODEL_FAST ------- 67M
>>>> HTB/CAKE -------111M
>>>>
>>>> HFSC/FQ_CODEL_FAST ------- 47M
>>>> HFSC/PIE ------- 49M
>>>> HFSC/SFQ ------- 50M
>>>> HFSC /FQ_CODEL ------- 52M
>>>> HFSC/CAKE -------109M
>>> How are you measuring the memory usage, and what is your full config for
>>> each setup? :)
>> me? nothing. i requested this test from a reporter and he uses just free
>> / top. so there is a error tollerance.
> Ah, I see. So this is just total system memory as reported by top.
vice versa. this is memory usage, not total system memory (which
would always be 128 MB)
>
>> but it shows a significant difference between cake and fq_codel etc.
>> cake is doing a OOM at the end
>>
>> for the full report including config screenshots see this
>> https://svn.dd-wrt.com/ticket/6798#comment:14. it shows also the qos
>> setup which i can use to reproduce and to
>> print out the full tc ruleset if required (which it surelly is for you).
>> if you want i will recreate this setup and send the tc rules on this
>> list
> Yes, please do. The output of 'tc -s qdisc' would be useful as well to
> see how much memory CAKE itself thinks it's using...
you will get it with full stats of course (within the next 6 hours, just
woke up right now)
>
> Are you setting the memory_limit in your config or relying on CAKE's
> default?
no memory_limit has been set, so the auto-calculation is used within cake
with the 4 MB limit truncation
>
> -Toke
>
>
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [Cake] cake memory consumption
2019-09-16 14:06 ` Jonathan Morton
@ 2019-09-17 5:10 ` Sebastian Gottschall
0 siblings, 0 replies; 36+ messages in thread
From: Sebastian Gottschall @ 2019-09-17 5:10 UTC (permalink / raw)
To: Jonathan Morton; +Cc: cake
Am 16.09.2019 um 16:06 schrieb Jonathan Morton:
> If you're able to log in as root, what does "tc -s qdisc | fgrep memory" tell you?
it's not my device. i need to recreate the reporter's setup on my
testbed first; then i can show you the output.
>
> Cake actually does very little dynamic memory allocation. There's a small amount of memory used per queue and per tin, which should total less than 100KB in "besteffort" mode (which you should be using if you have manual traffic classification).
besteffort is used, yes
>
> All other memory consumption is due to packets in the queue, which are allocated by the kernel when they are received, and deallocated when transmitted or dropped. Cake applies a limit to the memory used by queued packets, generally 4MB by default. The only way this can be exceeded by more than one packet (transiently, when a packet is enqueued and Cake has to drop other packets to make room) is if there's an unaccounted memory leak somewhere.
>
> If you can find such a leak in Cake, we'll fix it. But I think it is probably elsewhere.
even if it's elsewhere, i'm wondering why only cake triggers it. there is
nothing different in between when comparing with other schedulers.
i don't see a leak here; rather, a massive consumption. i can run
cake on devices with more memory and it will not run out of memory.
the assumption is that cake takes the 4 MB of memory per qdisc and i have
a lot of qdiscs
>
> - Jonathan Morton
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [Cake] cake memory consumption
2019-09-16 14:01 ` Toke Høiland-Jørgensen
2019-09-17 5:06 ` Sebastian Gottschall
@ 2019-09-17 5:21 ` Sebastian Gottschall
2019-09-17 5:31 ` Sebastian Gottschall
2019-09-17 5:33 ` Sebastian Gottschall
3 siblings, 0 replies; 36+ messages in thread
From: Sebastian Gottschall @ 2019-09-17 5:21 UTC (permalink / raw)
To: Toke Høiland-Jørgensen, cake
here the massive output of class, qdisc, filters with -s
root@apreithalle:~# tc -s class show dev eth0
class htb 1:231 parent 1:230 leaf 231: prio 1 rate 102000bit ceil
128000bit burst 1726b cburst 1760b
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
rate 0bit 0pps backlog 0b 0p requeues 0
lended: 0 borrowed: 0 giants: 0
tokens: 2116406 ctokens: 1718750
class htb 1:10 parent 1:2 leaf 10: prio 3 rate 960000bit ceil 1600Kbit
burst 2799b cburst 3600b
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
rate 0bit 0pps backlog 0b 0p requeues 0
lended: 0 borrowed: 0 giants: 0
tokens: 364578 ctokens: 281250
class htb 1:100 parent 1:1 leaf 100: prio 1 rate 1280Kbit ceil 1600Kbit
burst 3200b cburst 3600b
Sent 3582 bytes 63 pkt (dropped 0, overlimits 0 requeues 0)
rate 0bit 0pps backlog 0b 0p requeues 0
lended: 63 borrowed: 0 giants: 0
tokens: 307226 ctokens: 277031
class htb 1:230 parent 1:1 rate 128000bit ceil 128000bit burst 1760b
cburst 1760b
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
rate 0bit 0pps backlog 0b 0p requeues 0
lended: 0 borrowed: 0 giants: 0
tokens: 1718750 ctokens: 1718750
class htb 1:233 parent 1:230 leaf 233: prio 5 rate 38000bit ceil
128000bit burst 1646b cburst 1760b
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
rate 0bit 0pps backlog 0b 0p requeues 0
lended: 0 borrowed: 0 giants: 0
tokens: 5417750 ctokens: 1718750
class htb 1:232 parent 1:230 leaf 232: prio 4 rate 76000bit ceil
128000bit burst 1694b cburst 1760b
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
rate 0bit 0pps backlog 0b 0p requeues 0
lended: 0 borrowed: 0 giants: 0
tokens: 2787828 ctokens: 1718750
class htb 1:235 parent 1:230 leaf 235: prio 7 rate 128000bit ceil
128000bit burst 1760b cburst 1760b
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
rate 0bit 0pps backlog 0b 0p requeues 0
lended: 0 borrowed: 0 giants: 0
tokens: 1718750 ctokens: 1718750
class htb 1:234 parent 1:230 leaf 234: prio 5 rate 12000bit ceil
128000bit burst 1614b cburst 1760b
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
rate 0bit 0pps backlog 0b 0p requeues 0
lended: 0 borrowed: 0 giants: 0
tokens: 16822906 ctokens: 1718750
class htb 1:264 parent 1:260 leaf 264: prio 5 rate 12000bit ceil
128000bit burst 1614b cburst 1760b
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
rate 0bit 0pps backlog 0b 0p requeues 0
lended: 0 borrowed: 0 giants: 0
tokens: 16822906 ctokens: 1718750
class htb 1:220 parent 1:1 rate 512000bit ceil 512000bit burst 2240b
cburst 2240b
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
rate 0bit 0pps backlog 0b 0p requeues 0
lended: 0 borrowed: 0 giants: 0
tokens: 546875 ctokens: 546875
class htb 1:265 parent 1:260 leaf 265: prio 7 rate 128000bit ceil
128000bit burst 1760b cburst 1760b
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
rate 0bit 0pps backlog 0b 0p requeues 0
lended: 0 borrowed: 0 giants: 0
tokens: 1718750 ctokens: 1718750
class htb 1:1 root rate 1600Kbit ceil 1600Kbit burst 3600b cburst 3600b
Sent 35106 bytes 181 pkt (dropped 0, overlimits 8 requeues 0)
rate 0bit 0pps backlog 0b 0p requeues 0
lended: 8 borrowed: 0 giants: 0
tokens: 271718 ctokens: 271718
class htb 1:221 parent 1:220 leaf 221: prio 1 rate 409000bit ceil
512000bit burst 2110b cburst 2240b
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
rate 0bit 0pps backlog 0b 0p requeues 0
lended: 0 borrowed: 0 giants: 0
tokens: 645156 ctokens: 546875
class htb 1:2 parent 1:1 rate 320000bit ceil 1600Kbit burst 2000b cburst
3600b
Sent 31524 bytes 118 pkt (dropped 0, overlimits 8 requeues 0)
rate 0bit 0pps backlog 0b 0p requeues 0
lended: 0 borrowed: 8 giants: 0
tokens: 733593 ctokens: 271718
class htb 1:222 parent 1:220 leaf 222: prio 4 rate 307000bit ceil
512000bit burst 1982b cburst 2240b
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
rate 0bit 0pps backlog 0b 0p requeues 0
lended: 0 borrowed: 0 giants: 0
tokens: 807406 ctokens: 546875
class htb 1:223 parent 1:220 leaf 223: prio 5 rate 153000bit ceil
512000bit burst 1790b cburst 2240b
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
rate 0bit 0pps backlog 0b 0p requeues 0
lended: 0 borrowed: 0 giants: 0
tokens: 1463234 ctokens: 546875
class htb 1:260 parent 1:1 rate 128000bit ceil 128000bit burst 1760b
cburst 1760b
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
rate 0bit 0pps backlog 0b 0p requeues 0
lended: 0 borrowed: 0 giants: 0
tokens: 1718750 ctokens: 1718750
class htb 1:40 parent 1:2 leaf 40: prio 6 rate 128000bit ceil 1600Kbit
burst 1760b cburst 3600b
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
rate 0bit 0pps backlog 0b 0p requeues 0
lended: 0 borrowed: 0 giants: 0
tokens: 1718750 ctokens: 281250
class htb 1:224 parent 1:220 leaf 224: prio 5 rate 51000bit ceil
512000bit burst 1662b cburst 2240b
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
rate 0bit 0pps backlog 0b 0p requeues 0
lended: 0 borrowed: 0 giants: 0
tokens: 4075968 ctokens: 546875
class htb 1:261 parent 1:260 leaf 261: prio 1 rate 102000bit ceil
128000bit burst 1726b cburst 1760b
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
rate 0bit 0pps backlog 0b 0p requeues 0
lended: 0 borrowed: 0 giants: 0
tokens: 2116406 ctokens: 1718750
class htb 1:225 parent 1:220 leaf 225: prio 7 rate 128000bit ceil
512000bit burst 1760b cburst 2240b
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
rate 0bit 0pps backlog 0b 0p requeues 0
lended: 0 borrowed: 0 giants: 0
tokens: 1718750 ctokens: 546875
class htb 1:262 parent 1:260 leaf 262: prio 4 rate 76000bit ceil
128000bit burst 1694b cburst 1760b
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
rate 0bit 0pps backlog 0b 0p requeues 0
lended: 0 borrowed: 0 giants: 0
tokens: 2787828 ctokens: 1718750
class htb 1:263 parent 1:260 leaf 263: prio 5 rate 38000bit ceil
128000bit burst 1646b cburst 1760b
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
rate 0bit 0pps backlog 0b 0p requeues 0
lended: 0 borrowed: 0 giants: 0
tokens: 5417750 ctokens: 1718750
class htb 1:213 parent 1:210 leaf 213: prio 5 rate 76000bit ceil
256000bit burst 1694b cburst 1920b
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
rate 0bit 0pps backlog 0b 0p requeues 0
lended: 0 borrowed: 0 giants: 0
tokens: 2787828 ctokens: 937500
class htb 1:212 parent 1:210 leaf 212: prio 4 rate 153000bit ceil
256000bit burst 1790b cburst 1920b
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
rate 0bit 0pps backlog 0b 0p requeues 0
lended: 0 borrowed: 0 giants: 0
tokens: 1463234 ctokens: 937500
class htb 1:255 parent 1:250 leaf 255: prio 7 rate 128000bit ceil
128000bit burst 1760b cburst 1760b
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
rate 0bit 0pps backlog 0b 0p requeues 0
lended: 0 borrowed: 0 giants: 0
tokens: 1718750 ctokens: 1718750
class htb 1:211 parent 1:210 leaf 211: prio 1 rate 204000bit ceil
256000bit burst 1854b cburst 1920b
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
rate 0bit 0pps backlog 0b 0p requeues 0
lended: 0 borrowed: 0 giants: 0
tokens: 1136640 ctokens: 937500
class htb 1:254 parent 1:250 leaf 254: prio 5 rate 12000bit ceil
128000bit burst 1614b cburst 1760b
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
rate 0bit 0pps backlog 0b 0p requeues 0
lended: 0 borrowed: 0 giants: 0
tokens: 16822906 ctokens: 1718750
class htb 1:210 parent 1:1 rate 256000bit ceil 256000bit burst 1920b
cburst 1920b
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
rate 0bit 0pps backlog 0b 0p requeues 0
lended: 0 borrowed: 0 giants: 0
tokens: 937500 ctokens: 937500
class htb 1:30 parent 1:2 leaf 30: prio 5 rate 160000bit ceil 1600Kbit
burst 1800b cburst 3600b
Sent 31524 bytes 118 pkt (dropped 0, overlimits 8 requeues 0)
rate 0bit 0pps backlog 0b 0p requeues 0
lended: 110 borrowed: 8 giants: 0
tokens: 1310937 ctokens: 271718
class htb 1:253 parent 1:250 leaf 253: prio 5 rate 38000bit ceil
128000bit burst 1646b cburst 1760b
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
rate 0bit 0pps backlog 0b 0p requeues 0
lended: 0 borrowed: 0 giants: 0
tokens: 5417750 ctokens: 1718750
class htb 1:252 parent 1:250 leaf 252: prio 4 rate 76000bit ceil
128000bit burst 1694b cburst 1760b
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
rate 0bit 0pps backlog 0b 0p requeues 0
lended: 0 borrowed: 0 giants: 0
tokens: 2787828 ctokens: 1718750
class htb 1:251 parent 1:250 leaf 251: prio 1 rate 102000bit ceil
128000bit burst 1726b cburst 1760b
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
rate 0bit 0pps backlog 0b 0p requeues 0
lended: 0 borrowed: 0 giants: 0
tokens: 2116406 ctokens: 1718750
class htb 1:215 parent 1:210 leaf 215: prio 7 rate 128000bit ceil
256000bit burst 1760b cburst 1920b
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
rate 0bit 0pps backlog 0b 0p requeues 0
lended: 0 borrowed: 0 giants: 0
tokens: 1718750 ctokens: 937500
class htb 1:250 parent 1:1 rate 128000bit ceil 128000bit burst 1760b
cburst 1760b
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
rate 0bit 0pps backlog 0b 0p requeues 0
lended: 0 borrowed: 0 giants: 0
tokens: 1718750 ctokens: 1718750
class htb 1:214 parent 1:210 leaf 214: prio 5 rate 25000bit ceil
256000bit burst 1631b cburst 1920b
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
rate 0bit 0pps backlog 0b 0p requeues 0
lended: 0 borrowed: 0 giants: 0
tokens: 8155000 ctokens: 937500
class htb 1:244 parent 1:240 leaf 244: prio 5 rate 12000bit ceil
128000bit burst 1614b cburst 1760b
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
rate 0bit 0pps backlog 0b 0p requeues 0
lended: 0 borrowed: 0 giants: 0
tokens: 16822906 ctokens: 1718750
class htb 1:20 parent 1:2 leaf 20: prio 4 rate 480000bit ceil 1600Kbit
burst 2199b cburst 3600b
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
rate 0bit 0pps backlog 0b 0p requeues 0
lended: 0 borrowed: 0 giants: 0
tokens: 572906 ctokens: 281250
class htb 1:245 parent 1:240 leaf 245: prio 7 rate 128000bit ceil
128000bit burst 1760b cburst 1760b
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
rate 0bit 0pps backlog 0b 0p requeues 0
lended: 0 borrowed: 0 giants: 0
tokens: 1718750 ctokens: 1718750
class htb 1:242 parent 1:240 leaf 242: prio 4 rate 76000bit ceil
128000bit burst 1694b cburst 1760b
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
rate 0bit 0pps backlog 0b 0p requeues 0
lended: 0 borrowed: 0 giants: 0
tokens: 2787828 ctokens: 1718750
class htb 1:243 parent 1:240 leaf 243: prio 5 rate 38000bit ceil
128000bit burst 1646b cburst 1760b
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
rate 0bit 0pps backlog 0b 0p requeues 0
lended: 0 borrowed: 0 giants: 0
tokens: 5417750 ctokens: 1718750
class htb 1:240 parent 1:1 rate 128000bit ceil 128000bit burst 1760b
cburst 1760b
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
rate 0bit 0pps backlog 0b 0p requeues 0
lended: 0 borrowed: 0 giants: 0
tokens: 1718750 ctokens: 1718750
class htb 1:241 parent 1:240 leaf 241: prio 1 rate 102000bit ceil
128000bit burst 1726b cburst 1760b
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
rate 0bit 0pps backlog 0b 0p requeues 0
lended: 0 borrowed: 0 giants: 0
tokens: 2116406 ctokens: 1718750
class cake 30:18c parent 30:
(dropped 1, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
deficit 525 count 0 blue_prob 0
class cake 100:99 parent 100:
(dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
deficit 1177 count 0 blue_prob 0
root@apreithalle:~# tc -s qdisc show dev eth0
qdisc htb 1: root refcnt 2 r2q 10 default 30 direct_packets_stat 0
Sent 64601 bytes 257 pkt (dropped 5, overlimits 66 requeues 0)
backlog 0b 0p requeues 0
qdisc cake 264: parent 1:264 bandwidth unlimited besteffort dual-srchost
nat nowash ack-filter split-gso rtt 48.0ms raw overhead 0 mpu 84 no-sce
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
memory used: 0b of 15140Kb
capacity estimate: 0bit
min/max network layer size: 65535 / 0
min/max overhead-adjusted size: 65535 / 0
average network hdr offset: 0
Tin 0
thresh 0bit
target 2.4ms
interval 48.0ms
pk_delay 0us
av_delay 0us
sp_delay 0us
backlog 0b
pkts 0
bytes 0
way_inds 0
way_miss 0
way_cols 0
sce 0
marks 0
drops 0
ack_drop 0
sp_flows 0
bk_flows 0
un_flows 0
max_len 0
quantum 1514
qdisc cake 251: parent 1:251 bandwidth unlimited besteffort dual-srchost
nat nowash ack-filter split-gso rtt 48.0ms raw overhead 0 mpu 84 no-sce
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
memory used: 0b of 15140Kb
capacity estimate: 0bit
min/max network layer size: 65535 / 0
min/max overhead-adjusted size: 65535 / 0
average network hdr offset: 0
Tin 0
thresh 0bit
target 2.4ms
interval 48.0ms
pk_delay 0us
av_delay 0us
sp_delay 0us
backlog 0b
pkts 0
bytes 0
way_inds 0
way_miss 0
way_cols 0
sce 0
marks 0
drops 0
ack_drop 0
sp_flows 0
bk_flows 0
un_flows 0
max_len 0
quantum 1514
qdisc cake 214: parent 1:214 bandwidth unlimited besteffort dual-srchost
nat nowash ack-filter split-gso rtt 45.0ms raw overhead 0 mpu 84 no-sce
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
memory used: 0b of 15140Kb
capacity estimate: 0bit
min/max network layer size: 65535 / 0
min/max overhead-adjusted size: 65535 / 0
average network hdr offset: 0
Tin 0
thresh 0bit
target 2.2ms
interval 45.0ms
pk_delay 0us
av_delay 0us
sp_delay 0us
backlog 0b
pkts 0
bytes 0
way_inds 0
way_miss 0
way_cols 0
sce 0
marks 0
drops 0
ack_drop 0
sp_flows 0
bk_flows 0
un_flows 0
max_len 0
quantum 1514
qdisc cake 253: parent 1:253 bandwidth unlimited besteffort dual-srchost
nat nowash ack-filter split-gso rtt 48.0ms raw overhead 0 mpu 84 no-sce
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
memory used: 0b of 15140Kb
capacity estimate: 0bit
min/max network layer size: 65535 / 0
min/max overhead-adjusted size: 65535 / 0
average network hdr offset: 0
Tin 0
thresh 0bit
target 2.4ms
interval 48.0ms
pk_delay 0us
av_delay 0us
sp_delay 0us
backlog 0b
pkts 0
bytes 0
way_inds 0
way_miss 0
way_cols 0
sce 0
marks 0
drops 0
ack_drop 0
sp_flows 0
bk_flows 0
un_flows 0
max_len 0
quantum 1514
qdisc cake 255: parent 1:255 bandwidth unlimited besteffort dual-srchost
nat nowash ack-filter split-gso rtt 48.0ms raw overhead 0 mpu 84 no-sce
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
memory used: 0b of 15140Kb
capacity estimate: 0bit
min/max network layer size: 65535 / 0
min/max overhead-adjusted size: 65535 / 0
average network hdr offset: 0
Tin 0
thresh 0bit
target 2.4ms
interval 48.0ms
pk_delay 0us
av_delay 0us
sp_delay 0us
backlog 0b
pkts 0
bytes 0
way_inds 0
way_miss 0
way_cols 0
sce 0
marks 0
drops 0
ack_drop 0
sp_flows 0
bk_flows 0
un_flows 0
max_len 0
quantum 1514
qdisc cake 242: parent 1:242 bandwidth unlimited besteffort dual-srchost
nat nowash ack-filter split-gso rtt 48.0ms raw overhead 0 mpu 84 no-sce
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
memory used: 0b of 15140Kb
capacity estimate: 0bit
min/max network layer size: 65535 / 0
min/max overhead-adjusted size: 65535 / 0
average network hdr offset: 0
Tin 0
thresh 0bit
target 2.4ms
interval 48.0ms
pk_delay 0us
av_delay 0us
sp_delay 0us
backlog 0b
pkts 0
bytes 0
way_inds 0
way_miss 0
way_cols 0
sce 0
marks 0
drops 0
ack_drop 0
sp_flows 0
bk_flows 0
un_flows 0
max_len 0
quantum 1514
qdisc cake 30: parent 1:30 bandwidth unlimited besteffort dual-srchost
nat nowash ack-filter split-gso rtt 20.0ms raw overhead 0 mpu 84 no-sce
Sent 59651 bytes 170 pkt (dropped 5, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
memory used: 8912b of 15140Kb
capacity estimate: 0bit
min/max network layer size: 42 / 1514
min/max overhead-adjusted size: 84 / 1514
average network hdr offset: 7
Tin 0
thresh 0bit
target 1.0ms
interval 20.0ms
pk_delay 10.7ms
av_delay 454us
sp_delay 5us
backlog 0b
pkts 175
bytes 66089
way_inds 0
way_miss 23
way_cols 0
sce 0
marks 0
drops 5
ack_drop 0
sp_flows 0
bk_flows 1
un_flows 0
max_len 3028
quantum 1514
qdisc cake 244: parent 1:244 bandwidth unlimited besteffort dual-srchost
nat nowash ack-filter split-gso rtt 48.0ms raw overhead 0 mpu 84 no-sce
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
memory used: 0b of 15140Kb
capacity estimate: 0bit
min/max network layer size: 65535 / 0
min/max overhead-adjusted size: 65535 / 0
average network hdr offset: 0
Tin 0
thresh 0bit
target 2.4ms
interval 48.0ms
pk_delay 0us
av_delay 0us
sp_delay 0us
backlog 0b
pkts 0
bytes 0
way_inds 0
way_miss 0
way_cols 0
sce 0
marks 0
drops 0
ack_drop 0
sp_flows 0
bk_flows 0
un_flows 0
max_len 0
quantum 1514
qdisc cake 233: parent 1:233 bandwidth unlimited besteffort dual-srchost
nat nowash ack-filter split-gso rtt 48.0ms raw overhead 0 mpu 84 no-sce
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
memory used: 0b of 15140Kb
capacity estimate: 0bit
min/max network layer size: 65535 / 0
min/max overhead-adjusted size: 65535 / 0
average network hdr offset: 0
Tin 0
thresh 0bit
target 2.4ms
interval 48.0ms
pk_delay 0us
av_delay 0us
sp_delay 0us
backlog 0b
pkts 0
bytes 0
way_inds 0
way_miss 0
way_cols 0
sce 0
marks 0
drops 0
ack_drop 0
sp_flows 0
bk_flows 0
un_flows 0
max_len 0
quantum 1514
qdisc cake 231: parent 1:231 bandwidth unlimited besteffort dual-srchost
nat nowash ack-filter split-gso rtt 48.0ms raw overhead 0 mpu 84 no-sce
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
memory used: 0b of 15140Kb
capacity estimate: 0bit
min/max network layer size: 65535 / 0
min/max overhead-adjusted size: 65535 / 0
average network hdr offset: 0
Tin 0
thresh 0bit
target 2.4ms
interval 48.0ms
pk_delay 0us
av_delay 0us
sp_delay 0us
backlog 0b
pkts 0
bytes 0
way_inds 0
way_miss 0
way_cols 0
sce 0
marks 0
drops 0
ack_drop 0
sp_flows 0
bk_flows 0
un_flows 0
max_len 0
quantum 1514
qdisc cake 100: parent 1:100 bandwidth unlimited besteffort dual-srchost
nat nowash ack-filter split-gso rtt 20.0ms raw overhead 0 mpu 84 no-sce
Sent 4950 bytes 87 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
memory used: 768b of 15140Kb
capacity estimate: 0bit
min/max network layer size: 54 / 66
min/max overhead-adjusted size: 84 / 84
average network hdr offset: 4
Tin 0
thresh 0bit
target 1.0ms
interval 20.0ms
pk_delay 10us
av_delay 1us
sp_delay 1us
backlog 0b
pkts 87
bytes 4950
way_inds 0
way_miss 21
way_cols 0
sce 0
marks 0
drops 0
ack_drop 0
sp_flows 1
bk_flows 0
un_flows 0
max_len 66
quantum 1514
qdisc cake 235: parent 1:235 bandwidth unlimited besteffort dual-srchost
nat nowash ack-filter split-gso rtt 48.0ms raw overhead 0 mpu 84 no-sce
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
memory used: 0b of 15140Kb
capacity estimate: 0bit
min/max network layer size: 65535 / 0
min/max overhead-adjusted size: 65535 / 0
average network hdr offset: 0
Tin 0
thresh 0bit
target 2.4ms
interval 48.0ms
pk_delay 0us
av_delay 0us
sp_delay 0us
backlog 0b
pkts 0
bytes 0
way_inds 0
way_miss 0
way_cols 0
sce 0
marks 0
drops 0
ack_drop 0
sp_flows 0
bk_flows 0
un_flows 0
max_len 0
quantum 1514
qdisc cake 261: parent 1:261 bandwidth unlimited besteffort dual-srchost
nat nowash ack-filter split-gso rtt 48.0ms raw overhead 0 mpu 84 no-sce
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
memory used: 0b of 15140Kb
capacity estimate: 0bit
min/max network layer size: 65535 / 0
min/max overhead-adjusted size: 65535 / 0
average network hdr offset: 0
Tin 0
thresh 0bit
target 2.4ms
interval 48.0ms
pk_delay 0us
av_delay 0us
sp_delay 0us
backlog 0b
pkts 0
bytes 0
way_inds 0
way_miss 0
way_cols 0
sce 0
marks 0
drops 0
ack_drop 0
sp_flows 0
bk_flows 0
un_flows 0
max_len 0
quantum 1514
qdisc cake 224: parent 1:224 bandwidth unlimited besteffort dual-srchost
nat nowash ack-filter split-gso rtt 40.0ms raw overhead 0 mpu 84 no-sce
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
memory used: 0b of 15140Kb
capacity estimate: 0bit
min/max network layer size: 65535 / 0
min/max overhead-adjusted size: 65535 / 0
average network hdr offset: 0
Tin 0
thresh 0bit
target 2.0ms
interval 40.0ms
pk_delay 0us
av_delay 0us
sp_delay 0us
backlog 0b
pkts 0
bytes 0
way_inds 0
way_miss 0
way_cols 0
sce 0
marks 0
drops 0
ack_drop 0
sp_flows 0
bk_flows 0
un_flows 0
max_len 0
quantum 1514
qdisc cake 222: parent 1:222 bandwidth unlimited besteffort dual-srchost
nat nowash ack-filter split-gso rtt 40.0ms raw overhead 0 mpu 84 no-sce
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
memory used: 0b of 15140Kb
capacity estimate: 0bit
min/max network layer size: 65535 / 0
min/max overhead-adjusted size: 65535 / 0
average network hdr offset: 0
Tin 0
thresh 0bit
target 2.0ms
interval 40.0ms
pk_delay 0us
av_delay 0us
sp_delay 0us
backlog 0b
pkts 0
bytes 0
way_inds 0
way_miss 0
way_cols 0
sce 0
marks 0
drops 0
ack_drop 0
sp_flows 0
bk_flows 0
un_flows 0
max_len 0
quantum 1514
qdisc cake 10: parent 1:10 bandwidth unlimited besteffort dual-srchost
nat nowash ack-filter split-gso rtt 20.0ms raw overhead 0 mpu 84 no-sce
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
memory used: 0b of 15140Kb
capacity estimate: 0bit
min/max network layer size: 65535 / 0
min/max overhead-adjusted size: 65535 / 0
average network hdr offset: 0
Tin 0
thresh 0bit
target 1.0ms
interval 20.0ms
pk_delay 0us
av_delay 0us
sp_delay 0us
backlog 0b
pkts 0
bytes 0
way_inds 0
way_miss 0
way_cols 0
sce 0
marks 0
drops 0
ack_drop 0
sp_flows 0
bk_flows 0
un_flows 0
max_len 0
quantum 1514
qdisc cake 263: parent 1:263 bandwidth unlimited besteffort dual-srchost
nat nowash ack-filter split-gso rtt 48.0ms raw overhead 0 mpu 84 no-sce
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
memory used: 0b of 15140Kb
capacity estimate: 0bit
min/max network layer size: 65535 / 0
min/max overhead-adjusted size: 65535 / 0
average network hdr offset: 0
Tin 0
thresh 0bit
target 2.4ms
interval 48.0ms
pk_delay 0us
av_delay 0us
sp_delay 0us
backlog 0b
pkts 0
bytes 0
way_inds 0
way_miss 0
way_cols 0
sce 0
marks 0
drops 0
ack_drop 0
sp_flows 0
bk_flows 0
un_flows 0
max_len 0
quantum 1514
qdisc cake 211: parent 1:211 bandwidth unlimited besteffort dual-srchost
nat nowash ack-filter split-gso rtt 45.0ms raw overhead 0 mpu 84 no-sce
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
memory used: 0b of 15140Kb
capacity estimate: 0bit
min/max network layer size: 65535 / 0
min/max overhead-adjusted size: 65535 / 0
average network hdr offset: 0
Tin 0
thresh 0bit
target 2.2ms
interval 45.0ms
pk_delay 0us
av_delay 0us
sp_delay 0us
backlog 0b
pkts 0
bytes 0
way_inds 0
way_miss 0
way_cols 0
sce 0
marks 0
drops 0
ack_drop 0
sp_flows 0
bk_flows 0
un_flows 0
max_len 0
quantum 1514
qdisc cake 265: parent 1:265 bandwidth unlimited besteffort dual-srchost
nat nowash ack-filter split-gso rtt 48.0ms raw overhead 0 mpu 84 no-sce
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
memory used: 0b of 15140Kb
capacity estimate: 0bit
min/max network layer size: 65535 / 0
min/max overhead-adjusted size: 65535 / 0
average network hdr offset: 0
Tin 0
thresh 0bit
target 2.4ms
interval 48.0ms
pk_delay 0us
av_delay 0us
sp_delay 0us
backlog 0b
pkts 0
bytes 0
way_inds 0
way_miss 0
way_cols 0
sce 0
marks 0
drops 0
ack_drop 0
sp_flows 0
bk_flows 0
un_flows 0
max_len 0
quantum 1514
qdisc cake 213: parent 1:213 bandwidth unlimited besteffort dual-srchost
nat nowash ack-filter split-gso rtt 45.0ms raw overhead 0 mpu 84 no-sce
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
memory used: 0b of 15140Kb
capacity estimate: 0bit
min/max network layer size: 65535 / 0
min/max overhead-adjusted size: 65535 / 0
average network hdr offset: 0
Tin 0
thresh 0bit
target 2.2ms
interval 45.0ms
pk_delay 0us
av_delay 0us
sp_delay 0us
backlog 0b
pkts 0
bytes 0
way_inds 0
way_miss 0
way_cols 0
sce 0
marks 0
drops 0
ack_drop 0
sp_flows 0
bk_flows 0
un_flows 0
max_len 0
quantum 1514
qdisc cake 254: parent 1:254 bandwidth unlimited besteffort dual-srchost
nat nowash ack-filter split-gso rtt 48.0ms raw overhead 0 mpu 84 no-sce
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
memory used: 0b of 15140Kb
capacity estimate: 0bit
min/max network layer size: 65535 / 0
min/max overhead-adjusted size: 65535 / 0
average network hdr offset: 0
Tin 0
thresh 0bit
target 2.4ms
interval 48.0ms
pk_delay 0us
av_delay 0us
sp_delay 0us
backlog 0b
pkts 0
bytes 0
way_inds 0
way_miss 0
way_cols 0
sce 0
marks 0
drops 0
ack_drop 0
sp_flows 0
bk_flows 0
un_flows 0
max_len 0
quantum 1514
qdisc cake 252: parent 1:252 bandwidth unlimited besteffort dual-srchost
nat nowash ack-filter split-gso rtt 48.0ms raw overhead 0 mpu 84 no-sce
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
memory used: 0b of 15140Kb
capacity estimate: 0bit
min/max network layer size: 65535 / 0
min/max overhead-adjusted size: 65535 / 0
average network hdr offset: 0
Tin 0
thresh 0bit
target 2.4ms
interval 48.0ms
pk_delay 0us
av_delay 0us
sp_delay 0us
backlog 0b
pkts 0
bytes 0
way_inds 0
way_miss 0
way_cols 0
sce 0
marks 0
drops 0
ack_drop 0
sp_flows 0
bk_flows 0
un_flows 0
max_len 0
quantum 1514
qdisc cake 215: parent 1:215 bandwidth unlimited besteffort dual-srchost
nat nowash ack-filter split-gso rtt 45.0ms raw overhead 0 mpu 84 no-sce
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
memory used: 0b of 15140Kb
capacity estimate: 0bit
min/max network layer size: 65535 / 0
min/max overhead-adjusted size: 65535 / 0
average network hdr offset: 0
Tin 0
thresh 0bit
target 2.2ms
interval 45.0ms
pk_delay 0us
av_delay 0us
sp_delay 0us
backlog 0b
pkts 0
bytes 0
way_inds 0
way_miss 0
way_cols 0
sce 0
marks 0
drops 0
ack_drop 0
sp_flows 0
bk_flows 0
un_flows 0
max_len 0
quantum 1514
qdisc cake 40: parent 1:40 bandwidth unlimited besteffort dual-srchost
nat nowash ack-filter split-gso rtt 20.0ms raw overhead 0 mpu 84 no-sce
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
memory used: 0b of 15140Kb
capacity estimate: 0bit
min/max network layer size: 65535 / 0
min/max overhead-adjusted size: 65535 / 0
average network hdr offset: 0
Tin 0
thresh 0bit
target 1.0ms
interval 20.0ms
pk_delay 0us
av_delay 0us
sp_delay 0us
backlog 0b
pkts 0
bytes 0
way_inds 0
way_miss 0
way_cols 0
sce 0
marks 0
drops 0
ack_drop 0
sp_flows 0
bk_flows 0
un_flows 0
max_len 0
quantum 1514
qdisc cake 241: parent 1:241 bandwidth unlimited besteffort dual-srchost
nat nowash ack-filter split-gso rtt 48.0ms raw overhead 0 mpu 84 no-sce
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
memory used: 0b of 15140Kb
capacity estimate: 0bit
min/max network layer size: 65535 / 0
min/max overhead-adjusted size: 65535 / 0
average network hdr offset: 0
Tin 0
thresh 0bit
target 2.4ms
interval 48.0ms
pk_delay 0us
av_delay 0us
sp_delay 0us
backlog 0b
pkts 0
bytes 0
way_inds 0
way_miss 0
way_cols 0
sce 0
marks 0
drops 0
ack_drop 0
sp_flows 0
bk_flows 0
un_flows 0
max_len 0
quantum 1514
qdisc cake 245: parent 1:245 bandwidth unlimited besteffort dual-srchost
nat nowash ack-filter split-gso rtt 48.0ms raw overhead 0 mpu 84 no-sce
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
memory used: 0b of 15140Kb
capacity estimate: 0bit
min/max network layer size: 65535 / 0
min/max overhead-adjusted size: 65535 / 0
average network hdr offset: 0
Tin 0
thresh 0bit
target 2.4ms
interval 48.0ms
pk_delay 0us
av_delay 0us
sp_delay 0us
backlog 0b
pkts 0
bytes 0
way_inds 0
way_miss 0
way_cols 0
sce 0
marks 0
drops 0
ack_drop 0
sp_flows 0
bk_flows 0
un_flows 0
max_len 0
quantum 1514
qdisc cake 243: parent 1:243 bandwidth unlimited besteffort dual-srchost
nat nowash ack-filter split-gso rtt 48.0ms raw overhead 0 mpu 84 no-sce
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
memory used: 0b of 15140Kb
capacity estimate: 0bit
min/max network layer size: 65535 / 0
min/max overhead-adjusted size: 65535 / 0
average network hdr offset: 0
Tin 0
thresh 0bit
target 2.4ms
interval 48.0ms
pk_delay 0us
av_delay 0us
sp_delay 0us
backlog 0b
pkts 0
bytes 0
way_inds 0
way_miss 0
way_cols 0
sce 0
marks 0
drops 0
ack_drop 0
sp_flows 0
bk_flows 0
un_flows 0
max_len 0
quantum 1514
qdisc cake 232: parent 1:232 bandwidth unlimited besteffort dual-srchost
nat nowash ack-filter split-gso rtt 48.0ms raw overhead 0 mpu 84 no-sce
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
memory used: 0b of 15140Kb
capacity estimate: 0bit
min/max network layer size: 65535 / 0
min/max overhead-adjusted size: 65535 / 0
average network hdr offset: 0
Tin 0
thresh 0bit
target 2.4ms
interval 48.0ms
pk_delay 0us
av_delay 0us
sp_delay 0us
backlog 0b
pkts 0
bytes 0
way_inds 0
way_miss 0
way_cols 0
sce 0
marks 0
drops 0
ack_drop 0
sp_flows 0
bk_flows 0
un_flows 0
max_len 0
quantum 1514
qdisc cake 20: parent 1:20 bandwidth unlimited besteffort dual-srchost
nat nowash ack-filter split-gso rtt 20.0ms raw overhead 0 mpu 84 no-sce
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
memory used: 0b of 15140Kb
capacity estimate: 0bit
min/max network layer size: 65535 / 0
min/max overhead-adjusted size: 65535 / 0
average network hdr offset: 0
Tin 0
thresh 0bit
target 1.0ms
interval 20.0ms
pk_delay 0us
av_delay 0us
sp_delay 0us
backlog 0b
pkts 0
bytes 0
way_inds 0
way_miss 0
way_cols 0
sce 0
marks 0
drops 0
ack_drop 0
sp_flows 0
bk_flows 0
un_flows 0
max_len 0
quantum 1514
qdisc cake 234: parent 1:234 bandwidth unlimited besteffort dual-srchost
nat nowash ack-filter split-gso rtt 48.0ms raw overhead 0 mpu 84 no-sce
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
memory used: 0b of 15140Kb
capacity estimate: 0bit
min/max network layer size: 65535 / 0
min/max overhead-adjusted size: 65535 / 0
average network hdr offset: 0
Tin 0
thresh 0bit
target 2.4ms
interval 48.0ms
pk_delay 0us
av_delay 0us
sp_delay 0us
backlog 0b
pkts 0
bytes 0
way_inds 0
way_miss 0
way_cols 0
sce 0
marks 0
drops 0
ack_drop 0
sp_flows 0
bk_flows 0
un_flows 0
max_len 0
quantum 1514
qdisc cake 221: parent 1:221 bandwidth unlimited besteffort dual-srchost
nat nowash ack-filter split-gso rtt 40.0ms raw overhead 0 mpu 84 no-sce
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
memory used: 0b of 15140Kb
capacity estimate: 0bit
min/max network layer size: 65535 / 0
min/max overhead-adjusted size: 65535 / 0
average network hdr offset: 0
Tin 0
thresh 0bit
target 2.0ms
interval 40.0ms
pk_delay 0us
av_delay 0us
sp_delay 0us
backlog 0b
pkts 0
bytes 0
way_inds 0
way_miss 0
way_cols 0
sce 0
marks 0
drops 0
ack_drop 0
sp_flows 0
bk_flows 0
un_flows 0
max_len 0
quantum 1514
qdisc cake 223: parent 1:223 bandwidth unlimited besteffort dual-srchost
nat nowash ack-filter split-gso rtt 40.0ms raw overhead 0 mpu 84 no-sce
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
memory used: 0b of 15140Kb
capacity estimate: 0bit
min/max network layer size: 65535 / 0
min/max overhead-adjusted size: 65535 / 0
average network hdr offset: 0
Tin 0
thresh 0bit
target 2.0ms
interval 40.0ms
pk_delay 0us
av_delay 0us
sp_delay 0us
backlog 0b
pkts 0
bytes 0
way_inds 0
way_miss 0
way_cols 0
sce 0
marks 0
drops 0
ack_drop 0
sp_flows 0
bk_flows 0
un_flows 0
max_len 0
quantum 1514
qdisc cake 262: parent 1:262 bandwidth unlimited besteffort dual-srchost
nat nowash ack-filter split-gso rtt 48.0ms raw overhead 0 mpu 84 no-sce
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
memory used: 0b of 15140Kb
capacity estimate: 0bit
min/max network layer size: 65535 / 0
min/max overhead-adjusted size: 65535 / 0
average network hdr offset: 0
Tin 0
thresh 0bit
target 2.4ms
interval 48.0ms
pk_delay 0us
av_delay 0us
sp_delay 0us
backlog 0b
pkts 0
bytes 0
way_inds 0
way_miss 0
way_cols 0
sce 0
marks 0
drops 0
ack_drop 0
sp_flows 0
bk_flows 0
un_flows 0
max_len 0
quantum 1514
qdisc cake 225: parent 1:225 bandwidth unlimited besteffort dual-srchost
nat nowash ack-filter split-gso rtt 40.0ms raw overhead 0 mpu 84 no-sce
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
memory used: 0b of 15140Kb
capacity estimate: 0bit
min/max network layer size: 65535 / 0
min/max overhead-adjusted size: 65535 / 0
average network hdr offset: 0
Tin 0
thresh 0bit
target 2.0ms
interval 40.0ms
pk_delay 0us
av_delay 0us
sp_delay 0us
backlog 0b
pkts 0
bytes 0
way_inds 0
way_miss 0
way_cols 0
sce 0
marks 0
drops 0
ack_drop 0
sp_flows 0
bk_flows 0
un_flows 0
max_len 0
quantum 1514
qdisc cake 212: parent 1:212 bandwidth unlimited besteffort dual-srchost
nat nowash ack-filter split-gso rtt 45.0ms raw overhead 0 mpu 84 no-sce
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
memory used: 0b of 15140Kb
capacity estimate: 0bit
min/max network layer size: 65535 / 0
min/max overhead-adjusted size: 65535 / 0
average network hdr offset: 0
Tin 0
thresh 0bit
target 2.2ms
interval 45.0ms
pk_delay 0us
av_delay 0us
sp_delay 0us
backlog 0b
pkts 0
bytes 0
way_inds 0
way_miss 0
way_cols 0
sce 0
marks 0
drops 0
ack_drop 0
sp_flows 0
bk_flows 0
un_flows 0
max_len 0
quantum 1514
root@apreithalle:~# tc -s filter show dev eth0
filter parent 1: protocol ip pref 2 u32
filter parent 1: protocol ip pref 2 u32 fh 805: ht divisor 1
filter parent 1: protocol ip pref 2 u32 fh 805::800 order 2048 key ht
805 bkt 0 flowid 1:211 (rule hit 248 success 0)
mark 0x34800 0x7ffc00 (success 0)
filter parent 1: protocol ip pref 2 u32 fh 805::801 order 2049 key ht
805 bkt 0 flowid 1:221 (rule hit 248 success 0)
mark 0x37000 0x7ffc00 (success 0)
filter parent 1: protocol ip pref 2 u32 fh 805::802 order 2050 key ht
805 bkt 0 flowid 1:231 (rule hit 248 success 0)
mark 0x39800 0x7ffc00 (success 0)
filter parent 1: protocol ip pref 2 u32 fh 805::803 order 2051 key ht
805 bkt 0 flowid 1:241 (rule hit 248 success 0)
mark 0x3c000 0x7ffc00 (success 0)
filter parent 1: protocol ip pref 2 u32 fh 805::804 order 2052 key ht
805 bkt 0 flowid 1:251 (rule hit 248 success 0)
mark 0x3e800 0x7ffc00 (success 0)
filter parent 1: protocol ip pref 2 u32 fh 805::805 order 2053 key ht
805 bkt 0 flowid 1:261 (rule hit 248 success 0)
mark 0x41000 0x7ffc00 (success 0)
filter parent 1: protocol ip pref 3 u32
filter parent 1: protocol ip pref 3 u32 fh 804: ht divisor 1
filter parent 1: protocol ip pref 3 u32 fh 804::800 order 2048 key ht
804 bkt 0 flowid 1:40 (rule hit 248 success 0)
mark 0xa000 0x7ffc00 (success 0)
filter parent 1: protocol ip pref 4 u32
filter parent 1: protocol ip pref 4 u32 fh 806: ht divisor 1
filter parent 1: protocol ip pref 4 u32 fh 806::800 order 2048 key ht
806 bkt 0 flowid 1:212 (rule hit 248 success 0)
mark 0x34c00 0x7ffc00 (success 0)
filter parent 1: protocol ip pref 4 u32 fh 806::801 order 2049 key ht
806 bkt 0 flowid 1:222 (rule hit 248 success 0)
mark 0x37400 0x7ffc00 (success 0)
filter parent 1: protocol ip pref 4 u32 fh 806::802 order 2050 key ht
806 bkt 0 flowid 1:232 (rule hit 248 success 0)
mark 0x39c00 0x7ffc00 (success 0)
filter parent 1: protocol ip pref 4 u32 fh 806::803 order 2051 key ht
806 bkt 0 flowid 1:242 (rule hit 248 success 0)
mark 0x3c400 0x7ffc00 (success 0)
filter parent 1: protocol ip pref 4 u32 fh 806::804 order 2052 key ht
806 bkt 0 flowid 1:252 (rule hit 248 success 0)
mark 0x3ec00 0x7ffc00 (success 0)
filter parent 1: protocol ip pref 4 u32 fh 806::805 order 2053 key ht
806 bkt 0 flowid 1:262 (rule hit 248 success 0)
mark 0x41400 0x7ffc00 (success 0)
filter parent 1: protocol ip pref 6 u32
filter parent 1: protocol ip pref 6 u32 fh 807: ht divisor 1
filter parent 1: protocol ip pref 6 u32 fh 807::800 order 2048 key ht
807 bkt 0 flowid 1:213 (rule hit 248 success 0)
mark 0x35000 0x7ffc00 (success 0)
filter parent 1: protocol ip pref 6 u32 fh 807::801 order 2049 key ht
807 bkt 0 flowid 1:223 (rule hit 248 success 0)
mark 0x37800 0x7ffc00 (success 0)
filter parent 1: protocol ip pref 6 u32 fh 807::802 order 2050 key ht
807 bkt 0 flowid 1:233 (rule hit 248 success 0)
mark 0x3a000 0x7ffc00 (success 0)
filter parent 1: protocol ip pref 6 u32 fh 807::803 order 2051 key ht
807 bkt 0 flowid 1:243 (rule hit 248 success 0)
mark 0x3c800 0x7ffc00 (success 0)
filter parent 1: protocol ip pref 6 u32 fh 807::804 order 2052 key ht
807 bkt 0 flowid 1:253 (rule hit 248 success 0)
mark 0x3f000 0x7ffc00 (success 0)
filter parent 1: protocol ip pref 6 u32 fh 807::805 order 2053 key ht
807 bkt 0 flowid 1:263 (rule hit 248 success 0)
mark 0x41800 0x7ffc00 (success 0)
filter parent 1: protocol ip pref 9 u32
filter parent 1: protocol ip pref 9 u32 fh 803: ht divisor 1
filter parent 1: protocol ip pref 9 u32 fh 803::800 order 2048 key ht
803 bkt 0 flowid 1:30 (rule hit 248 success 0)
mark 0x7800 0x7ffc00 (success 0)
filter parent 1: protocol ip pref 9 u32 fh 803::801 order 2049 key ht
803 bkt 0 flowid 1:214 (rule hit 248 success 0)
mark 0x35400 0x7ffc00 (success 0)
filter parent 1: protocol ip pref 9 u32 fh 803::802 order 2050 key ht
803 bkt 0 flowid 1:224 (rule hit 248 success 0)
mark 0x37c00 0x7ffc00 (success 0)
filter parent 1: protocol ip pref 9 u32 fh 803::803 order 2051 key ht
803 bkt 0 flowid 1:234 (rule hit 248 success 0)
mark 0x3a400 0x7ffc00 (success 0)
filter parent 1: protocol ip pref 9 u32 fh 803::804 order 2052 key ht
803 bkt 0 flowid 1:244 (rule hit 248 success 0)
mark 0x3cc00 0x7ffc00 (success 0)
filter parent 1: protocol ip pref 9 u32 fh 803::805 order 2053 key ht
803 bkt 0 flowid 1:254 (rule hit 248 success 0)
mark 0x3f400 0x7ffc00 (success 0)
filter parent 1: protocol ip pref 9 u32 fh 803::806 order 2054 key ht
803 bkt 0 flowid 1:264 (rule hit 248 success 0)
mark 0x41c00 0x7ffc00 (success 0)
filter parent 1: protocol ip pref 10 u32
filter parent 1: protocol ip pref 10 u32 fh 802: ht divisor 1
filter parent 1: protocol ip pref 10 u32 fh 802::800 order 2048 key ht
802 bkt 0 flowid 1:20 (rule hit 248 success 0)
mark 0x5000 0x7ffc00 (success 0)
filter parent 1: protocol ip pref 10 u32 fh 802::801 order 2049 key ht
802 bkt 0 flowid 1:215 (rule hit 248 success 0)
mark 0x35800 0x7ffc00 (success 0)
filter parent 1: protocol ip pref 10 u32 fh 802::802 order 2050 key ht
802 bkt 0 flowid 1:225 (rule hit 248 success 0)
mark 0x38000 0x7ffc00 (success 0)
filter parent 1: protocol ip pref 10 u32 fh 802::803 order 2051 key ht
802 bkt 0 flowid 1:235 (rule hit 248 success 0)
mark 0x3a800 0x7ffc00 (success 0)
filter parent 1: protocol ip pref 10 u32 fh 802::804 order 2052 key ht
802 bkt 0 flowid 1:245 (rule hit 248 success 0)
mark 0x3d000 0x7ffc00 (success 0)
filter parent 1: protocol ip pref 10 u32 fh 802::805 order 2053 key ht
802 bkt 0 flowid 1:255 (rule hit 248 success 0)
mark 0x3f800 0x7ffc00 (success 0)
filter parent 1: protocol ip pref 10 u32 fh 802::806 order 2054 key ht
802 bkt 0 flowid 1:265 (rule hit 248 success 0)
mark 0x42000 0x7ffc00 (success 0)
filter parent 1: protocol ip pref 11 u32
filter parent 1: protocol ip pref 11 u32 fh 801: ht divisor 1
filter parent 1: protocol ip pref 11 u32 fh 801::800 order 2048 key ht
801 bkt 0 flowid 1:10 (rule hit 248 success 0)
mark 0x2800 0x7ffc00 (success 0)
filter parent 1: protocol ip pref 12 u32
filter parent 1: protocol ip pref 12 u32 fh 800: ht divisor 1
filter parent 1: protocol ip pref 12 u32 fh 800::800 order 2048 key ht
800 bkt 0 flowid 1:100 (rule hit 248 success 0)
mark 0x19000 0x7ffc00 (success 0)
root@apreithalle:~#
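A quick back-of-envelope on the figures above (a sketch: the 1514-byte MTU is the usual plain-Ethernet assumption, and the instance count is taken by eye from the paste, not from the device):

```python
# Each unshaped cake instance in the paste reports a ceiling of
# "15140Kb".  That figure is sch->limit (10240 packets, the default)
# times psched_mtu (1514 bytes).  Multiplied by the number of cake
# qdiscs in this HTB tree, the worst case dwarfs 128 MB of RAM.
per_instance = 10240 * 1514          # bytes per cake instance
instances = 35                       # cake qdiscs counted in the output above

print(per_instance // 1024)                    # 15140 (KiB, matches the stats)
print(instances * per_instance // (1 << 20))   # 517 (MiB worst case)
```

Even if no single instance fills its ceiling, the aggregate headroom alone explains an OOM on a 128 MB router.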
Am 16.09.2019 um 16:01 schrieb Toke Høiland-Jørgensen:
> Sebastian Gottschall <s.gottschall@newmedia-net.de> writes:
>
>> Am 16.09.2019 um 14:08 schrieb Toke Høiland-Jørgensen:
>>> Sebastian Gottschall <s.gottschall@newmedia-net.de> writes:
>>>
>>>> after we found out serious out of memory issues on smaller embedded
>>>> devices (128 mb ram) we made some benchmarks with different schedulers
>>>> with the result that cake takes a serious amount of memory. we use the
>>>> out of tree cake module and we use it class based since we have complex
>>>> methods of doing qos per interface, per mac address or even per
>>>> ip/network. so its not just simple cake on a single interface solution.
>>>> we made some benchmarks with different schedulers. does anybody have a
>>>> solution for making that better?
>>>>
>>>> HTB/FQ_CODEL ------- 62M
>>>> HTB/SFQ ------- 62M
>>>> HTB/PIE ------- 62M
>>>> HTB/FQ_CODEL_FAST ------- 67M
>>>> HTB/CAKE -------111M
>>>>
>>>> HFSC/FQ_CODEL_FAST ------- 47M
>>>> HTB/PIE ------- 49M
>>>> HTB/SFQ ------- 50M
>>>> HFSC /FQ_CODEL ------- 52M
>>>> HFSC/CAKE -------109M
>>> How are you measuring the memory usage, and what is your full config for
>>> each setup? :)
>> me? nothing. i requested this test from a reporter and he uses just free
>> / top. so there is an error tolerance.
> Ah, I see. So this is just total system memory as reported by top.
>
>> but it shows a significant difference between cake and fq_codel etc.
>> cake is doing a OOM at the end
>>
>> for the full report including config screenshots see this
>> https://svn.dd-wrt.com/ticket/6798#comment:14. it shows also the qos
>> setup which i can use to reproduce and to
>> print out the full tc ruleset if required (which it surely is for you).
>> if you want i will recreate this setup and send the tc rules on this
>> list
> Yes, please do. The output of 'tc -s qdisc' would be useful as well to
> see how much memory CAKE itself thinks it's using...
>
> Are you setting the memory_limit in your config or relying on CAKE's
> default?
>
> -Toke
>
>
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [Cake] cake memory consumption
2019-09-16 14:01 ` Toke Høiland-Jørgensen
2019-09-17 5:06 ` Sebastian Gottschall
2019-09-17 5:21 ` Sebastian Gottschall
@ 2019-09-17 5:31 ` Sebastian Gottschall
2019-09-17 9:21 ` Jonathan Morton
2019-09-17 5:33 ` Sebastian Gottschall
3 siblings, 1 reply; 36+ messages in thread
From: Sebastian Gottschall @ 2019-09-17 5:31 UTC (permalink / raw)
To: Toke Høiland-Jørgensen, cake
According to the output, there is a flaw/bug in the memory limit calculation:
cake_reconfigure may set buffer_limit to ~0 if no rate is set.
The following clamp, "min(q->buffer_limit, max(sch->limit *
psched_mtu(qdisc_dev(sch)), q->buffer_config_limit))", doesn't make it
better since buffer_config_limit is not configured, so we get a
possible memory overuse here.
My proposal:

-	q->buffer_limit = min(q->buffer_limit,
-			      max(sch->limit * psched_mtu(qdisc_dev(sch)),
-				  q->buffer_config_limit));
+	if (q->buffer_config_limit)
+		q->buffer_limit = min(q->buffer_limit,
+				      max(sch->limit * psched_mtu(qdisc_dev(sch)),
+					  q->buffer_config_limit));
+	else
+		q->buffer_limit = min(q->buffer_limit,
+				      max(sch->limit * psched_mtu(qdisc_dev(sch)),
+					  4U << 20));
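To see why the unpatched branch misbehaves, here is a small sketch of the sizing logic (simplified, not the kernel code: the shaped branch is approximated, and UNLIMITED stands in for the kernel's ~0):

```python
# Sketch of cake_reconfigure()'s buffer sizing (simplified from
# sch_cake.c; the shaped-rate formula is approximated here).
UNLIMITED = 0xFFFFFFFF  # ~0 as a 32-bit value

def buffer_limit(rate_bps, sch_limit_pkts, mtu, config_limit):
    if rate_bps:
        # shaped: sized from the rate (details elided), floored at 4 MiB
        limit = max(rate_bps // 8 // 4, 4 << 20)
    else:
        limit = UNLIMITED  # the branch both patches target
    # the clamp that was meant to rein this in; with config_limit == 0
    # it degenerates to sch->limit * mtu
    return min(limit, max(sch_limit_pkts * mtu, config_limit))

# unshaped, no explicit memory limit: 10240 pkts * 1514 B, i.e. the
# "15140Kb" ceiling each instance reports earlier in the thread
print(buffer_limit(0, 10240, 1514, 0))     # 15503360
# with the proposed 4 MiB floor in place of ~0
print(min(4 << 20, max(10240 * 1514, 0)))  # 4194304
```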
Am 16.09.2019 um 16:01 schrieb Toke Høiland-Jørgensen:
> Sebastian Gottschall <s.gottschall@newmedia-net.de> writes:
>
>> Am 16.09.2019 um 14:08 schrieb Toke Høiland-Jørgensen:
>>> Sebastian Gottschall <s.gottschall@newmedia-net.de> writes:
>>>
>>>> after we found out serious out of memory issues on smaller embedded
>>>> devices (128 mb ram) we made some benchmarks with different schedulers
>>>> with the result that cake takes a serious amount of memory. we use the
>>>> out of tree cake module and we use it class based since we have complex
>>>> methods of doing qos per interface, per mac address or even per
>>>> ip/network. so its not just simple cake on a single interface solution.
>>>> we made some benchmarks with different schedulers. does anybody have a
>>>> solution for making that better?
>>>>
>>>> HTB/FQ_CODEL ------- 62M
>>>> HTB/SFQ ------- 62M
>>>> HTB/PIE ------- 62M
>>>> HTB/FQ_CODEL_FAST ------- 67M
>>>> HTB/CAKE -------111M
>>>>
>>>> HFSC/FQ_CODEL_FAST ------- 47M
>>>> HTB/PIE ------- 49M
>>>> HTB/SFQ ------- 50M
>>>> HFSC /FQ_CODEL ------- 52M
>>>> HFSC/CAKE -------109M
>>> How are you measuring the memory usage, and what is your full config for
>>> each setup? :)
>> me? nothing. i requested this test from a reporter and he uses just free
>> / top. so there is an error tolerance.
> Ah, I see. So this is just total system memory as reported by top.
>
>> but it shows a significant difference between cake and fq_codel etc.
>> cake is doing a OOM at the end
>>
>> for the full report including config screenshots see this
>> https://svn.dd-wrt.com/ticket/6798#comment:14. it shows also the qos
>> setup which i can use to reproduce and to
>> print out the full tc ruleset if required (which it surely is for you).
>> if you want i will recreate this setup and send the tc rules on this
>> list
> Yes, please do. The output of 'tc -s qdisc' would be useful as well to
> see how much memory CAKE itself thinks it's using...
>
> Are you setting the memory_limit in your config or relying on CAKE's
> default?
>
> -Toke
>
>
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [Cake] cake memory consumption
2019-09-16 14:01 ` Toke Høiland-Jørgensen
` (2 preceding siblings ...)
2019-09-17 5:31 ` Sebastian Gottschall
@ 2019-09-17 5:33 ` Sebastian Gottschall
2019-09-17 9:40 ` Toke Høiland-Jørgensen
3 siblings, 1 reply; 36+ messages in thread
From: Sebastian Gottschall @ 2019-09-17 5:33 UTC (permalink / raw)
To: Toke Høiland-Jørgensen, cake
A simpler patch:
--- sch_cake.c (revision 41051)
+++ sch_cake.c (working copy)
@@ -2691,7 +2691,7 @@
 		do_div(t, USEC_PER_SEC / 4);
 		q->buffer_limit = max_t(u32, t, 4U << 20);
 	} else {
-		q->buffer_limit = ~0;
+		q->buffer_limit = 4U << 20;
 	}
 	sch->flags &= ~TCQ_F_CAN_BYPASS;
Am 16.09.2019 um 16:01 schrieb Toke Høiland-Jørgensen:
> Sebastian Gottschall <s.gottschall@newmedia-net.de> writes:
>
>> Am 16.09.2019 um 14:08 schrieb Toke Høiland-Jørgensen:
>>> Sebastian Gottschall <s.gottschall@newmedia-net.de> writes:
>>>
>>>> after we found serious out-of-memory issues on smaller embedded
>>>> devices (128 mb ram) we made some benchmarks with different schedulers,
>>>> with the result that cake takes a serious amount of memory. we use the
>>>> out-of-tree cake module and we use it class based, since we have complex
>>>> methods of doing qos per interface, per mac address or even per
>>>> ip/network. so it's not just a simple cake-on-a-single-interface solution.
>>>> we made some benchmarks with different schedulers. does anybody have a
>>>> solution for making that better?
>>>>
>>>> HTB/FQ_CODEL ------- 62M
>>>> HTB/SFQ ------- 62M
>>>> HTB/PIE ------- 62M
>>>> HTB/FQ_CODEL_FAST ------- 67M
>>>> HTB/CAKE -------111M
>>>>
>>>> HFSC/FQ_CODEL_FAST ------- 47M
>>>> HFSC/PIE ------- 49M
>>>> HFSC/SFQ ------- 50M
>>>> HFSC/FQ_CODEL ------- 52M
>>>> HFSC/CAKE -------109M
>>> How are you measuring the memory usage, and what is your full config for
>>> each setup? :)
>> me? nothing. i requested this test from a reporter and he uses just free
>> / top. so there is some error tolerance.
> Ah, I see. So this is just total system memory as reported by top.
>
>> but it shows a significant difference between cake and fq_codel etc.
>> cake is causing an OOM at the end
>>
>> for the full report including config screenshots see
>> https://svn.dd-wrt.com/ticket/6798#comment:14. it also shows the qos
>> setup, which i can use to reproduce and to
>> print out the full tc ruleset if required (which it surely is for you).
>> if you want i will recreate this setup and send the tc rules to this
>> list
> Yes, please do. The output of 'tc -s qdisc' would be useful as well to
> see how much memory CAKE itself thinks it's using...
>
> Are you setting the memory_limit in your config or relying on CAKE's
> default?
>
> -Toke
>
>
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [Cake] cake memory consumption
2019-09-17 5:31 ` Sebastian Gottschall
@ 2019-09-17 9:21 ` Jonathan Morton
2019-09-17 9:55 ` Sebastian Gottschall
0 siblings, 1 reply; 36+ messages in thread
From: Jonathan Morton @ 2019-09-17 9:21 UTC (permalink / raw)
To: Sebastian Gottschall; +Cc: Toke Høiland-Jørgensen, cake
> On 17 Sep, 2019, at 8:31 am, Sebastian Gottschall <s.gottschall@newmedia-net.de> wrote:
>
> according to the output there is a flaw/bug in the memory limit calculation
> cake_reconfigure may set buffer_limit to ~0 if no rate is set.
>
> the following line "min(buffer_limit, max(sch->limit * psched_mtu(qdisc_dev(sch)), q->buffer_config_limit))" doesn't make it better, since buffer_config_limit is not configured,
> so we have a possible memory overuse here.
In C, ~0 means "as near to infinity as an unsigned integer can get", or effectively 4GB. That construct is used to get that part of the calculation out of the way, so that it has no effect in the following nested max() and min() macros.
What actually happens here is that the "packet limit" property of the interface becomes governing, and is recalculated in terms of a byte count by multiplying it by the MTU. So the limit configured for each Cake instance in your particular case is 15MB, corresponding to 10,000 packets:
> memory used: 0b of 15140Kb
With so many Cake instances loaded (very much *not* the normal configuration!) and only 128MB total RAM, 15MB is obviously too high a limit to be completely safe - even though Cake's AQM action will keep the *average* queue depth well below that limit.
The correct fix here is not to change the code, but to use the memlimit parameter to override the default. These unusual configurations, where the default logic breaks, are precisely why it was added.
- Jonathan Morton
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [Cake] cake memory consumption
2019-09-17 5:33 ` Sebastian Gottschall
@ 2019-09-17 9:40 ` Toke Høiland-Jørgensen
2019-09-18 7:19 ` Sebastian Gottschall
0 siblings, 1 reply; 36+ messages in thread
From: Toke Høiland-Jørgensen @ 2019-09-17 9:40 UTC (permalink / raw)
To: Sebastian Gottschall, cake
Sebastian Gottschall <s.gottschall@newmedia-net.de> writes:
> more simple patch
>
> --- sch_cake.c (revision 41051)
> +++ sch_cake.c (working copy)
> @@ -2691,7 +2691,7 @@
> do_div(t, USEC_PER_SEC / 4);
> q->buffer_limit = max_t(u32, t, 4U << 20);
> } else {
> - q->buffer_limit = ~0;
> + q->buffer_limit = 4U << 20;
> }
>
> sch->flags &= ~TCQ_F_CAN_BYPASS;
As Jonathan remarked, the right thing to do here is to use the
memory_limit parameter to set a different limit when you setup the tree.
Still, I count 35 instances of CAKE in your setup; even with a 4MB limit
apiece, that is a total of 140 MB of potential packet memory. You'd need
to set it as low as 1 or 2 MB to be completely sure that you won't run
out of memory if they are all full...
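A hedged example of applying that advice with tc-cake's memlimit parameter; the device name, parent handle, and bandwidth below are placeholders, not taken from the actual dd-wrt config:

```shell
# Cap each CAKE instance at 2 MB of packet memory, so that ~35 instances
# stay under ~70 MB total on a 128 MB device.
tc qdisc replace dev eth0 parent 1:10 cake bandwidth 50mbit memlimit 2mb

# Verify the limit CAKE thinks it has ("memory used: ... of 2Mb"):
tc -s qdisc show dev eth0 | grep memory
```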
-Toke
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [Cake] cake memory consumption
2019-09-17 9:21 ` Jonathan Morton
@ 2019-09-17 9:55 ` Sebastian Gottschall
0 siblings, 0 replies; 36+ messages in thread
From: Sebastian Gottschall @ 2019-09-17 9:55 UTC (permalink / raw)
To: Jonathan Morton; +Cc: Toke Høiland-Jørgensen, cake
Am 17.09.2019 um 11:21 schrieb Jonathan Morton:
>> On 17 Sep, 2019, at 8:31 am, Sebastian Gottschall <s.gottschall@newmedia-net.de> wrote:
>>
>> according to the output there is a flaw/bug in the memory limit calculation
>> cake_reconfigure may set buffer_limit to ~0 if no rate is set.
>>
>> the following line "min(buffer_limit, max(sch->limit * psched_mtu(qdisc_dev(sch)), q->buffer_config_limit))" doesn't make it better, since buffer_config_limit is not configured,
>> so we have a possible memory overuse here.
> In C, ~0 means "as near to infinity as an unsigned integer can get", or effectively 4GB. That construct is used to get that part of the calculation out of the way, so that it has no effect in the following nested max() and min() macros.
>
> What actually happens here is that the "packet limit" property of the interface becomes governing, and is recalculated in terms of a byte count by multiplying it by the MTU. So the limit configured for each Cake instance in your particular case is 15MB, corresponding to 10,000 packets:
>
>> memory used: 0b of 15140Kb
> With so many Cake instances loaded (very much *not* the normal configuration!) and only 128MB total RAM, 15MB is obviously too high a limit to be completely safe - even though Cake's AQM action will keep the *average* queue depth well below that limit.
>
> The correct fix here is not to change the code, but to use the memlimit parameter to override the default. These unusual configurations, where the default logic breaks, are precisely why it was added.
okay. so i will handle it in a custom way in my code, depending on the device memory
>
> - Jonathan Morton
>
>
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [Cake] cake memory consumption
2019-09-17 9:40 ` Toke Høiland-Jørgensen
@ 2019-09-18 7:19 ` Sebastian Gottschall
2019-09-18 9:53 ` Toke Høiland-Jørgensen
0 siblings, 1 reply; 36+ messages in thread
From: Sebastian Gottschall @ 2019-09-18 7:19 UTC (permalink / raw)
To: Toke Høiland-Jørgensen, cake
the problem is: i tested restricting the memory to 4 mb, but it still
runs oom. same memory consumption, and from the qdisc show output i also
see that just a few kilobytes are used in that pool.
so the problem with cake must be somewhere else. it's not the buffer
limit. i see values like "memory used: 22176b of 4Mb", which is really
nothing. most qdiscs are at 0 and unused in that setup
Am 17.09.2019 um 11:40 schrieb Toke Høiland-Jørgensen:
> Sebastian Gottschall <s.gottschall@newmedia-net.de> writes:
>
>> more simple patch
>>
>> --- sch_cake.c (revision 41051)
>> +++ sch_cake.c (working copy)
>> @@ -2691,7 +2691,7 @@
>> do_div(t, USEC_PER_SEC / 4);
>> q->buffer_limit = max_t(u32, t, 4U << 20);
>> } else {
>> - q->buffer_limit = ~0;
>> + q->buffer_limit = 4U << 20;
>> }
>>
>> sch->flags &= ~TCQ_F_CAN_BYPASS;
> As Jonathan remarked, the right thing to do here is to use the
> memory_limit parameter to set a different limit when you setup the tree.
>
> Still, I count 35 instances of CAKE in your setup; even with a 4MB limit
> apiece, that is a total of 140 MB of potential packet memory. You'd need
> to set it as low as 1 or 2 MB to be completely sure that you won't run
> out of memory if they are all full...
>
> -Toke
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [Cake] cake memory consumption
2019-09-18 7:19 ` Sebastian Gottschall
@ 2019-09-18 9:53 ` Toke Høiland-Jørgensen
2019-09-18 9:57 ` Sebastian Gottschall
0 siblings, 1 reply; 36+ messages in thread
From: Toke Høiland-Jørgensen @ 2019-09-18 9:53 UTC (permalink / raw)
To: Sebastian Gottschall, cake
Sebastian Gottschall <s.gottschall@newmedia-net.de> writes:
> the problem is: i tested restricting the memory to 4 mb, but it still
> runs oom. same memory consumption, and from the qdisc show output i also
> see that just a few kilobytes are used in that pool.
> so the problem with cake must be somewhere else. it's not the buffer
> limit. i see values like "memory used: 22176b of 4Mb", which is really
> nothing. most qdiscs are at 0 and unused in that setup
Hmm, that does sound odd. Are you seeing the "total used memory" go up
as soon as you load the qdiscs (without any traffic)?
Does the memory drop down again if you clear the qdisc config and go
back to an fq_codel-based one?
-Toke
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [Cake] cake memory consumption
2019-09-18 9:53 ` Toke Høiland-Jørgensen
@ 2019-09-18 9:57 ` Sebastian Gottschall
2019-09-18 10:22 ` Toke Høiland-Jørgensen
0 siblings, 1 reply; 36+ messages in thread
From: Sebastian Gottschall @ 2019-09-18 9:57 UTC (permalink / raw)
To: Toke Høiland-Jørgensen, cake
Am 18.09.2019 um 11:53 schrieb Toke Høiland-Jørgensen:
> Sebastian Gottschall <s.gottschall@newmedia-net.de> writes:
>
>> the problem is: i tested restricting the memory to 4 mb, but it still
>> runs oom. same memory consumption, and from the qdisc show output i also
>> see that just a few kilobytes are used in that pool.
>> so the problem with cake must be somewhere else. it's not the buffer
>> limit. i see values like "memory used: 22176b of 4Mb", which is really
>> nothing. most qdiscs are at 0 and unused in that setup
> Hmm, that does sound odd. Are you seeing the "total used memory" go up
> as soon as you load the qdiscs (without any traffic)?
without traffic nothing happens. so it grows only under traffic.
>
> Does the memory drop down again if you clear the qdisc config and go
> back to an fq_codel-based one?
according to the reporter, yes. not sure. maybe it's just an issue with
the out-of-tree cake variant on that specific kernel. need to do more
research here
>
> -Toke
>
>
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [Cake] cake memory consumption
2019-09-18 9:57 ` Sebastian Gottschall
@ 2019-09-18 10:22 ` Toke Høiland-Jørgensen
0 siblings, 0 replies; 36+ messages in thread
From: Toke Høiland-Jørgensen @ 2019-09-18 10:22 UTC (permalink / raw)
To: Sebastian Gottschall, cake
Sebastian Gottschall <s.gottschall@newmedia-net.de> writes:
> Am 18.09.2019 um 11:53 schrieb Toke Høiland-Jørgensen:
>> Sebastian Gottschall <s.gottschall@newmedia-net.de> writes:
>>
>>> the problem is: i tested restricting the memory to 4 mb, but it still
>>> runs oom. same memory consumption, and from the qdisc show output i also
>>> see that just a few kilobytes are used in that pool.
>>> so the problem with cake must be somewhere else. it's not the buffer
>>> limit. i see values like "memory used: 22176b of 4Mb", which is really
>>> nothing. most qdiscs are at 0 and unused in that setup
>> Hmm, that does sound odd. Are you seeing the "total used memory" go up
>> as soon as you load the qdiscs (without any traffic)?
> without traffic nothing happens. so it grows only under traffic.
>>
>> Does the memory drop down again if you clear the qdisc config and go
>> back to an fq_codel-based one?
> according to the reporter, yes. not sure. maybe it's just an issue with
> the out-of-tree cake variant on that specific kernel. need to do more
> research here
Yeah, that does sound decidedly odd. We really are only allocating a few
hundred k of memory on init, so if the memory usage jumps immediately
there's something fishy going on somewhere...
-Toke
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [Cake] Fighting bloat in the face of uncertinty
2019-09-08 17:27 ` Jonathan Morton
2019-09-16 10:21 ` [Cake] cake memory consumption Sebastian Gottschall
@ 2019-10-03 17:52 ` Justin Kilpatrick
2019-10-03 18:41 ` Dave Taht
2019-10-03 19:04 ` Jonathan Morton
1 sibling, 2 replies; 36+ messages in thread
From: Justin Kilpatrick @ 2019-10-03 17:52 UTC (permalink / raw)
To: Jonathan Morton; +Cc: cake
I've developed a rough version of this and put it into production Monday. After a few tweaks we're seeing a ~10x reduction in the magnitude of latency spikes at high usage times.
https://github.com/althea-net/althea_rs/blob/master/rita/src/rita_common/network_monitor/mod.rs#L288
The average and standard deviation of latency to a given neighbor is scraped from Babel and when the standard deviation exceeds 10x the average we reduce the throughput of the connection by 20%.
It's not theoretically sound yet because I still need to expose single direction latency in Babel rather than only round trip. Bloat caused by the other side of the link currently causes connections to be reduced all the way down to the throughput minimum unnecessarily.
It would also be advantageous to observe what throughput we've recorded for the last 5 seconds and put a threshold there. Rather than doing any probing ourselves we can just observe if the user was saturating the connection or if it was a transient radio problem.
If anyone else is interested in using this I can split it off from our application into a standalone (if somewhat bulky) binary without much trouble.
--
Justin Kilpatrick
justin@althea.net
On Sun, Sep 8, 2019, at 1:27 PM, Jonathan Morton wrote:
> >> You could also set it back to 'internet' and progressively reduce the
> >> bandwidth parameter, making the Cake shaper into the actual bottleneck.
> >> This is the correct fix for the problem, and you should notice an
> >> instant improvement as soon as the bandwidth parameter is correct.
> >
> > Hand tuning this one link is not a problem. I'm searching for a set of settings that will provide generally good performance across a wide range of devices, links, and situations.
> >
> > From what you've indicated so far there's nothing as effective as a correct bandwidth estimation if we consider the antenna (link) a black box. Expecting the user to input expected throughput for every link and then managing that information is essentially a non-starter.
> >
> > Radio tuning provides some improvement, but until ubiquiti starts shipping with Codel on non-router devices I don't think there's a good solution here.
> >
> > Any way to have the receiving device detect bloat and insert an ECN?
>
> That's what the qdisc itself is supposed to do.
>
> > I don't think the time spent in the intermediate device is detectable at the kernel level but we keep track of latency for routing decisions and could detect bloat with some accuracy, the problem is how to respond.
>
> As long as you can detect which link the bloat is on (and in which
> direction), you can respond by reducing the bandwidth parameter on that
> half-link by a small amount. Since you have a cooperating network,
> maintaining a time standard on each node sufficient to observe one-way
> delays seems feasible, as is establishing a normal baseline latency for
> each link.
>
> The characteristics of the bandwidth parameter being too high are easy
> to observe. Not only will the one-way delay go up, but the received
> throughput in the same direction at the same time will be lower than
> configured. You might use the latter as a hint as to how far you need
> to reduce the shaped bandwidth.
>
> Deciding when and by how much to *increase* bandwidth, which is
> presumably desirable when link conditions improve, is a more difficult
> problem when the link hardware doesn't cooperate by informing you of
> its status. (This is something you could reasonably ask Ubiquiti to
> address.)
>
> I would assume that link characteristics will change slowly, and run an
> occasional explicit bandwidth probe to see if spare bandwidth is
> available. If that probe comes through without exhibiting bloat, *and*
> the link is otherwise loaded to capacity, then increase the shaper by
> an amount within the probe's capacity of measurement - and schedule a
> repeat.
>
> A suitable probe might be 100x 1500b packets paced out over a second,
> bypassing the shaper. This will occupy just over 1Mbps of bandwidth,
> and can be expected to induce 10ms of delay if injected into a
> saturated 100Mbps link. Observe the delay experienced by each packet
> *and* the quantity of other traffic that appears between them. Only if
> both are favourable can you safely open the shaper, by 1Mbps.
>
> Since wireless links can be expected to change their capacity over
> time, due to eg. weather and tree growth, this seems to be more
> generally useful than a static guess. You could deploy a new link with
> a conservative "guess" of say 10Mbps, and just probe from there.
>
> - Jonathan Morton
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [Cake] Fighting bloat in the face of uncertinty
2019-10-03 17:52 ` [Cake] Fighting bloat in the face of uncertinty Justin Kilpatrick
@ 2019-10-03 18:41 ` Dave Taht
2019-10-03 19:04 ` Jonathan Morton
1 sibling, 0 replies; 36+ messages in thread
From: Dave Taht @ 2019-10-03 18:41 UTC (permalink / raw)
To: Justin Kilpatrick; +Cc: Jonathan Morton, Cake List
Heh. We need a t-shirt...
from TunnelManager::from_registry().do_send(GotBloat {
..
GotBloat() ? more_fq_codel : fq_codel;
On Thu, Oct 3, 2019 at 10:52 AM Justin Kilpatrick <justin@althea.net> wrote:
>
> I've developed a rough version of this and put it into production Monday. After a few tweaks we're seeing a ~10x reduction in the magnitude of latency spikes at high usage times.
>
> https://github.com/althea-net/althea_rs/blob/master/rita/src/rita_common/network_monitor/mod.rs#L288
>
> The average and standard deviation of latency to a given neighbor is scraped from Babel and when the standard deviation exceeds 10x the average we reduce the throughput of the connection by 20%.
>
> It's not theoretically sound yet because I still need to expose single direction latency in Babel rather than only round trip. Bloat caused by the other side of the link currently causes connections to be reduced all the way down to the throughput minimum unnecessarily.
>
> It would also be advantageous to observe what throughput we've recorded for the last 5 seconds and put a threshold there. Rather than doing any probing ourselves we can just observe if the user was saturating the connection or if it was a transient radio problem.
>
> If anyone else is interested in using this I can split it off from our application into a standalone (if somewhat bulky) binary without much trouble.
>
> --
> Justin Kilpatrick
> justin@althea.net
>
> On Sun, Sep 8, 2019, at 1:27 PM, Jonathan Morton wrote:
> > >> You could also set it back to 'internet' and progressively reduce the
> > >> bandwidth parameter, making the Cake shaper into the actual bottleneck.
> > >> This is the correct fix for the problem, and you should notice an
> > >> instant improvement as soon as the bandwidth parameter is correct.
> > >
> > > Hand tuning this one link is not a problem. I'm searching for a set of settings that will provide generally good performance across a wide range of devices, links, and situations.
> > >
> > > From what you've indicated so far there's nothing as effective as a correct bandwidth estimation if we consider the antenna (link) a black box. Expecting the user to input expected throughput for every link and then managing that information is essentially a non-starter.
> > >
> > > Radio tuning provides some improvement, but until ubiquiti starts shipping with Codel on non-router devices I don't think there's a good solution here.
> > >
> > > Any way to have the receiving device detect bloat and insert an ECN?
> >
> > That's what the qdisc itself is supposed to do.
> >
> > > I don't think the time spent in the intermediate device is detectable at the kernel level but we keep track of latency for routing decisions and could detect bloat with some accuracy, the problem is how to respond.
> >
> > As long as you can detect which link the bloat is on (and in which
> > direction), you can respond by reducing the bandwidth parameter on that
> > half-link by a small amount. Since you have a cooperating network,
> > maintaining a time standard on each node sufficient to observe one-way
> > delays seems feasible, as is establishing a normal baseline latency for
> > each link.
> >
> > The characteristics of the bandwidth parameter being too high are easy
> > to observe. Not only will the one-way delay go up, but the received
> > throughput in the same direction at the same time will be lower than
> > configured. You might use the latter as a hint as to how far you need
> > to reduce the shaped bandwidth.
> >
> > Deciding when and by how much to *increase* bandwidth, which is
> > presumably desirable when link conditions improve, is a more difficult
> > problem when the link hardware doesn't cooperate by informing you of
> > its status. (This is something you could reasonably ask Ubiquiti to
> > address.)
> >
> > I would assume that link characteristics will change slowly, and run an
> > occasional explicit bandwidth probe to see if spare bandwidth is
> > available. If that probe comes through without exhibiting bloat, *and*
> > the link is otherwise loaded to capacity, then increase the shaper by
> > an amount within the probe's capacity of measurement - and schedule a
> > repeat.
> >
> > A suitable probe might be 100x 1500b packets paced out over a second,
> > bypassing the shaper. This will occupy just over 1Mbps of bandwidth,
> > and can be expected to induce 10ms of delay if injected into a
> > saturated 100Mbps link. Observe the delay experienced by each packet
> > *and* the quantity of other traffic that appears between them. Only if
> > both are favourable can you safely open the shaper, by 1Mbps.
> >
> > Since wireless links can be expected to change their capacity over
> > time, due to eg. weather and tree growth, this seems to be more
> > generally useful than a static guess. You could deploy a new link with
> > a conservative "guess" of say 10Mbps, and just probe from there.
> >
> > - Jonathan Morton
> _______________________________________________
> Cake mailing list
> Cake@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/cake
--
Dave Täht
CTO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-831-205-9740
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [Cake] Fighting bloat in the face of uncertinty
2019-10-03 17:52 ` [Cake] Fighting bloat in the face of uncertinty Justin Kilpatrick
2019-10-03 18:41 ` Dave Taht
@ 2019-10-03 19:04 ` Jonathan Morton
1 sibling, 0 replies; 36+ messages in thread
From: Jonathan Morton @ 2019-10-03 19:04 UTC (permalink / raw)
To: Justin Kilpatrick; +Cc: cake
> On 3 Oct, 2019, at 8:52 pm, Justin Kilpatrick <justin@althea.net> wrote:
>
> I've developed a rough version of this and put it into production Monday. After a few tweaks we're seeing a ~10x reduction in the magnitude of latency spikes at high usage times.
Sounds promising. Keep it up!
- Jonathan Morton
^ permalink raw reply [flat|nested] 36+ messages in thread
end of thread, other threads:[~2019-10-03 19:04 UTC | newest]
Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-09-07 22:42 [Cake] Fighting bloat in the face of uncertinty Justin Kilpatrick
2019-09-07 23:09 ` Jonathan Morton
2019-09-07 23:31 ` Justin Kilpatrick
2019-09-07 23:42 ` Jonathan Morton
2019-09-08 0:03 ` Justin Kilpatrick
2019-09-08 0:59 ` Jonathan Morton
2019-09-08 14:29 ` Justin Kilpatrick
2019-09-08 17:27 ` Jonathan Morton
2019-09-16 10:21 ` [Cake] cake memory consumption Sebastian Gottschall
2019-09-16 12:00 ` Dave Taht
2019-09-16 12:51 ` Dave Taht
2019-09-16 13:31 ` Sebastian Gottschall
2019-09-16 13:22 ` Sebastian Gottschall
2019-09-16 13:28 ` Justin Kilpatrick
2019-09-16 13:39 ` Jonathan Morton
2019-09-16 13:54 ` Sebastian Gottschall
2019-09-16 14:06 ` Jonathan Morton
2019-09-17 5:10 ` Sebastian Gottschall
2019-09-16 13:47 ` Sebastian Gottschall
2019-09-16 12:08 ` Toke Høiland-Jørgensen
2019-09-16 13:25 ` Sebastian Gottschall
2019-09-16 14:01 ` Toke Høiland-Jørgensen
2019-09-17 5:06 ` Sebastian Gottschall
2019-09-17 5:21 ` Sebastian Gottschall
2019-09-17 5:31 ` Sebastian Gottschall
2019-09-17 9:21 ` Jonathan Morton
2019-09-17 9:55 ` Sebastian Gottschall
2019-09-17 5:33 ` Sebastian Gottschall
2019-09-17 9:40 ` Toke Høiland-Jørgensen
2019-09-18 7:19 ` Sebastian Gottschall
2019-09-18 9:53 ` Toke Høiland-Jørgensen
2019-09-18 9:57 ` Sebastian Gottschall
2019-09-18 10:22 ` Toke Høiland-Jørgensen
2019-10-03 17:52 ` [Cake] Fighting bloat in the face of uncertinty Justin Kilpatrick
2019-10-03 18:41 ` Dave Taht
2019-10-03 19:04 ` Jonathan Morton