* [Cake] Trouble with CAKE
@ 2019-12-13 13:43 Thibaut
2019-12-13 14:02 ` Jonathan Morton
` (3 more replies)
0 siblings, 4 replies; 20+ messages in thread
From: Thibaut @ 2019-12-13 13:43 UTC (permalink / raw)
To: cake
Hi list,
I've been using CAKE on my DSL-connected Linux router for the last few years, and it worked well until very recently. Two things happened:
1) My ISP (French "Free") switched my DSLAM to native IPv6, which for the time being means that I had to revert to using their set-top-box (Freebox) instead of the VDSL2 modem I was using in bridge mode until then (CAKE in "bridged-ptm ether-vlan" mode)
2) I upgraded my router from 3.16 (Devuan Jessie) to 4.9 (Devuan ASCII)
Since then, no matter which setup I use, I cannot get CAKE to work as intended. Specifically, any long-standing best effort stream (such as a remote rsync) will be throttled to a near grinding halt even though there is no other significant traffic going on. Some random bursts can be seen (with iftop) but nothing ever gets close to half the maximum bandwidth. This is notably affecting the OpenWRT buildbots I'm hosting on this link.
In detail:
$ uname -a
Linux rapid-ts1 4.9.0-11-686 #1 SMP Debian 4.9.189-3+deb9u2 (2019-11-11) i686 GNU/Linux
Cake commit: 183b320 RFC 8622 diffserv3, 4 & 8 LE PHB support
cake setup on the wan iface: bandwidth 1Mbit diffserv3 dual-srchost nat nowash ack-filter split-gso bridged-vcmux no-sce
The available ATM uplink bandwidth is 1.2Mbps. I tried going as low as 700kbps, disabling ack-filter, and setting "conservative" to see if it would make a difference; it didn't in any significant way: the upload was still severely throttled. I also tried disabling the ingress leg to take it out of the equation: also no difference.
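As a sketch, the setup listed above would correspond to an invocation along the following lines. The keywords are copied verbatim from the list above; the interface name "wan" is taken from the tc output later in this thread, and applying it with "tc qdisc replace" on the root qdisc is an assumption about how it was installed. Keywords such as no-sce depend on the CAKE and tc builds in use. The helper name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not from the original message): applies the egress
# CAKE settings quoted above to one interface. Must be run as root.
setup_cake_egress() {
    dev="${1:-wan}"   # "wan" is assumed from the tc output in this thread
    tc qdisc replace dev "$dev" root cake \
        bandwidth 1Mbit diffserv3 dual-srchost nat nowash \
        ack-filter split-gso bridged-vcmux no-sce
}
```

Calling "setup_cake_egress wan" would install the qdisc; nothing is changed until the function is called.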
As I broke rule #1 of any setup upgrade (by changing both the link - VDSL to ADSL - and the running kernel), I can't tell for sure where the fault lies; however I must add something about the "native IPv6 DSLAM" bit:
Free uses map-e/map-t, i.e. ipip6 tunnels on its native v6 DSLAMs. The Freebox still offers a public IPv4 to the connected router, but inside the Freebox there is an ipip6 tunnel setup to encapsulate the IPv4 traffic into IPv6, a tunnel over which I have no control. I wonder if this encapsulation and its associated overhead could be throwing CAKE computations off? FWIW, my router now operates in dual-stack mode, with both a public IPv4 and a public IPv6 (although for the time being my LAN remains IPv4 only).
I haven't (yet) found a way to connect directly to the DSLAM without the Freebox (using my VDSL modem as I did before), so I can't get around this particular blackbox.
I hope this provides enough detail, and I'm happy to expand as needed: I would really like my CAKE back :)
Cheers,
Thibaut
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [Cake] Trouble with CAKE
2019-12-13 13:43 [Cake] Trouble with CAKE Thibaut
@ 2019-12-13 14:02 ` Jonathan Morton
2019-12-13 22:39 ` Thibaut
2019-12-13 14:13 ` Thibaut
` (2 subsequent siblings)
3 siblings, 1 reply; 20+ messages in thread
From: Jonathan Morton @ 2019-12-13 14:02 UTC (permalink / raw)
To: Thibaut; +Cc: cake
> On 13 Dec, 2019, at 3:43 pm, Thibaut <hacks@slashdirt.org> wrote:
>
> I've been using CAKE on my DSL-connected Linux router for the last few years, and it worked well until very recently. Two things happened:
>
> 1) My ISP (French "Free") switched my DSLAM to native IPv6, which for the time being means that I had to revert to using their set-top-box (Freebox) instead of the VDSL2 model I was using in bridge mode until then (CAKE in "bridged-ptm ether-vlan" mode)
> 2) I upgraded my router from 3.16 (Devuan Jessie) to 4.9 (Devuan ASCII)
>
> Since then, no matter which setup I use, I cannot get CAKE to work as intended. Specifically, any long-standing best effort stream (such as a remote rsync) will be throttled to a near grinding halt even though there is no other significant traffic going on. Some random bursts can be seen (with iftop) but nothing ever gets close to half the maximum bandwidth. This is notably affecting the OpenWRT buildbots I'm hosting on this link.
Old kernels, including 4.9 series, tend to be more problematic than the latest ones. If you can, I would recommend updating to a 5.x series kernel, in which Cake is an upstream feature. I won't presume to guess how best to achieve that with your distro.
The good news is that Free.fr is among the relatively few ISPs who have actively tackled bufferbloat themselves. As a workaround while you sort this out, you should get reasonable performance just from using the Freebox directly.
- Jonathan Morton
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [Cake] Trouble with CAKE
2019-12-13 13:43 [Cake] Trouble with CAKE Thibaut
2019-12-13 14:02 ` Jonathan Morton
@ 2019-12-13 14:13 ` Thibaut
2019-12-13 14:15 ` Sebastian Moeller
2019-12-13 14:21 ` Thibaut
3 siblings, 0 replies; 20+ messages in thread
From: Thibaut @ 2019-12-13 14:13 UTC (permalink / raw)
To: Jonathan Morton; +Cc: cake
Hi Jonathan,
Thanks for the quick reply.
December 13, 2019 3:02 PM, "Jonathan Morton" <chromatix99@gmail.com> wrote:
>> On 13 Dec, 2019, at 3:43 pm, Thibaut <hacks@slashdirt.org> wrote:
>>
>> I've been using CAKE on my DSL-connected Linux router for the last few years, and it worked well
>> until very recently. Two things happened:
>>
>> 1) My ISP (French "Free") switched my DSLAM to native IPv6, which for the time being means that I
>> had to revert to using their set-top-box (Freebox) instead of the VDSL2 model I was using in bridge
>> mode until then (CAKE in "bridged-ptm ether-vlan" mode)
>> 2) I upgraded my router from 3.16 (Devuan Jessie) to 4.9 (Devuan ASCII)
>>
>> Since then, no matter which setup I use, I cannot get CAKE to work as intended. Specifically, any
>> long-standing best effort stream (such as a remote rsync) will be throttled to a near grinding halt
>> even though there is no other significant traffic going on. Some random bursts can be seen (with
>> iftop) but nothing ever gets close to half the maximum bandwidth. This is notably affecting the
>> OpenWRT buildbots I'm hosting on this link.
>
> Old kernels, including 4.9 series, tend to be more problematic than the latest ones. If you can, I
> would recommend updating to a 5.x series kernel, in which Cake is an upstream feature. I won't
> presume to guess how best to achieve that with your distro.
Indeed. Given this is a relatively "security-sensitive" setup, I'd rather not diverge from distro (or run "unstable" for that matter).
Still, CAKE was previously working Just Fine(TM) on an even older kernel: 3.16...
> The good news is that Free.fr is among the relatively few ISPs who have actively tackled
> bufferbloat themselves. As a workaround while you sort this out, you should get reasonable
> performance just from using the Freebox directly.
Well, probably not with the antiquated Freebox model I have: a v5. I would also very much like to drop it to be able to connect again at VDSL2 speeds (I was getting 50/10 instead of the current 20/1) and save 75% on power usage (the Freebox is power hungry). But that's orthogonal to the current issue :)
What I could do for a test is temporarily revert back to the previous 3.16/Jessie setup (I have kept a backup) and see if I can reproduce the same behavior there, in which case I think that would definitely point in the general direction of a bad interaction with the Freebox tunnel?
Best,
Thibaut
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [Cake] Trouble with CAKE
2019-12-13 13:43 [Cake] Trouble with CAKE Thibaut
2019-12-13 14:02 ` Jonathan Morton
2019-12-13 14:13 ` Thibaut
@ 2019-12-13 14:15 ` Sebastian Moeller
2019-12-13 14:21 ` Thibaut
3 siblings, 0 replies; 20+ messages in thread
From: Sebastian Moeller @ 2019-12-13 14:15 UTC (permalink / raw)
To: Thibaut; +Cc: cake
Hi Thibaut,
ADSL is both special and precious, so may I recommend following the instructions at https://github.com/moeller0/ATM_overhead_detector? This will either confirm the overhead settings or, more likely, "explode" if the Freebox truly tunnels all IPv4 data through IPv6. In both cases the results should be interesting. As a quick test, what is the textual output from the "Share Your Results" box on https://www.speedguide.net/analyzer.php?
I would not be surprised if the issue were related to a whopping 40 bytes of additional, unaccounted-for per-packet overhead (in combination with ATM/AAL5's lovely per-packet padding). But that might all be moot if it is/was caused by a kernel issue.
Best Regards
Sebastian
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [Cake] Trouble with CAKE
2019-12-13 13:43 [Cake] Trouble with CAKE Thibaut
` (2 preceding siblings ...)
2019-12-13 14:15 ` Sebastian Moeller
@ 2019-12-13 14:21 ` Thibaut
2019-12-13 18:44 ` Thibaut
3 siblings, 1 reply; 20+ messages in thread
From: Thibaut @ 2019-12-13 14:21 UTC (permalink / raw)
To: Sebastian Moeller; +Cc: cake
Hi Sebastian,
December 13, 2019 3:15 PM, "Sebastian Moeller" <moeller0@gmx.de> wrote:
> Hi Thibaut,
>
> so ADSL is both special and precious, may I recommend to follow the instructions on
> https://github.com/moeller0/ATM_overhead_detector?
I will give it a try.
> This will either confirm the overhead settings,
> or more likely "explode" if the freeebox truely tunnels all IPv4 data through IPv6.
It does; this is a confirmed fact. When I tried connecting with my VDSL modem, the DSLAM only spoke IPv6, and I could tcpdump encapsulated frames from the BR containing IPv4 data destined for other subscribers. That is an unpleasant side effect of map-t "shared IPv4": by default Free splits every IPv4 address between 4 subscribers (by splitting the port range), and you have to actively ask for a "full stack IP" to turn that off.
> In both cases
> the results should be interesting. As a quick test, what is the textual output from the "Share Your
> Results" box on https://www.speedguide.net/analyzer.php?
I'll report when the buildslave is done uploading :)
> I would not be amazed if the issue might be related to having a whoping 40 bytes more of
> unaccounted for per-packet-overhead (in combination with ATM/AAL5's lovely per packet padding). But
> that might all be moot if it is/was caused by a kernel issue.
*nod*
Thanks,
Thibaut
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [Cake] Trouble with CAKE
2019-12-13 14:21 ` Thibaut
@ 2019-12-13 18:44 ` Thibaut
0 siblings, 0 replies; 20+ messages in thread
From: Thibaut @ 2019-12-13 18:44 UTC (permalink / raw)
To: Sebastian Moeller; +Cc: cake
Hi Sebastian,
> On 13 Dec 2019, at 15:21, Thibaut <hacks@slashdirt.org> wrote:
>
> Hi Sebastian,
>
> December 13, 2019 3:15 PM, "Sebastian Moeller" <moeller0@gmx.de> wrote:
>
>> Hi Thibaut,
>>
>> so ADSL is both special and precious, may I recommend to follow the instructions on
>> https://github.com/moeller0/ATM_overhead_detector?
>
> I will give it a try.
I’ll confess to being a bit lazy, as I didn’t go all the way to parsing with Octave (which I’m not familiar with), but the output file is here:
http://vps.slashdirt.org/~varenet/ping_sweep__20191213_170053.txt.gz (it’s 2.1M compressed)
>> In both cases
>> the results should be interesting. As a quick test, what is the textual output from the "Share Your
>> Results" box on https://www.speedguide.net/analyzer.php?
>
> I'll report when the buildslave is done uploading :)
As it turns out, this website appears to require a fully fledged browser, but I’m dealing with a remote headless setup here: is there a CLI-friendly alternative?
I hope this helps,
Thibaut
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [Cake] Trouble with CAKE
2019-12-13 14:02 ` Jonathan Morton
@ 2019-12-13 22:39 ` Thibaut
2019-12-13 22:40 ` Thibaut
0 siblings, 1 reply; 20+ messages in thread
From: Thibaut @ 2019-12-13 22:39 UTC (permalink / raw)
To: Jonathan Morton; +Cc: cake
Hi Jonathan,
> On 13 Dec 2019, at 15:02, Jonathan Morton <chromatix99@gmail.com> wrote:
>
> Old kernels, including 4.9 series, tend to be more problematic than the latest ones. If you can, I would recommend updating to a 5.x series kernel, in which Cake is an upstream feature. I won't presume to guess how best to achieve that with your distro.
I’m now able to confirm this looks like a regression: I was able to retrieve and build the last known working version of CAKE on my router, and with an adjusted overhead of 48 atm (confirmed thanks to the help of Stephan), it works like a charm. Current HEAD doesn’t, with the exact same parameters.
I’ll try to bisect, see if I can isolate the culprit.
HTH,
Thibaut
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [Cake] Trouble with CAKE
2019-12-13 22:39 ` Thibaut
@ 2019-12-13 22:40 ` Thibaut
2019-12-13 23:52 ` Thibaut
0 siblings, 1 reply; 20+ messages in thread
From: Thibaut @ 2019-12-13 22:40 UTC (permalink / raw)
To: Jonathan Morton; +Cc: Erik Taraldsen via Cake
> On 13 Dec 2019, at 23:39, Thibaut <hacks@slashdirt.org> wrote:
>
> Hi Jonathan,
>
> I’m now able to confirm this looks like a regression: I was able to retrieve and build the last known working version of CAKE on my router, and with an adjusted overhead of 48 atm (confirmed thanks to the help of Stephan), it works like a charm. Current HEAD doesn’t, with the exact same parameters.
I meant Sebastian, sorry. And, for the sake of clarity, this is old CAKE running on the distro 4.9 kernel.
Thibaut
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [Cake] Trouble with CAKE
2019-12-13 22:40 ` Thibaut
@ 2019-12-13 23:52 ` Thibaut
2019-12-14 9:50 ` Jonathan Morton
0 siblings, 1 reply; 20+ messages in thread
From: Thibaut @ 2019-12-13 23:52 UTC (permalink / raw)
To: Jonathan Morton; +Cc: Erik Taraldsen via Cake, ldir
> On 13 Dec 2019, at 23:40, Thibaut <hacks@slashdirt.org> wrote:
>
>> On 13 Dec 2019, at 23:39, Thibaut <hacks@slashdirt.org> wrote:
>>
>> Hi Jonathan,
>>
>> I’m now able to confirm this looks like a regression: I was able to retrieve and build the last known working version of CAKE on my router, and with an adjusted overhead of 48 atm (confirmed thanks to the help of Stephan), it works like a charm. Current HEAD doesn’t, with the exact same parameters.
>
> I meant Sebastian, sorry. And this is old CAKE running on distro 4.9, for the sake of clarity
Culprit turned out to be easy to identify: it’s the current master HEAD.
Reverting 183b320 fixed the issue.
I hope this helps,
Thibaut
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [Cake] Trouble with CAKE
2019-12-13 23:52 ` Thibaut
@ 2019-12-14 9:50 ` Jonathan Morton
2019-12-14 10:01 ` Thibaut
0 siblings, 1 reply; 20+ messages in thread
From: Jonathan Morton @ 2019-12-14 9:50 UTC (permalink / raw)
To: Thibaut; +Cc: Erik Taraldsen via Cake, ldir
> On 14 Dec, 2019, at 1:52 am, Thibaut <hacks@slashdirt.org> wrote:
>
> Culprit turned out to be easy to identify: it’s the current master HEAD.
>
> Reverting 183b320 fixed the issue.
That's extremely odd. That commit should only affect traffic carrying the LE DSCP, which is not the default.
Perhaps it was not actually the code change, but triggering a rebuild of the module?
- Jonathan Morton
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [Cake] Trouble with CAKE
2019-12-14 9:50 ` Jonathan Morton
@ 2019-12-14 10:01 ` Thibaut
2019-12-14 10:35 ` Kevin 'ldir' Darbyshire-Bryant
0 siblings, 1 reply; 20+ messages in thread
From: Thibaut @ 2019-12-14 10:01 UTC (permalink / raw)
To: Jonathan Morton; +Cc: Erik Taraldsen via Cake, ldir
> On 14 Dec 2019, at 10:50, Jonathan Morton <chromatix99@gmail.com> wrote:
>
>> On 14 Dec, 2019, at 1:52 am, Thibaut <hacks@slashdirt.org> wrote:
>>
>> Culprit turned out to be easy to identify: it’s the current master HEAD.
>>
>> Reverting 183b320 fixed the issue.
>
> That's extremely odd. That commit should only affect traffic carrying the LE DSCP, which is not the default.
>
> Perhaps it was not actually the code change, but triggering a rebuild of the module?
No. I tried with and without the commit multiple times: each time I built, installed, manually unloaded the module, made sure it was unloaded, and loaded the new build. I was that careful because I noticed the module doesn’t print anything in dmesg when it’s loaded (feature request: print the current build version when loading; that would be most helpful in these circumstances).
There is absolutely no doubt that on my router, with this commit CAKE is broken; without it, it isn’t.
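The unload/verify/reload cycle described above could be scripted roughly as follows. This is only a sketch under stated assumptions: the module name "sch_cake", the CAKE_DEVS variable and both helper names are illustrative, and every interface carrying a cake qdisc (including any ifb used for ingress) would have to be listed, since the module cannot be removed while an instance exists.

```shell
#!/bin/sh
# Sketch only: replace the loaded sch_cake with a freshly installed build.
# Assumed names: module "sch_cake"; CAKE_DEVS lists every interface that
# currently has a cake root qdisc (rmmod fails while any instance exists).
CAKE_DEVS="${CAKE_DEVS:-wan}"

run() {
    # DRY_RUN=1 prints each command instead of executing it.
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

reload_cake() {
    for dev in $CAKE_DEVS; do
        run tc qdisc del dev "$dev" root || return 1
    done
    run rmmod sch_cake || return 1
    # Confirm the old module is really gone before loading the new one.
    if [ "${DRY_RUN:-0}" != "1" ] && grep -q '^sch_cake ' /proc/modules; then
        echo "sch_cake is still loaded" >&2
        return 1
    fi
    run modprobe sch_cake
}
```

Setting DRY_RUN=1 before calling reload_cake only prints the commands, which is a safe way to check the sequence first. The qdiscs have to be re-created afterwards.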
Here’s tc -s output with the broken version:
tc -s qdisc show dev wan
qdisc cake 800f: root refcnt 2 bandwidth 1200Kbit diffserv3 dual-srchost nat nowash ack-filter split-gso rtt 100.0ms atm overhead 48 no-sce
Sent 7711782 bytes 5454 pkt (dropped 144, overlimits 15493 requeues 0)
backlog 1616b 2p requeues 0
memory used: 140864b of 4Mb
capacity estimate: 1200Kbit
min/max network layer size: 40 / 1500
min/max overhead-adjusted size: 106 / 1749
average network hdr offset: 14
                   Bulk  Best Effort        Voice
  thresh         75Kbit     1200Kbit      300Kbit
  target        242.2ms       15.1ms       60.6ms
  interval      484.5ms      110.1ms      155.6ms
  pk_delay          0us       60.0ms       26.8ms
  av_delay          0us       36.7ms        2.0ms
  sp_delay          0us       17.8ms        1.7ms
  backlog            0b        1514b         102b
  pkts                0         5467          133
  bytes               0      7913444        17970
  way_inds            0            0            0
  way_miss            0           44            2
  way_cols            0            0            0
  sce                 0            0            0
  marks               0            0            0
  drops               0          144            0
  ack_drop            0            0            0
  sp_flows            0            0            1
  bk_flows            0            1            0
  un_flows            0            0            0
  max_len             0         3028         1118
  quantum           300          300          300
qdisc ingress ffff: parent ffff:fff1 ----------------
Sent 218759 bytes 3710 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
Here’s the same output with the unbroken version:
tc -s qdisc show dev wan
qdisc cake 8011: root refcnt 2 bandwidth 1200Kbit diffserv3 dual-srchost nat nowash ack-filter split-gso rtt 100.0ms atm overhead 48 no-sce
Sent 3342962 bytes 2328 pkt (dropped 110, overlimits 6422 requeues 0)
backlog 4542b 3p requeues 0
memory used: 83328b of 4Mb
capacity estimate: 1200Kbit
min/max network layer size: 40 / 1500
min/max overhead-adjusted size: 106 / 1749
average network hdr offset: 14
                   Bulk  Best Effort        Voice
  thresh         75Kbit     1200Kbit      300Kbit
  target        242.2ms       15.1ms       60.6ms
  interval      484.5ms      110.1ms      155.6ms
  pk_delay          0us       56.8ms        9.9ms
  av_delay          0us       36.7ms        854us
  sp_delay          0us        9.4ms        680us
  backlog            0b        4542b           0b
  pkts                0         2403           38
  bytes               0      3509764         4280
  way_inds            0            0            0
  way_miss            0           17            1
  way_cols            0            0            0
  sce                 0            0            0
  marks               0            0            0
  drops               0          110            0
  ack_drop            0            0            0
  sp_flows            0            0            1
  bk_flows            0            1            0
  un_flows            0            0            0
  max_len             0         1514          294
  quantum           300          300          300
qdisc ingress ffff: parent ffff:fff1 ----------------
Sent 106781 bytes 1896 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
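For reference, the "min/max overhead-adjusted size: 106 / 1749" lines in both outputs follow from ATM cell framing with "atm overhead 48": the packet plus overhead is padded up to a whole number of 48-byte cell payloads, and each cell occupies 53 bytes on the wire. A minimal sketch of that arithmetic (the function name is made up for illustration, and the overhead of 8 in the last call is purely hypothetical):

```shell
#!/bin/sh
# atm_wire_size PAYLOAD OVERHEAD
# Bytes occupied on an ATM link: payload plus per-packet overhead is padded
# up to a whole number of 48-byte cell payloads, each sent as a 53-byte cell.
atm_wire_size() {
    cells=$(( ($1 + $2 + 47) / 48 ))
    echo $(( cells * 53 ))
}

atm_wire_size 40 48     # smallest network-layer packet in the tc output
atm_wire_size 1500 48   # largest network-layer packet in the tc output
atm_wire_size 1500 8    # hypothetical: 40 bytes of overhead left unaccounted for
```

The first two calls print 106 and 1749, matching tc. The third prints 1696: with 40 bytes of overhead missing from the calculation, the shaper would count one cell (53 bytes) fewer per full-size packet than the link actually carries.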
HTH
Thibaut
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [Cake] Trouble with CAKE
2019-12-14 10:01 ` Thibaut
@ 2019-12-14 10:35 ` Kevin 'ldir' Darbyshire-Bryant
2019-12-14 10:56 ` Kevin 'ldir' Darbyshire-Bryant
0 siblings, 1 reply; 20+ messages in thread
From: Kevin 'ldir' Darbyshire-Bryant @ 2019-12-14 10:35 UTC (permalink / raw)
To: Thibaut; +Cc: Jonathan Morton, Erik Taraldsen via Cake
> On 14 Dec 2019, at 10:01, Thibaut <hacks@slashdirt.org> wrote:
>
>
>
>>
>> That's extremely odd. That commit should only affect traffic carrying the LE DSCP, which is not the default.
>>
>> Perhaps it was not actually the code change, but triggering a rebuild of the module?
>
> No. I tried with and without multiple times: I built, installed, manually unloaded the module, made sure it was unloaded, loaded the new build; just to make sure as I noticed the module doesn’t print anything in dmesg when it’s loaded (feature request: print the current build version when loading, that would be most helpful in these circumstances).
>
> There is absolutely no doubt that on my router, with this commit CAKE is broken, without it isn’t.
>
> Here’s tc -s output with the broken version:
>
> tc -s qdisc show dev wan
> qdisc cake 800f: root refcnt 2 bandwidth 1200Kbit diffserv3 dual-srchost nat nowash ack-filter split-gso rtt 100.0ms atm overhead 48 no-sce
> Sent 7711782 bytes 5454 pkt (dropped 144, overlimits 15493 requeues 0)
> backlog 1616b 2p requeues 0
> memory used: 140864b of 4Mb
> capacity estimate: 1200Kbit
> min/max network layer size: 40 / 1500
> min/max overhead-adjusted size: 106 / 1749
> average network hdr offset: 14
>
> Bulk Best Effort Voice
> thresh 75Kbit 1200Kbit 300Kbit
> target 242.2ms 15.1ms 60.6ms
> interval 484.5ms 110.1ms 155.6ms
> pk_delay 0us 60.0ms 26.8ms
> av_delay 0us 36.7ms 2.0ms
> sp_delay 0us 17.8ms 1.7ms
> backlog 0b 1514b 102b
> pkts 0 5467 133
> bytes 0 7913444 17970
> way_inds 0 0 0
> way_miss 0 44 2
> way_cols 0 0 0
> sce 0 0 0
> marks 0 0 0
> drops 0 144 0
> ack_drop 0 0 0
> sp_flows 0 0 1
> bk_flows 0 1 0
> un_flows 0 0 0
> max_len 0 3028 1118
> quantum 300 300 300
>
> qdisc ingress ffff: parent ffff:fff1 ----------------
> Sent 218759 bytes 3710 pkt (dropped 0, overlimits 0 requeues 0)
> backlog 0b 0p requeues 0
>
> Here’s the same output with the unbroken version:
>
> tc -s qdisc show dev wan
> qdisc cake 8011: root refcnt 2 bandwidth 1200Kbit diffserv3 dual-srchost nat nowash ack-filter split-gso rtt 100.0ms atm overhead 48 no-sce
> Sent 3342962 bytes 2328 pkt (dropped 110, overlimits 6422 requeues 0)
> backlog 4542b 3p requeues 0
> memory used: 83328b of 4Mb
> capacity estimate: 1200Kbit
> min/max network layer size: 40 / 1500
> min/max overhead-adjusted size: 106 / 1749
> average network hdr offset: 14
>
> Bulk Best Effort Voice
> thresh 75Kbit 1200Kbit 300Kbit
> target 242.2ms 15.1ms 60.6ms
> interval 484.5ms 110.1ms 155.6ms
> pk_delay 0us 56.8ms 9.9ms
> av_delay 0us 36.7ms 854us
> sp_delay 0us 9.4ms 680us
> backlog 0b 4542b 0b
> pkts 0 2403 38
> bytes 0 3509764 4280
> way_inds 0 0 0
> way_miss 0 17 1
> way_cols 0 0 0
> sce 0 0 0
> marks 0 0 0
> drops 0 110 0
> ack_drop 0 0 0
> sp_flows 0 0 1
> bk_flows 0 1 0
> un_flows 0 0 0
> max_len 0 1514 294
> quantum 300 300 300
>
> qdisc ingress ffff: parent ffff:fff1 ----------------
> Sent 106781 bytes 1896 pkt (dropped 0, overlimits 0 requeues 0)
> backlog 0b 0p requeues 0
>
>
> HTH
> Thibaut
Which shows most traffic going through Best Effort, whereas the LE DSCP would put it in Bulk, so at this point I’m failing to see the connection between that commit (which changes 3 lookup tables) and the behaviour change.
Can we see output from ’tc -s qdisc’ for the non-broken case please?
Cheers,
Kevin D-B
gpg: 012C ACB2 28C6 C53E 9775 9123 B3A2 389B 9DE2 334A
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [Cake] Trouble with CAKE
2019-12-14 10:35 ` Kevin 'ldir' Darbyshire-Bryant
@ 2019-12-14 10:56 ` Kevin 'ldir' Darbyshire-Bryant
2019-12-14 11:59 ` Thibaut
0 siblings, 1 reply; 20+ messages in thread
From: Kevin 'ldir' Darbyshire-Bryant @ 2019-12-14 10:56 UTC (permalink / raw)
To: Thibaut; +Cc: Jonathan Morton, Erik Taraldsen via Cake
> On 14 Dec 2019, at 10:35, Kevin 'ldir' Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk> wrote:
>
>
>
>> On 14 Dec 2019, at 10:01, Thibaut <hacks@slashdirt.org> wrote:
>>
>>
>>
>>>
>>> That's extremely odd. That commit should only affect traffic carrying the LE DSCP, which is not the default.
>>>
>>> Perhaps it was not actually the code change, but triggering a rebuild of the module?
>>
>> No. I tried with and without multiple times: I built, installed, manually unloaded the module, made sure it was unloaded, loaded the new build; just to make sure as I noticed the module doesn’t print anything in dmesg when it’s loaded (feature request: print the current build version when loading, that would be most helpful in these circumstances).
>>
>> There is absolutely no doubt that on my router, with this commit CAKE is broken, without it isn’t.
>>
>> Here’s tc -s output with the broken version:
>>
>> tc -s qdisc show dev wan
>> qdisc cake 800f: root refcnt 2 bandwidth 1200Kbit diffserv3 dual-srchost nat nowash ack-filter split-gso rtt 100.0ms atm overhead 48 no-sce
>> Sent 7711782 bytes 5454 pkt (dropped 144, overlimits 15493 requeues 0)
>> backlog 1616b 2p requeues 0
>> memory used: 140864b of 4Mb
>> capacity estimate: 1200Kbit
>> min/max network layer size: 40 / 1500
>> min/max overhead-adjusted size: 106 / 1749
>> average network hdr offset: 14
>>
>>                     Bulk  Best Effort        Voice
>> thresh           75Kbit     1200Kbit      300Kbit
>> target          242.2ms       15.1ms       60.6ms
>> interval        484.5ms      110.1ms      155.6ms
>> pk_delay            0us       60.0ms       26.8ms
>> av_delay            0us       36.7ms        2.0ms
>> sp_delay            0us       17.8ms        1.7ms
>> backlog              0b        1514b         102b
>> pkts                  0         5467          133
>> bytes                 0      7913444        17970
>> way_inds              0            0            0
>> way_miss              0           44            2
>> way_cols              0            0            0
>> sce                   0            0            0
>> marks                 0            0            0
>> drops                 0          144            0
>> ack_drop              0            0            0
>> sp_flows              0            0            1
>> bk_flows              0            1            0
>> un_flows              0            0            0
>> max_len               0         3028         1118
>> quantum             300          300          300
>>
>> qdisc ingress ffff: parent ffff:fff1 ----------------
>> Sent 218759 bytes 3710 pkt (dropped 0, overlimits 0 requeues 0)
>> backlog 0b 0p requeues 0
>>
>> Here’s the same output with the unbroken version:
>>
>> tc -s qdisc show dev wan
>> qdisc cake 8011: root refcnt 2 bandwidth 1200Kbit diffserv3 dual-srchost nat nowash ack-filter split-gso rtt 100.0ms atm overhead 48 no-sce
>> Sent 3342962 bytes 2328 pkt (dropped 110, overlimits 6422 requeues 0)
>> backlog 4542b 3p requeues 0
>> memory used: 83328b of 4Mb
>> capacity estimate: 1200Kbit
>> min/max network layer size: 40 / 1500
>> min/max overhead-adjusted size: 106 / 1749
>> average network hdr offset: 14
>>
>>                     Bulk  Best Effort        Voice
>> thresh           75Kbit     1200Kbit      300Kbit
>> target          242.2ms       15.1ms       60.6ms
>> interval        484.5ms      110.1ms      155.6ms
>> pk_delay            0us       56.8ms        9.9ms
>> av_delay            0us       36.7ms        854us
>> sp_delay            0us        9.4ms        680us
>> backlog              0b        4542b           0b
>> pkts                  0         2403           38
>> bytes                 0      3509764         4280
>> way_inds              0            0            0
>> way_miss              0           17            1
>> way_cols              0            0            0
>> sce                   0            0            0
>> marks                 0            0            0
>> drops                 0          110            0
>> ack_drop              0            0            0
>> sp_flows              0            0            1
>> bk_flows              0            1            0
>> un_flows              0            0            0
>> max_len               0         1514          294
>> quantum             300          300          300
>>
>> qdisc ingress ffff: parent ffff:fff1 ----------------
>> Sent 106781 bytes 1896 pkt (dropped 0, overlimits 0 requeues 0)
>> backlog 0b 0p requeues 0
>>
>>
>> HTH
>> Thibaut
>
> Which shows most traffic going through Best Effort, whereas the LE DSCP would put it in Bulk, so at this point I’m failing to see the connection between that commit (which changes 3 lookup tables) and the behaviour change.
>
> Can we see output from ’tc -s qdisc’ for the non-broken case please?
Brain fart! The 2 different versions are there and we see no difference in traffic/tin allocation. However, could we see the ifb4wan instances of cake for both b0rken and unb0rken cases please?
Cheers,
Kevin D-B
gpg: 012C ACB2 28C6 C53E 9775 9123 B3A2 389B 9DE2 334A
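The tin-allocation comparison made in this message can be checked mechanically. What follows is only a minimal Python sketch, assuming the per-tin rows of the two `tc -s qdisc` outputs have been saved as text with the quote markers stripped. The helper names `parse_tins` and `tin_summary` are hypothetical and are not part of tc or CAKE; the numbers are the `pkts` and `drops` rows quoted in the thread.

```python
# Per-tin rows copied from the two outputs quoted in this thread.
BROKEN = """\
Bulk Best Effort Voice
pkts 0 5467 133
drops 0 144 0
"""
UNBROKEN = """\
Bulk Best Effort Voice
pkts 0 2403 38
drops 0 110 0
"""

# diffserv3 tin names as they appear in the table header (assumed fixed here).
TINS = ("Bulk", "Best Effort", "Voice")


def parse_tins(text):
    """Turn per-tin table rows into {tin: {label: value_string}}."""
    stats = {tin: {} for tin in TINS}
    for line in text.splitlines():
        fields = line.split()
        # A data row is one label plus one value per tin; skip the header row.
        if len(fields) != len(TINS) + 1 or fields[0] == TINS[0]:
            continue
        label, values = fields[0], fields[1:]
        for tin, value in zip(TINS, values):
            stats[tin][label] = value
    return stats


def tin_summary(stats):
    """Compute each tin's share of all packets and its drops-per-packet ratio."""
    total = sum(int(s["pkts"]) for s in stats.values())
    summary = {}
    for tin, s in stats.items():
        pkts, drops = int(s["pkts"]), int(s["drops"])
        summary[tin] = {
            "pkt_share": pkts / total if total else 0.0,
            "drop_ratio": drops / pkts if pkts else 0.0,
        }
    return summary


for name, text in (("broken", BROKEN), ("unbroken", UNBROKEN)):
    for tin, s in tin_summary(parse_tins(text)).items():
        print(f"{name:9s} {tin:12s} share={s['pkt_share']:.3f} "
              f"drop_ratio={s['drop_ratio']:.3f}")
```

With these rows, both runs put roughly 98% of packets in Best Effort and none in Bulk, which matches the observation that tin allocation does not differ between the two builds.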
* Re: [Cake] Trouble with CAKE
2019-12-14 10:56 ` Kevin 'ldir' Darbyshire-Bryant
@ 2019-12-14 11:59 ` Thibaut
2019-12-14 12:07 ` Thibaut
2019-12-14 12:09 ` Jonathan Morton
0 siblings, 2 replies; 20+ messages in thread
From: Thibaut @ 2019-12-14 11:59 UTC (permalink / raw)
To: Kevin 'ldir' Darbyshire-Bryant
Cc: Jonathan Morton, Erik Taraldsen via Cake
[-- Attachment #1: Type: text/plain, Size: 5786 bytes --]
> On 14 Dec 2019, at 11:56, Kevin 'ldir' Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk> wrote:
> Brain fart! The 2 different versions are there and we see no difference in traffic/tin allocation. However, could we see the ifb4wan instances of cake for both b0rken and unb0rken cases please?
The plot thickens. I was eventually able to reproduce the same buggy behavior without the HEAD commit, *sigh*
It appears that the bug happens randomly between consecutive module loads/unloads. It also appears that once the module is loaded in a “working state” it keeps working fine.
I’m wondering if this could be a “use of uninitialized data” type of bug.
Still digging.
Thibaut
* Re: [Cake] Trouble with CAKE
2019-12-14 11:59 ` Thibaut
@ 2019-12-14 12:07 ` Thibaut
2019-12-14 12:09 ` Jonathan Morton
1 sibling, 0 replies; 20+ messages in thread
From: Thibaut @ 2019-12-14 12:07 UTC (permalink / raw)
To: Kevin 'ldir' Darbyshire-Bryant; +Cc: Erik Taraldsen via Cake
[-- Attachment #1: Type: text/plain, Size: 906 bytes --]
> On 14 Dec 2019, at 12:59, Thibaut <hacks@slashdirt.org> wrote:
>
>> On 14 Dec 2019, at 11:56, Kevin 'ldir' Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk <mailto:ldir@darbyshire-bryant.me.uk>> wrote:
>>
>>
>> Brain fart! The 2 different versions are there and we see no difference in traffic/tin allocation. However, could we see the ifb4wan instances of cake for both b0rken and unb0rken cases please?
>
> The plot thickens. I was eventually able to reproduce the same buggy behavior without the HEAD commit, *sigh*
>
> It appears that the bug happens randomly between consecutive module loads/unloads. It also appears that once the module is loaded in a “working state” it keeps working fine.
>
> I’m wondering if this could be a “use of uninitialized data” type of bug.
If that makes any difference, I’m using tc-adv 2f0d76d8 (i.e. current master HEAD)
Thibaut
* Re: [Cake] Trouble with CAKE
2019-12-14 11:59 ` Thibaut
2019-12-14 12:07 ` Thibaut
@ 2019-12-14 12:09 ` Jonathan Morton
2019-12-14 12:11 ` Thibaut
1 sibling, 1 reply; 20+ messages in thread
From: Jonathan Morton @ 2019-12-14 12:09 UTC (permalink / raw)
To: Thibaut; +Cc: Kevin 'ldir' Darbyshire-Bryant, Erik Taraldsen via Cake
> On 14 Dec, 2019, at 1:59 pm, Thibaut <hacks@slashdirt.org> wrote:
>
> I’m wondering if this could be a “use of uninitialized data” type of bug.
This is why I wouldn't keep working on an old kernel that's full of vendor patches.
- Jonathan Morton
* Re: [Cake] Trouble with CAKE
2019-12-14 12:09 ` Jonathan Morton
@ 2019-12-14 12:11 ` Thibaut
2019-12-14 12:59 ` Toke Høiland-Jørgensen
0 siblings, 1 reply; 20+ messages in thread
From: Thibaut @ 2019-12-14 12:11 UTC (permalink / raw)
To: Jonathan Morton
Cc: Kevin 'ldir' Darbyshire-Bryant, Erik Taraldsen via Cake
> On 14 Dec 2019, at 13:09, Jonathan Morton <chromatix99@gmail.com> wrote:
>
>> On 14 Dec, 2019, at 1:59 pm, Thibaut <hacks@slashdirt.org> wrote:
>>
>> I’m wondering if this could be a “use of uninitialized data” type of bug.
>
> This is why I wouldn't keep working on an old kernel that's full of vendor patches.
Forgive me for trying to use cake on a supported stable distro.
All distros are full of vendor patches (OpenWRT is no exception). The subset of linux machines that use vanilla is ‘below measurable threshold’…
Cheers,
Thibaut
* Re: [Cake] Trouble with CAKE
2019-12-14 12:11 ` Thibaut
@ 2019-12-14 12:59 ` Toke Høiland-Jørgensen
2019-12-14 14:04 ` Thibaut
0 siblings, 1 reply; 20+ messages in thread
From: Toke Høiland-Jørgensen @ 2019-12-14 12:59 UTC (permalink / raw)
To: Thibaut, Jonathan Morton
Cc: Erik Taraldsen via Cake, Kevin 'ldir' Darbyshire-Bryant
Thibaut <hacks@slashdirt.org> writes:
>> On 14 Dec 2019, at 13:09, Jonathan Morton <chromatix99@gmail.com> wrote:
>>
>>> On 14 Dec, 2019, at 1:59 pm, Thibaut <hacks@slashdirt.org> wrote:
>>>
>>> I’m wondering if this could be a “use of uninitialized data” type of bug.
>>
>> This is why I wouldn't keep working on an old kernel that's full of vendor patches.
>
> Forgive me for trying to use cake on a supported stable distro.
>
> All distros are full of vendor patches (OpenWRT is no exception). The
> subset of linux machines that use vanilla is ‘below measurable
> threshold’...
The Linux kernel development moves at a fairly rapid pace, and sadly
it's not practical to have fully supported backwards compatibility in a
community effort such as CAKE.
Now, this doesn't mean that we won't take patches to fix things for old
kernels; or even help with debugging on old versions, as you've already
seen in this thread. But the reality is unfortunately that the bulk of
this effort is going to have to be on the users running on those
kernels. I.e., you in this case. Such is open source: everyone scratches
their own itch and the end result is something that (mostly) works for
everyone :)
-Toke
* Re: [Cake] Trouble with CAKE
2019-12-14 12:59 ` Toke Høiland-Jørgensen
@ 2019-12-14 14:04 ` Thibaut
2019-12-14 21:35 ` Toke Høiland-Jørgensen
0 siblings, 1 reply; 20+ messages in thread
From: Thibaut @ 2019-12-14 14:04 UTC (permalink / raw)
To: Toke Høiland-Jørgensen
Cc: Jonathan Morton, Erik Taraldsen via Cake,
Kevin 'ldir' Darbyshire-Bryant
Hi Toke,
> On 14 Dec 2019, at 13:59, Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Thibaut <hacks@slashdirt.org> writes:
>
>>> On 14 Dec 2019, at 13:09, Jonathan Morton <chromatix99@gmail.com> wrote:
>>>
>>>> On 14 Dec, 2019, at 1:59 pm, Thibaut <hacks@slashdirt.org> wrote:
>>>>
>>>> I’m wondering if this could be a “use of uninitialized data” type of bug.
>>>
>>> This is why I wouldn't keep working on an old kernel that's full of vendor patches.
>>
>> Forgive me for trying to use cake on a supported stable distro.
>>
>> All distros are full of vendor patches (OpenWRT is no exception). The
>> subset of linux machines that use vanilla is ‘below measurable
>> threshold’...
>
> The Linux kernel development moves at a fairly rapid pace, and sadly
> it's not practical to have fully supported backwards compatibility in a
> community effort such as CAKE.
>
> Now, this doesn't mean that we won't take patches to fix things for old
> kernels; or even help with debugging on old versions, as you've already
> seen in this thread. But the reality is unfortunately that the bulk of
> this effort is going to have to be on the users running on those
> kernels. I.e., you in this case. Such is open source: everyone scratches
> their own itch and the end result is something that (mostly) works for
> everyone :)
I understand that, I’m familiar with the kernel development philosophy (I used to be a contributor in a previous life).
I’m also familiar with the fact that most kernel hackers tend to assume that the people who use their code and report bugs will know said code like the back of their hand and will be able to spot where to look for the cause of the behavior they’re seeing and provide a patch without further ado.
I hope you can see why this cannot be the case especially with something as delicate and complex as a traffic shaper :)
That’s why I’m happy to debug as much as possible and possibly try to cook a patch if needed, but without a bit of help/feedback (and thus interest) from the authors, this is a lost cause.
Meanwhile, I can add that not all traffic crawls to a grinding halt: speedtests and fluctuating traffic (such as, in the case of the buildbots, the upstreaming of the build stdio) appear to be mostly unaffected (I see sustained traffic at line speed every now and then, especially during very verbose build output).
But for some reason, when the rsync of the build results begins, cake appears adamant (at least when it exposes the offending behavior) that it must be killed with extreme prejudice ;P
Would that ring any bell?
Cheers,
Thibaut
* Re: [Cake] Trouble with CAKE
2019-12-14 14:04 ` Thibaut
@ 2019-12-14 21:35 ` Toke Høiland-Jørgensen
0 siblings, 0 replies; 20+ messages in thread
From: Toke Høiland-Jørgensen @ 2019-12-14 21:35 UTC (permalink / raw)
To: Thibaut
Cc: Jonathan Morton, Erik Taraldsen via Cake,
Kevin 'ldir' Darbyshire-Bryant
Thibaut <hacks@slashdirt.org> writes:
> Hi Toke,
>
>> On 14 Dec 2019, at 13:59, Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>
>> Thibaut <hacks@slashdirt.org> writes:
>>
>>>> On 14 Dec 2019, at 13:09, Jonathan Morton <chromatix99@gmail.com> wrote:
>>>>
>>>>> On 14 Dec, 2019, at 1:59 pm, Thibaut <hacks@slashdirt.org> wrote:
>>>>>
>>>>> I’m wondering if this could be a “use of uninitialized data” type of bug.
>>>>
>>>> This is why I wouldn't keep working on an old kernel that's full of vendor patches.
>>>
>>> Forgive me for trying to use cake on a supported stable distro.
>>>
>>> All distros are full of vendor patches (OpenWRT is no exception). The
>>> subset of linux machines that use vanilla is ‘below measurable
>>> threshold’...
>>
>> The Linux kernel development moves at a fairly rapid pace, and sadly
>> it's not practical to have fully supported backwards compatibility in a
>> community effort such as CAKE.
>>
>> Now, this doesn't mean that we won't take patches to fix things for old
>> kernels; or even help with debugging on old versions, as you've already
>> seen in this thread. But the reality is unfortunately that the bulk of
>> this effort is going to have to be on the users running on those
>> kernels. I.e., you in this case. Such is open source: everyone scratches
>> their own itch and the end result is something that (mostly) works for
>> everyone :)
>
> I understand that, I’m familiar with the kernel development philosophy
> (I used to be a contributor in a previous life).
>
> I’m also familiar with the fact that most kernel hackers tend to
> assume that the people who use their code and report bugs will know
> said code like the back of their hand and will be able to spot
> where to look for the cause of the behavior they’re seeing and provide
> a patch without further ado.
>
> I hope you can see why this cannot be the case especially with
> something as delicate and complex as a traffic shaper :)
>
> That’s why I’m happy to debug as much as possible and possibly try to
> cook a patch if needed, but without a bit of help/feedback (and thus
> interest) from the authors, this is a lost cause.
>
> Meanwhile, I can add that not all traffic crawls to a grinding halt:
> speedtests and fluctuating traffic (such as, in the case of the
> buildbots, the upstreaming of the build stdio) appear to be mostly
> unaffected (I see sustained traffic at line speed every now and then,
> especially during very verbose build output).
>
> But for some reason, when the rsync of the build results begins, cake
> appears adamant (at least when it exposes the offending behavior) that
> it must be killed with extreme prejudice ;P
>
> Would that ring any bell?
Not really. A first step towards making progress with this could be a
packet dump of a TCP stream that is affected by the slowdown, vs one
that isn't. Preferably with before/after stats output from CAKE from
each of them. That way, hopefully it'll be possible to figure out *what*
is happening to make things crawl, which could ease the search for the why
of it afterwards :)
-Toke
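The before/after stats comparison suggested in this message can be reduced to a counter diff. The following is a rough Python sketch under stated assumptions: `parse_sent` and `delta` are hypothetical helper names, the "after" line is the `Sent ...` line from the broken-case output quoted earlier in the thread, and the all-zero "before" line is an assumed snapshot of a freshly installed qdisc, not something posted to the list.

```python
import re

# Matches the summary line printed by `tc -s qdisc show`.
SENT_RE = re.compile(
    r"Sent (\d+) bytes (\d+) pkt "
    r"\(dropped (\d+), overlimits (\d+) requeues (\d+)\)"
)
KEYS = ("bytes", "pkts", "dropped", "overlimits", "requeues")


def parse_sent(text):
    """Extract the counters from the first 'Sent ...' line found in text."""
    match = SENT_RE.search(text)
    if match is None:
        raise ValueError("no 'Sent ... bytes ... pkt' line found")
    return dict(zip(KEYS, map(int, match.groups())))


def delta(before, after):
    """Per-counter difference between two snapshots of the same qdisc."""
    return {key: after[key] - before[key] for key in KEYS}


# Assumed snapshot taken right after installing the qdisc (all counters zero).
BEFORE = "Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)"
# Line quoted in this thread for the broken case.
AFTER = "Sent 7711782 bytes 5454 pkt (dropped 144, overlimits 15493 requeues 0)"

change = delta(parse_sent(BEFORE), parse_sent(AFTER))
for key in KEYS:
    print(f"{key:10s} {change[key]}")
```

Taking one snapshot immediately before the affected transfer starts and one after it ends, alongside the packet dump, would attribute the drops and overlimits to that transfer rather than to everything sent since the qdisc was installed.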
Thread overview: 20+ messages
2019-12-13 13:43 [Cake] Trouble with CAKE Thibaut
2019-12-13 14:02 ` Jonathan Morton
2019-12-13 22:39 ` Thibaut
2019-12-13 22:40 ` Thibaut
2019-12-13 23:52 ` Thibaut
2019-12-14 9:50 ` Jonathan Morton
2019-12-14 10:01 ` Thibaut
2019-12-14 10:35 ` Kevin 'ldir' Darbyshire-Bryant
2019-12-14 10:56 ` Kevin 'ldir' Darbyshire-Bryant
2019-12-14 11:59 ` Thibaut
2019-12-14 12:07 ` Thibaut
2019-12-14 12:09 ` Jonathan Morton
2019-12-14 12:11 ` Thibaut
2019-12-14 12:59 ` Toke Høiland-Jørgensen
2019-12-14 14:04 ` Thibaut
2019-12-14 21:35 ` Toke Høiland-Jørgensen
2019-12-13 14:13 ` Thibaut
2019-12-13 14:15 ` Sebastian Moeller
2019-12-13 14:21 ` Thibaut
2019-12-13 18:44 ` Thibaut