On 14 Dec 2019, at 10:35, Kevin 'ldir' Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk> wrote:
On 14 Dec 2019, at 10:01, Thibaut <hacks@slashdirt.org> wrote:
That's extremely odd. That commit should only affect traffic carrying the LE DSCP, which is not the default.
Perhaps it was not actually the code change, but triggering a rebuild of the module?
No. I tried with and without the commit multiple times. Each time I built, installed, manually unloaded the module, made sure it was unloaded, then loaded the new build. I was being careful because the module doesn’t print anything in dmesg when it’s loaded (feature request: print the current build version when loading; that would be most helpful in these circumstances).
There is absolutely no doubt that on my router CAKE is broken with this commit and works without it.
Here’s tc -s output with the broken version:
tc -s qdisc show dev wan
qdisc cake 800f: root refcnt 2 bandwidth 1200Kbit diffserv3 dual-srchost nat nowash ack-filter split-gso rtt 100.0ms atm overhead 48 no-sce
Sent 7711782 bytes 5454 pkt (dropped 144, overlimits 15493 requeues 0)
backlog 1616b 2p requeues 0
memory used: 140864b of 4Mb
capacity estimate: 1200Kbit
min/max network layer size: 40 / 1500
min/max overhead-adjusted size: 106 / 1749
average network hdr offset: 14
                 Bulk  Best Effort        Voice
thresh         75Kbit     1200Kbit      300Kbit
target        242.2ms       15.1ms       60.6ms
interval      484.5ms      110.1ms      155.6ms
pk_delay          0us       60.0ms       26.8ms
av_delay          0us       36.7ms        2.0ms
sp_delay          0us       17.8ms        1.7ms
backlog            0b        1514b         102b
pkts                0         5467          133
bytes               0      7913444        17970
way_inds            0            0            0
way_miss            0           44            2
way_cols            0            0            0
sce                 0            0            0
marks               0            0            0
drops               0          144            0
ack_drop            0            0            0
sp_flows            0            0            1
bk_flows            0            1            0
un_flows            0            0            0
max_len             0         3028         1118
quantum           300          300          300
qdisc ingress ffff: parent ffff:fff1 ----------------
Sent 218759 bytes 3710 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
Here’s the same output with the unbroken version:
tc -s qdisc show dev wan
qdisc cake 8011: root refcnt 2 bandwidth 1200Kbit diffserv3 dual-srchost nat nowash ack-filter split-gso rtt 100.0ms atm overhead 48 no-sce
Sent 3342962 bytes 2328 pkt (dropped 110, overlimits 6422 requeues 0)
backlog 4542b 3p requeues 0
memory used: 83328b of 4Mb
capacity estimate: 1200Kbit
min/max network layer size: 40 / 1500
min/max overhead-adjusted size: 106 / 1749
average network hdr offset: 14
                 Bulk  Best Effort        Voice
thresh         75Kbit     1200Kbit      300Kbit
target        242.2ms       15.1ms       60.6ms
interval      484.5ms      110.1ms      155.6ms
pk_delay          0us       56.8ms        9.9ms
av_delay          0us       36.7ms        854us
sp_delay          0us        9.4ms        680us
backlog            0b        4542b           0b
pkts                0         2403           38
bytes               0      3509764         4280
way_inds            0            0            0
way_miss            0           17            1
way_cols            0            0            0
sce                 0            0            0
marks               0            0            0
drops               0          110            0
ack_drop            0            0            0
sp_flows            0            0            1
bk_flows            0            1            0
un_flows            0            0            0
max_len             0         1514          294
quantum           300          300          300
qdisc ingress ffff: parent ffff:fff1 ----------------
Sent 106781 bytes 1896 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
HTH
Thibaut
Which shows most traffic going through Best Effort, whereas the LE DSCP would put it in Bulk, so at this point I’m failing to see the connection between that commit (which changes 3 lookup tables) and the behaviour change.
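For anyone wanting to check the tin split without eyeballing the columns, here is a rough sketch that parses the per-tin table from `tc -s qdisc` and reports each tin's share of packets. The function names (`parse_tins`, `pkt_share`) are made up for illustration, the parser assumes the three diffserv3 tins shown in this thread, and the two samples contain only the header plus the `pkts` and `drops` rows copied from the outputs above.

```python
def parse_tins(text):
    """Parse the per-tin table of `tc -s qdisc` output (diffserv3 layout assumed).

    Returns {stat_name: {tin_name: value_string}}.
    """
    tins = ("Bulk", "Best Effort", "Voice")
    table = {}
    in_table = False
    for line in text.splitlines():
        fields = line.split()
        # The header row marks the start of the table ("Best Effort" splits in two).
        if fields == ["Bulk", "Best", "Effort", "Voice"]:
            in_table = True
            continue
        if in_table:
            # Each table row is a stat name followed by one value per tin.
            if len(fields) != 4:
                break  # e.g. the next "qdisc ..." line ends the table
            table[fields[0]] = dict(zip(tins, fields[1:]))
    return table


def pkt_share(table):
    """Return each tin's fraction of the total packet count."""
    pkts = {tin: int(value) for tin, value in table["pkts"].items()}
    total = sum(pkts.values())
    return {tin: count / total for tin, count in pkts.items()}


# Rows copied from the two outputs in this thread.
BROKEN = """\
Bulk Best Effort Voice
pkts 0 5467 133
drops 0 144 0
"""
WORKING = """\
Bulk Best Effort Voice
pkts 0 2403 38
drops 0 110 0
"""

for label, sample in (("broken", BROKEN), ("working", WORKING)):
    share = pkt_share(parse_tins(sample))
    print(label, {tin: round(s, 3) for tin, s in share.items()})
```

In both samples Best Effort carries well over 95% of the packets and Bulk carries none, which is the observation made above: the tin allocation does not differ between the two builds.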
Can we see output from ‘tc -s qdisc’ for the non-broken case please?
Brain fart! The 2 different versions are there and we see no difference in traffic/tin allocation. However, could we see the ifb4wan instances of cake for both b0rken and unb0rken cases please?
The plot thickens. I was eventually able to reproduce the same buggy behavior without the HEAD commit, *sigh*
It appears that the bug happens randomly between consecutive module loads/unloads. It also appears that once the module is loaded in a “working state”, it keeps working fine.
I’m wondering if this could be a “use of uninitialized data” type of bug.
Still digging.
Thibaut