From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from vps.slashdirt.org (vps.slashdirt.org [144.91.108.218]) (using TLSv1.2 with cipher ADH-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by lists.bufferbloat.net (Postfix) with ESMTPS id BC3DC3B2A4 for ; Sat, 14 Dec 2019 06:59:13 -0500 (EST) Received: from tama.lan (unknown [37.165.135.163]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by vps.slashdirt.org (Postfix) with ESMTPSA id 35ACA603EA; Sat, 14 Dec 2019 12:59:12 +0100 (CET) DKIM-Filter: OpenDKIM Filter v2.11.0 vps.slashdirt.org 35ACA603EA DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=slashdirt.org; s=mail; t=1576324752; bh=a6CWJEjKnlP3WwyvNFouoaQMuN+JH1HZgcNaU6Vgc9U=; h=From:Subject:Date:In-Reply-To:Cc:To:References:From; b=UekqmQ+FgOYP0RZyqLGxWUbQBQrka7V8L1SgXwpgkuPcDOe2siq/6k3quSsBHzh7M 9XvbyXJdrTTb6lMNdZ2LovDM+KJges3IWwP+dBbv5uten4FO+onLngx37sxc92s5rm QRcx1zYu/k4bHTmUDNYibHqx0kGEaQCHvA7+neY0= From: Thibaut Message-Id: <55F31738-C935-4361-B11E-9E7CA657489F@slashdirt.org> Content-Type: multipart/alternative; boundary="Apple-Mail=_9D631D92-7245-4EEA-A30E-EC88CF917F0B" Mime-Version: 1.0 (Mac OS X Mail 12.4 \(3445.104.11\)) Date: Sat, 14 Dec 2019 12:59:11 +0100 In-Reply-To: <4E238145-37A2-45AC-B8BB-AD602C4D1044@darbyshire-bryant.me.uk> Cc: Jonathan Morton , Erik Taraldsen via Cake To: Kevin 'ldir' Darbyshire-Bryant References: <1d359153abfc413b4f61c647437427d6@slashdirt.org> <8FC615C8-4885-4F0A-B2EE-934DC4E806E8@gmail.com> <6E9587F7-C208-4568-8429-AEBA9E694E24@slashdirt.org> <46DDBCAA-EC6C-492F-8448-DF85ABB4E1DB@slashdirt.org> <1507FAF0-8A13-48E1-8A36-0D352F4FDD00@gmail.com> <4E238145-37A2-45AC-B8BB-AD602C4D1044@darbyshire-bryant.me.uk> X-Mailer: Apple Mail (2.3445.104.11) Subject: Re: [Cake] Trouble with CAKE X-BeenThere: cake@lists.bufferbloat.net X-Mailman-Version: 2.1.20 Precedence: list List-Id: Cake - FQ_codel the next generation List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 14 Dec 2019 11:59:14 -0000 --Apple-Mail=_9D631D92-7245-4EEA-A30E-EC88CF917F0B Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset=utf-8 > On 14 Dec 2019, at 11:56, Kevin 'ldir' Darbyshire-Bryant = wrote: >=20 >=20 >=20 >> On 14 Dec 2019, at 10:35, Kevin 'ldir' Darbyshire-Bryant = wrote: >>=20 >>=20 >>=20 >>> On 14 Dec 2019, at 10:01, Thibaut wrote: >>>=20 >>>=20 >>>=20 >>>>=20 >>>> That's extremely odd. That commit should only affect traffic = carrying the LE DSCP, which is not the default. >>>>=20 >>>> Perhaps it was not actually the code change, but triggering a = rebuild of the module? >>>=20 >>> No. I tried with and without multiple times: I built, installed, = manually unloaded the module, made sure it was unloaded, loaded the new = build; just to make sure as I noticed the module doesn=E2=80=99t print = anything in dmesg when it=E2=80=99s loaded (feature request: print the = current build version when loading, that would be most helpful in these = circumstances). >>>=20 >>> There is absolutely no doubt that on my router, with this commit = CAKE is broken, without it isn=E2=80=99t. >>>=20 >>> Here=E2=80=99s tc -s output with the broken version: >>>=20 >>> tc -s qdisc show dev wan >>> qdisc cake 800f: root refcnt 2 bandwidth 1200Kbit diffserv3 = dual-srchost nat nowash ack-filter split-gso rtt 100.0ms atm overhead 48 = no-sce >>> Sent 7711782 bytes 5454 pkt (dropped 144, overlimits 15493 requeues = 0) >>> backlog 1616b 2p requeues 0 >>> memory used: 140864b of 4Mb >>> capacity estimate: 1200Kbit >>> min/max network layer size: 40 / 1500 >>> min/max overhead-adjusted size: 106 / 1749 >>> average network hdr offset: 14 >>>=20 >>> Bulk Best Effort Voice >>> thresh 75Kbit 1200Kbit 300Kbit >>> target 242.2ms 15.1ms 60.6ms >>> interval 484.5ms 110.1ms 155.6ms >>> pk_delay 0us 60.0ms 26.8ms >>> av_delay 0us 36.7ms 2.0ms >>> sp_delay 0us 17.8ms 1.7ms >>> backlog 0b 1514b 102b >>> pkts 0 5467 133 >>> bytes 0 7913444 17970 >>> way_inds 0 0 0 >>> way_miss 0 44 2 >>> way_cols 0 0 0 >>> sce 0 0 0 >>> marks 0 0 0 >>> drops 0 144 0 >>> ack_drop 0 0 0 >>> sp_flows 0 0 1 >>> bk_flows 0 1 0 >>> un_flows 0 0 0 >>> max_len 0 3028 1118 >>> quantum 300 300 300 >>>=20 >>> qdisc ingress ffff: parent ffff:fff1 ---------------- >>> Sent 218759 bytes 3710 pkt (dropped 0, overlimits 0 requeues 0) >>> backlog 0b 0p requeues 0 >>>=20 >>> Here=E2=80=99s the same output with the unbroken version: >>>=20 >>> tc -s qdisc show dev wan >>> qdisc cake 8011: root refcnt 2 bandwidth 1200Kbit diffserv3 = dual-srchost nat nowash ack-filter split-gso rtt 100.0ms atm overhead 48 = no-sce >>> Sent 3342962 bytes 2328 pkt (dropped 110, overlimits 6422 requeues = 0) >>> backlog 4542b 3p requeues 0 >>> memory used: 83328b of 4Mb >>> capacity estimate: 1200Kbit >>> min/max network layer size: 40 / 1500 >>> min/max overhead-adjusted size: 106 / 1749 >>> average network hdr offset: 14 >>>=20 >>> Bulk Best Effort Voice >>> thresh 75Kbit 1200Kbit 300Kbit >>> target 242.2ms 15.1ms 60.6ms >>> interval 484.5ms 110.1ms 155.6ms >>> pk_delay 0us 56.8ms 9.9ms >>> av_delay 0us 36.7ms 854us >>> sp_delay 0us 9.4ms 680us >>> backlog 0b 4542b 0b >>> pkts 0 2403 38 >>> bytes 0 3509764 4280 >>> way_inds 0 0 0 >>> way_miss 0 17 1 >>> way_cols 0 0 0 >>> sce 0 0 0 >>> marks 0 0 0 >>> drops 0 110 0 >>> ack_drop 0 0 0 >>> sp_flows 0 0 1 >>> bk_flows 0 1 0 >>> un_flows 0 0 0 >>> max_len 0 1514 294 >>> quantum 300 300 300 >>>=20 >>> qdisc ingress ffff: parent ffff:fff1 ---------------- >>> Sent 106781 bytes 1896 pkt (dropped 0, overlimits 0 requeues 0) >>> backlog 0b 0p requeues 0 >>>=20 >>>=20 >>> HTH >>> Thibaut >>=20 >> Which shows most traffic going through Best Effort, whereas the LE = DSCP would put it in Bulk, so at this point I=E2=80=99m failing to see = the connection between that commit (which changes 3 lookup tables) and = the behaviour change. >>=20 >> Can we see output from =E2=80=99tc -s qdisc=E2=80=99 for the = non-broken case please? >=20 > Brain fart! The 2 different versions are there and we soe no = difference in traffic/tin allocation. However, could we see the ifb4wan = instances of cake for both b0rken and unb0rken cases please? The plot thickens. I was eventually able to reproduce the same buggy = behavior without the HEAD commit, *sigh* It appears that the bug happens randomly between consecutive module = loads/unloads. It also appears that once the module is loaded in a = =E2=80=9Cworking state=E2=80=9D it keeps working fine. I=E2=80=99m wondering if this could be an =E2=80=9Cuse of uninitialized = data=E2=80=9D type of bug. Still digging. Thibaut= --Apple-Mail=_9D631D92-7245-4EEA-A30E-EC88CF917F0B Content-Transfer-Encoding: quoted-printable Content-Type: text/html; charset=utf-8

On 14 Dec 2019, at 11:56, Kevin 'ldir' Darbyshire-Bryant = <ldir@darbyshire-bryant.me.uk> wrote:



On 14 = Dec 2019, at 10:35, Kevin 'ldir' Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk> wrote:



On 14 Dec 2019, at 10:01, Thibaut <hacks@slashdirt.org>= wrote:




That's = extremely odd.  That commit should only affect traffic carrying the = LE DSCP, which is not the default.

Perhaps = it was not actually the code change, but triggering a rebuild of the = module?

No. I tried with and = without multiple times: I built, installed, manually unloaded the = module, made sure it was unloaded, loaded the new build; just to make = sure as I noticed the module doesn=E2=80=99t print anything in dmesg = when it=E2=80=99s loaded (feature request: print the current build = version when loading, that would be most helpful in these = circumstances).

There is absolutely no = doubt that on my router, with this commit CAKE is broken, without it = isn=E2=80=99t.

Here=E2=80=99s tc -s output = with the broken version:

tc -s qdisc show = dev wan
qdisc cake 800f: root refcnt 2 bandwidth 1200Kbit = diffserv3 dual-srchost nat nowash ack-filter split-gso rtt 100.0ms atm = overhead 48 no-sce
Sent 7711782 bytes 5454 pkt (dropped = 144, overlimits 15493 requeues 0)
backlog 1616b 2p = requeues 0
memory used: 140864b of 4Mb
capacity estimate: 1200Kbit
min/max network = layer size: =           40 / =    1500
min/max overhead-adjusted size: =      106 /    1749
average network hdr offset: =           14

          &nb= sp;     Bulk  Best Effort =        Voice
thresh =         75Kbit =     1200Kbit =      300Kbit
target =        242.2ms =       15.1ms =       60.6ms
interval =      484.5ms =      110.1ms =      155.6ms
pk_delay =          0us =       60.0ms =       26.8ms
av_delay =          0us =       36.7ms =        2.0ms
sp_delay =          0us =       17.8ms =        1.7ms
backlog =            0b =        1514b =         102b
pkts =             &n= bsp;  0         5467 =          133
bytes =             &n= bsp; 0      7913444 =        17970
way_inds =            0 =            0 =            0
way_miss =            0 =           44 =            2
way_cols =            0 =            0 =            0
sce =             &n= bsp;   0 =            0 =            0
marks =             &n= bsp; 0 =            0 =            0
drops =             &n= bsp; 0          144 =            0
ack_drop =            0 =            0 =            0
sp_flows =            0 =            0 =            1
bk_flows =            0 =            1 =            0
un_flows =            0 =            0 =            0
max_len =             0 =         3028 =         1118
quantum=           300 =          300 =          300

qdisc ingress ffff: parent ffff:fff1 = ----------------
Sent 218759 bytes 3710 pkt (dropped 0, = overlimits 0 requeues 0)
backlog 0b 0p requeues 0

Here=E2=80=99s the same output with the = unbroken version:

tc -s qdisc show dev = wan
qdisc cake 8011: root refcnt 2 bandwidth 1200Kbit = diffserv3 dual-srchost nat nowash ack-filter split-gso rtt 100.0ms atm = overhead 48 no-sce
Sent 3342962 bytes 2328 pkt (dropped = 110, overlimits 6422 requeues 0)
backlog 4542b 3p requeues = 0
memory used: 83328b of 4Mb
capacity = estimate: 1200Kbit
min/max network layer size: =           40 / =    1500
min/max overhead-adjusted size: =      106 /    1749
average network hdr offset: =           14

          &nb= sp;     Bulk  Best Effort =        Voice
thresh =         75Kbit =     1200Kbit =      300Kbit
target =        242.2ms =       15.1ms =       60.6ms
interval =      484.5ms =      110.1ms =      155.6ms
pk_delay =          0us =       56.8ms =        9.9ms
av_delay =          0us =       36.7ms =        854us
sp_delay =          0us =        9.4ms =        680us
backlog =            0b =        4542b =           0b
pkts =             &n= bsp;  0         2403 =           38
bytes =             &n= bsp; 0      3509764 =         4280
way_inds =            0 =            0 =            0
way_miss =            0 =           17 =            1
way_cols =            0 =            0 =            0
sce =             &n= bsp;   0 =            0 =            0
marks =             &n= bsp; 0 =            0 =            0
drops =             &n= bsp; 0          110 =            0
ack_drop =            0 =            0 =            0
sp_flows =            0 =            0 =            1
bk_flows =            0 =            1 =            0
un_flows =            0 =            0 =            0
max_len =             0 =         1514 =          294
quantum =           300 =          300 =          300

qdisc ingress ffff: parent ffff:fff1 = ----------------
Sent 106781 bytes 1896 pkt (dropped 0, = overlimits 0 requeues 0)
backlog 0b 0p requeues 0


HTH
Thibaut

Which shows most traffic going = through Best Effort, whereas the LE DSCP would put it in Bulk, so at = this point I=E2=80=99m failing to see the connection between that commit = (which changes 3 lookup tables) and the behaviour change.

Can we see output from =E2=80=99tc -s qdisc=E2=80= =99 for the non-broken case please?

Brain fart! =  The 2 different versions are there and we soe no difference in = traffic/tin allocation.  However, could we see the ifb4wan = instances of cake for both b0rken and unb0rken cases please?

The plot thickens. I was eventually able to reproduce the = same buggy behavior without the HEAD commit, *sigh*

It appears that the bug = happens randomly between consecutive module loads/unloads. It also = appears that once the module is loaded in a =E2=80=9Cworking state=E2=80=9D= it keeps working fine.

I=E2=80=99m wondering if this could be an =E2=80=9Cuse of = uninitialized data=E2=80=9D type of bug.

Still digging.
Thibaut
= --Apple-Mail=_9D631D92-7245-4EEA-A30E-EC88CF917F0B--