From: Jonathan Foulkes <jf@jonathanfoulkes.com>
Subject: Re: [Bloat] CAKE in openwrt high CPU
Date: Tue, 1 Sep 2020 15:31:15 -0400
To: Sebastian Moeller
Cc: Toke Høiland-Jørgensen, bloat@lists.bufferbloat.net

Hi Sebastian,

Cake functions wonderfully; it's a marvel in terms of goodput.

My comment was aimed more at the process users go through to evaluate results. Only those who spend time analyzing just how busy an 'idle' network can be know that there are a lot of processes in constant communication with their cloud services.
The challenge is the end users, who only understand the silly 'speed' metric and feel that anything which lowers that number is a 'bad' thing. It takes effort to get even technical users to see past it.
But beyond the basics, the further cuts induced by fairness are the new wrinkle: with isolation enabled on a busy network, speed-test results vary widely.

The high density of devices and constant chatter with cloud services means the average home has way more devices and connections than many realize. Keep a note of the number of 'active connections' displayed on the OpenWrt overview page; you might be surprised (well, not you, Seb ;) )
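If you'd rather check from a shell than the overview page, the same number should come straight out of conntrack (quick sketch, assuming the usual sysctl paths):

cat /proc/sys/net/netfilter/nf_conntrack_count   # currently tracked connections
cat /proc/sys/net/netfilter/nf_conntrack_max     # table limit, for context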

As an example, on my network I average 1,000 active connections all day; it rarely drops below 700. And that's just two WFH professionals and 60+ network devices, not all of which are active at any one time.
I actually run some custom firewall rules to de-prioritize four IoT devices that generate a LOT of traffic to their services, two of which are power-panel monitors with real-time updates. This is why my Bulk tin on egress carries so much traffic.
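Not my actual rules, but the shape of the thing is just DSCP re-marking in mangle so those hosts land in cake's Bulk tin under diffserv4 (the addresses here are made-up examples, and this only covers the egress direction, which is where the bulk traffic shows up):

# re-mark everything from two chatty IoT hosts as CS1 so diffserv4 files it under Bulk
iptables -t mangle -A FORWARD -s 192.168.1.50 -j DSCP --set-dscp-class CS1
iptables -t mangle -A FORWARD -s 192.168.1.51 -j DSCP --set-dscp-class CS1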

Since you like to see tc output, here's the output from my system after nearly a week of uptime.
I run four-layer Cake (diffserv4), as we do a lot of Zoom calls and our accounts are set up to do the appropriate DSCP marking.
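The IQrouter builds this config itself, but if you wanted to reproduce something equivalent by hand it would look roughly like the following sketch (interface names and rates match the stats below; the exact filter and overhead choices are my assumptions, not the literal scripts):

# egress on the WAN VLAN: four tins, per-source fairness behind NAT, ACK filtering
tc qdisc replace dev eth0.2 root cake bandwidth 22478kbit diffserv4 \
    dual-srchost nat nowash ack-filter

# ingress: mirror WAN ingress onto an IFB and shape it there with per-destination fairness
ip link add ifb4eth0.2 type ifb
ip link set ifb4eth0.2 up
tc qdisc add dev eth0.2 handle ffff: ingress
tc filter add dev eth0.2 parent ffff: protocol all matchall \
    action mirred egress redirect dev ifb4eth0.2
tc qdisc replace dev ifb4eth0.2 root cake bandwidth 289066kbit diffserv4 \
    dual-dsthost nat nowash ingress overhead 18 mpu 64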

root@IQrouter:~# tc -s qdisc
qdisc noqueue 0: dev lo root refcnt 2
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc fq_codel 0: dev eth0 root refcnt 2 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 4Mb ecn
 Sent 51311363856 bytes 86785488 pkt (dropped 53, overlimits 0 requeues 9114)
 backlog 0b 0p requeues 9114
  maxpacket 12112 drop_overlimit 0 new_flow_count 691740 ecn_mark 0
  new_flows_len 0 old_flows_len 0
qdisc noqueue 0: dev br-lan root refcnt 2
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc noqueue 0: dev eth0.1 root refcnt 2
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc cake 8005: dev eth0.2 root refcnt 2 bandwidth 22478Kbit diffserv4 dual-srchost nat nowash ack-filter split-gso rtt 100.0ms raw overhead 0 mpu 64
 Sent 6943407136 bytes 35467722 pkt (dropped 51747, overlimits 3912091 requeues 0)
 backlog 0b 0p requeues 0
 memory used: 843816b of 4Mb
 capacity estimate: 22478Kbit
 min/max network layer size:           42 /    1514
 min/max overhead-adjusted size:       64 /    1514
 average network hdr offset:           14

                   Bulk  Best Effort        Video        Voice
  thresh       1404Kbit    22478Kbit    11239Kbit     5619Kbit
  target         12.9ms        5.0ms        5.0ms        5.0ms
  interval      107.9ms      100.0ms      100.0ms      100.0ms
  pk_delay        5.9ms        6.4ms        3.7ms        1.6ms
  av_delay        426us        445us        124us        188us
  sp_delay         13us         13us         12us          8us
  backlog            0b           0b           0b           0b
  pkts          3984407     30899121       474818       161123
  bytes       789740113   5883832402    246917562     30556915
  way_inds        65175      2580935         1064            5
  way_miss         1427       918529        15960         1120
  way_cols            0            0            0            0
  drops               0         2966          511            7
  marks               0          105            0            0
  ack_drop            0        48263            0            0
  sp_flows            2            4            1            0
  bk_flows            0            0            0            0
  un_flows            0            0            0            0
  max_len          1035        43094         3094          590
  quantum           300          685          342          300

qdisc ingress ffff: dev eth0.2 parent ffff:fff1 ----------------
 Sent 43188461026 bytes 67870269 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc noqueue 0: dev br-guest root refcnt 2
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc noqueue 0: dev wlan1 root refcnt 2
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc noqueue 0: dev wlan0 root refcnt 2
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc noqueue 0: dev wlan0-1 root refcnt 2
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc noqueue 0: dev wlan1-1 root refcnt 2
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
qdisc cake 8006: dev ifb4eth0.2 root refcnt 2 bandwidth 289066Kbit diffserv4 dual-dsthost nat nowash ingress no-ack-filter split-gso rtt 100.0ms noatm overhead 18 mpu 64
 Sent 44692280901 bytes 67864800 pkt (dropped 5472, overlimits 5572964 requeues 0)
 backlog 0b 0p requeues 0
 memory used: 7346339b of 14453300b
 capacity estimate: 289066Kbit
 min/max network layer size:           46 /    1500
 min/max overhead-adjusted size:       64 /    1518
 average network hdr offset:           14

                   Bulk  Best Effort        Video        Voice
  thresh      18066Kbit   289066Kbit   144533Kbit    72266Kbit
  target          5.0ms        5.0ms        5.0ms        5.0ms
  interval      100.0ms      100.0ms      100.0ms      100.0ms
  pk_delay         47us        740us         42us         18us
  av_delay         24us         32us         22us          9us
  sp_delay         14us         11us         12us          5us
  backlog            0b           0b           0b           0b
  pkts             1389     45323600      3704409     18840874
  bytes          136046  43347299847    222296446   1130523693
  way_inds            0      3016679            0            0
  way_miss           17       903215         1053         2318
  way_cols            0            0            0            0
  drops               0         5471            0            1
  marks               0           27            0            0
  ack_drop            0            0            0            0
  sp_flows            1            4            2            1
  bk_flows            0            1            0            0
  un_flows            0            0            0            0
  max_len            98        68338          136          221
  quantum           551         1514         1514         1514

root@IQrouter:~# uptime
 15:07:37 up 6 days,  3:23,  load average: 0.46, 0.21, 0.20
root@IQrouter:~#
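And regarding your earlier point about reading throughput out of tc rather than trusting the speedtest number: a crude before/after delta on the Sent counter works well enough. A rough sketch (busybox shell; it grabs the first "Sent" line on the WAN VLAN, which here is the cake root qdisc):

b0=$(tc -s qdisc show dev eth0.2 | awk '/ Sent / {print $2; exit}')   # bytes shipped by cake so far
sleep 10
b1=$(tc -s qdisc show dev eth0.2 | awk '/ Sent / {print $2; exit}')
echo "egress over last 10s: $(( (b1 - b0) * 8 / 10 / 1000 )) kbit/s"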


Cheers,

Jonathan

> On Sep 1, 2020, at 12:18 PM, Sebastian Moeller <moeller0@gmx.de> wrote:
> 
> Hi Jonathan,
> 
>> On Sep 1, 2020, at 17:41, Jonathan Foulkes <jf@jonathanfoulkes.com> wrote:
>> 
>> Toke, that link returns a 404 for me.
>> 
>> For others, I've found that testing cake throughput with isolation options
>> enabled is tricky if there are many competing connections.
> 
> Are you talking about the fact that with competing connections, you only see
> the current isolation quantum's equivalent of the actual rate? In that case,
> maybe parse the "tc -s qdisc" output to get an idea how much data/packets cake
> managed to push through in total in each direction, instead of relying on the
> measured goodput? I am probably barking up the wrong tree here...
> 
>> Like I keep having to tell my customers, fairness algorithms mean no one
>> device will ever gain 100% of the bandwidth so long as there are other open
>> & active connections from other devices.
> 
> That sounds like solid advice ;) Especially in the light of the exceedingly
> useful "ingress" keyword, which under load will drop depending on a flow's
> "unresponsiveness" such that more responsive flows end up getting a somewhat
> bigger share of the post-cake throughput...
> 
>> That said, I'd love to find options to increase throughput for single-tin
>> configs.
> 
> With or without isolation options?
> 
> Best Regards
> 	Sebastian
> 
>> Cheers,
>> 
>> Jonathan
>> 
>>> On Aug 31, 2020, at 7:35 AM, Toke Høiland-Jørgensen via Bloat
>>> <bloat@lists.bufferbloat.net> wrote:
>>> 
>>> Mikael Abrahamsson via Bloat <bloat@lists.bufferbloat.net> writes:
>>> 
>>>> Hi,
>>>> 
>>>> I migrated to an APU2 (https://www.pcengines.ch/apu2.htm) as a residential
>>>> router, from my previous WRT1200AC (Marvell Armada 385).
>>>> 
>>>> I was running OpenWrt 18.06 on that one; now I am running the latest
>>>> 19.07.3 on the APU2.
>>>> 
>>>> Before I had 500/100 and I had to use FQ_CODEL because CAKE took too much
>>>> CPU to be able to do 500/100 on the WRT1200AC. Now I upgraded to 1000/1000
>>>> and tried it again, and even the APU2 can only do CAKE up to ~300
>>>> megabit/s. With FQ_CODEL I get full speed (configured 900/900 in SQM in
>>>> OpenWrt).
>>>> 
>>>> Looking in top, I see sirq% sitting at 50% pegged. This is typically what
>>>> I see when CPU-based forwarding is maxed out. From my recollection of
>>>> running CAKE on earlier versions of OpenWrt (17.x) I don't remember CAKE
>>>> using more CPU than FQ_CODEL.
>>>> 
>>>> Anyone know what's up? I'm fine running FQ_CODEL, it solves any
>>>> bufferbloat but... I thought CAKE supposedly should use less CPU, not
>>>> more?
>>> 
>>> Hmm, you say CAKE and FQ-CoDel - so you're not enabling the shaper (that
>>> would be FQ-CoDel+HTB)? An exact config might be useful (or just the
>>> output of tc -s qdisc).
>>> 
>>> If you are indeed not shaping, maybe you're hitting the issue fixed by
>>> this commit?
>>> 
>>> https://github.com/dtaht/sch_cake/commit/3152477235c934022049fcddc063c45d37ec10e6n
>>> 
>>> -Toke