<div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, Jul 6, 2021 at 7:26 PM Dave Taht <<a href="mailto:dave.taht@gmail.com">dave.taht@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">On Tue, Jul 6, 2021 at 3:32 PM Aaron Wood <<a href="mailto:woody77@gmail.com" target="_blank">woody77@gmail.com</a>> wrote:<br>

><br>

> I'm running an Odyssey from Seeed Studio (Celeron J4125 with dual i211), and it can handle Cake at 1Gbps on a single core (which it needs to, because OpenWrt's i211 support still has multiple receive queues disabled).<br>

<br>

Not clear if that is shaped or not? Line rate is easy on processors of<br>

that class or better, but shaped?<br></blockquote><div><br></div><div>That's shaped.  I can shape 800+ Mbps, and the kernel ramps the clock rate up to 2.5GHz as needed, IIRC.  I'm guessing that it might thermally throttle at some point, but I haven't had sustained >500Mbps traffic for long enough to really exercise that.  That said, covid WFH has definitely increased the likelihood that I'm hitting >500Mbps downloads.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">

some points:<br>

<br>

For inbound shaping especially, it is still best to lock network traffic<br>

to a single core on low-end platforms.<br>

<br>

Cake itself is not multicore, although the design essentially is. We<br>

did some work towards trying to make it shape across multiple cores<br>

and multiple hardware queues. IF the locking contention could be<br>

minimized (RCU) I felt it possible for a win here, but a bigger win<br>

would be to eliminate "mirred" from the ingress path entirely.<br></blockquote><div><br></div><div>I was going to play around with shaping to lower levels across multiple cores, as many of the loads I deal with are multi-stream, but I always worry about the ack path, as the provisioned rates are so asymmetric (35Mbps up).  I'm using `ack-filter-aggressive` on egress to help.  I've found that the most aggressive ack filtering seems to hurt throughput.</div><div><br></div><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">

<br>

Even multiple transmit queues remain kind of dicey in Linux, and<br>

actually tend to slow network processing in most cases I've tried at<br>

gbit line rates. They also add latency, as (1) BQL is MIAD, not AIMD,<br>

so it stays "stuck" at a "good" level for a long time, and (2) each hw<br>

queue gets an additive fifo at this layer, so where you might need<br>

only 40k to keep a single hw queue busy, you end up with 160k with 4<br>

hw queues. This problem is getting worse and worse (64 queues are<br>

common in newer hardware, 1000s in really new hardware) and a revisit<br>

to how BQL does things in this case would be useful. Ideally it would<br>

share state (with a cross core variable and atomic locks) as to how<br>

much total buffering was actually needed "down there" across all the<br>

queues, but without trying it, I worry that that would end up costing<br>

a lot of cpu cycles.<br>
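To put rough numbers on the additive-buffering point above, here is a minimal sketch, assuming every hardware queue independently settles on the same BQL limit (the 40k figure from the text). The function names and the 1 Gbit/s line rate are illustrative assumptions, not anything taken from the kernel.<br>

```python
def total_bql_backlog(per_queue_bytes: int, num_queues: int) -> int:
    """Worst-case bytes parked below the qdisc if every hw queue fills its own limit."""
    return per_queue_bytes * num_queues


def drain_time_us(backlog_bytes: int, line_rate_bps: float) -> float:
    """Time needed to drain that backlog at line rate, in microseconds."""
    return backlog_bytes * 8 / line_rate_bps * 1e6


if __name__ == "__main__":
    # 40k per queue is the figure quoted in the text; 1 Gbit/s is assumed.
    for queues in (1, 4, 64):
        backlog = total_bql_backlog(40_000, queues)
        print(f"{queues} queue(s): {backlog} bytes, "
              f"{drain_time_us(backlog, 1e9):.0f} us at 1 Gbit/s")
```

On Linux the current per-queue values can be inspected under /sys/class/net/DEV/queues/tx-N/byte_queue_limits/ (files such as limit, limit_min and limit_max), which is one place to look when experimenting with this.<br>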

<br>

Feel free to experiment with multiple transmit queues locked to other<br>

cores by writing affinity masks to /proc/irq/N/smp_affinity (the IRQ<br>

numbers are listed in /proc/interrupts). I'm sure these MUST be useful<br>

on some platform, but I think most of the use for multiple hw queues is<br>

when a local application is consuming the data, not when it is being routed.<br>
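For anyone trying the affinity experiment, a minimal sketch of building the mask follows. It assumes a machine with fewer than 32 CPUs (larger systems use comma-separated mask groups) and that the IRQ number for each queue has been looked up by hand in /proc/interrupts. The helper names and the IRQ number in the usage note are hypothetical.<br>

```python
def cpu_mask(cores):
    """Return the hex bitmask string written to /proc/irq/N/smp_affinity."""
    mask = 0
    for core in cores:
        if core < 0:
            raise ValueError("core index must be non-negative")
        mask |= 1 << core  # bit i selects CPU i
    return format(mask, "x")


def pin_irq(irq, cores, proc_root="/proc/irq"):
    """Write the mask for `cores` to the IRQ's smp_affinity file (needs root)."""
    with open(f"{proc_root}/{irq}/smp_affinity", "w") as f:
        f.write(cpu_mask(cores))


if __name__ == "__main__":
    print(cpu_mask([0, 1]))  # mask for cores 0-1
    print(cpu_mask([2, 3]))  # mask for cores 2-3
```

Something like pin_irq(42, [2, 3]) would then pin a (hypothetical) IRQ 42 to cores 2-3. Note that a running irqbalance daemon may rewrite these masks, so it is usually stopped first.<br>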

<br>

Ironically, I guess, the shorter your queues, the higher the likelihood a<br>

given packet will remain in L2 or even L1 cache.<br></blockquote><div><br></div><div>I'm pinning all the queues to cores, although I've pinned rx/tx for the same interface to the same cores, with cores 0-1 doing LAN and 2-3 doing WAN duties...  I may try matching flow directions per core (rx WAN and tx LAN on the same core).  </div><div><br></div><div>One separate reason to set affinity on startup is that the reshuffling that the kernel tries to do will cause things to stumble as the caches all miss.</div><div><br></div><div>The note about BQL is interesting...  Is that actually configurable?  (I haven't gone looking before.)</div><div><br></div><div>OTOH, I've hit a point where trying to squeeze the most out of it just doesn't seem necessary.  When I was bench-testing it (with local traffic generation), I could saturate wire rates in both directions with cake running and limiting.  So...  Not much of a worry there.  But it's still inconsistent on live traffic and with a real internet.  I'm not sure if that is due to the dynamic frequency scaling, or just congestion at the head-end, or what.</div><div><br></div><div>I was going to start a separate thread, but I've been contemplating what measurements and stats I can monitor long-term to understand the intermittent stumbles and hangs that I see.  I'm fairly certain that they're in the "It can't be DNS....  ::sigh:: It's always DNS...." category, though.  And if that's the case, I should just log all the queries and look at the response times.  It seems to be marginally better with dns-over-https (doing happy-eyeballs-like concurrent requests across Google and Cloudflare), but I can't be certain.</div><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">


><br>

> On Tue, Jun 22, 2021 at 12:44 AM Giuseppe De Luca <<a href="mailto:dropheaders@gmx.com" target="_blank">dropheaders@gmx.com</a>> wrote:<br>

>><br>

>> Also a PC Engines APU4 will do the job<br>

>> (<a href="https://inonius.net/results/?userId=17996087f5e8" rel="noreferrer" target="_blank">https://inonius.net/results/?userId=17996087f5e8</a> - this is a<br>

>> 1gbit/1gbit, with Openwrt/sqm-scripts set to 900/900.  ISP is Sony NURO<br>

>> in Japan). Will follow this thread to know if some interesting device<br>

>> popup :)<br>

>><br>


>><br>

>> On 6/22/2021 6:12 AM, Sebastian Moeller wrote:<br>

>> ><br>

>> > On 22 June 2021 06:00:48 CEST, Stephen Hemminger <<a href="mailto:stephen@networkplumber.org" target="_blank">stephen@networkplumber.org</a>> wrote:<br>

>> >> Is there any consumer hardware that can actually keep up and do AQM at<br>

>> >> 1Gbit.<br>

>> >          Over in the OpenWrt forums the same question pops up routinely, about once per week. The best answer ATM seems to be a combination of a Raspberry Pi 4B with a decent USB3 gigabit ethernet dongle, a managed switch and any capable (OpenWrt) AP of the user's liking. With 4 ARM A72 cores it will traffic-shape up to a gigabit, as reported by multiple users.<br>

>> ><br>

>> ><br>

>> >> It seems everyone is obsessed with gamer Wifi 6, but it can only do<br>

>> >> 300Mbit single<br>

>> >> stream with any kind of QoS.<br>

>> > IIUC most commercial home routers/APs bet on offload engines to do most of the heavy lifting, but as far as I understand only the NSS cores have a shaper and fq_codel module....<br>

>> ><br>

>> ><br>

>> >> It doesn't help that all the local ISPs claim 10Mbit upload even with<br>

>> >> 1G download.<br>

>> >> Is this a head end provisioning problem or related to Docsis 3.0 (or<br>

>> >> later) modems?<br>

>> > For DOCSIS the issue seems to be an unfortunate frequency split between up and downstream and the use of lower-efficiency coding schemes.<br>

>> > Over here the incumbent cable ISP provisions 50 Mbps for upstream and plans to increase that to 100 Mbps once the upstream is switched to DOCSIS 3.1.<br>

>> > I believe one issue is that much of the upstream is required for the reverse ACK traffic for the download, and hence it cannot be oversubscribed too much.... but I think we have real DOCSIS experts on the list, so I will stop my speculation here...<br>
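A back-of-the-envelope check of the ACK-traffic point above, as a sketch under stated assumptions: one ACK per two full-size segments, a 1448-byte MSS, and 66 bytes on the wire per ACK (Ethernet + IPv4 + TCP with timestamps). None of these values come from the thread, DOCSIS framing overhead differs, and receivers using GRO often ACK less frequently, so treat the result as illustrative only.<br>

```python
def ack_upstream_bps(download_bps, mss_bytes=1448, segs_per_ack=2, ack_wire_bytes=66):
    """Estimate the upstream rate consumed by pure ACKs for a bulk TCP download."""
    data_segments_per_s = download_bps / (mss_bytes * 8)
    acks_per_s = data_segments_per_s / segs_per_ack
    return acks_per_s * ack_wire_bytes * 8


if __name__ == "__main__":
    # Compare a 1 Gbit/s download against the upstream rates mentioned in the thread.
    need = ack_upstream_bps(1e9)
    for upstream in (10e6, 35e6, 50e6):
        print(f"ACKs need {need / 1e6:.1f} Mbit/s of a {upstream / 1e6:.0f} Mbit/s upstream")
```

Under these assumptions a 10Mbit upstream cannot carry the unthinned ACK stream of a 1G download, which is where ACK filtering on egress comes in.<br>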

>> ><br>

>> > Regards<br>

>> >           Sebastian<br>

>> ><br>

>> ><br>

>> ><br>

>> ><br>

>> >> _______________________________________________<br>

>> >> Bloat mailing list<br>

>> >> <a href="mailto:Bloat@lists.bufferbloat.net" target="_blank">Bloat@lists.bufferbloat.net</a><br>

>> >> <a href="https://lists.bufferbloat.net/listinfo/bloat" rel="noreferrer" target="_blank">https://lists.bufferbloat.net/listinfo/bloat</a><br>


><br>


<br>

<br>

<br>

-- <br>

Latest Podcast:<br>

<a href="https://www.linkedin.com/feed/update/urn:li:activity:6791014284936785920/" rel="noreferrer" target="_blank">https://www.linkedin.com/feed/update/urn:li:activity:6791014284936785920/</a><br>

<br>

Dave Täht CTO, TekLibre, LLC<br>

</blockquote></div></div>