From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <dpreed@deepplum.com>
Received: from smtp123.iad3a.emailsrvr.com (smtp123.iad3a.emailsrvr.com
 [173.203.187.123])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by lists.bufferbloat.net (Postfix) with ESMTPS id 8139E3B2A4
 for <bloat@lists.bufferbloat.net>; Thu,  8 Jul 2021 15:56:10 -0400 (EDT)
Received: from app62.wa-webapps.iad3a (relay-webapps.rsapps.net
 [172.27.255.140])
 by smtp32.relay.iad3a.emailsrvr.com (SMTP Server) with ESMTP id 188285785;
 Thu,  8 Jul 2021 15:56:10 -0400 (EDT)
Received: from deepplum.com (localhost.localdomain [127.0.0.1])
 by app62.wa-webapps.iad3a (Postfix) with ESMTP id 9766C60046;
 Thu,  8 Jul 2021 15:56:25 -0400 (EDT)
Received: by apps.rackspace.com
 (Authenticated sender: dpreed@deepplum.com, from: dpreed@deepplum.com) 
 with HTTP; Thu, 8 Jul 2021 15:56:25 -0400 (EDT)
X-Auth-ID: dpreed@deepplum.com
Date: Thu, 8 Jul 2021 15:56:25 -0400 (EDT)
From: "David P. Reed" <dpreed@deepplum.com>
To: "Dave Taht" <dave.taht@gmail.com>
Cc: "Aaron Wood" <woody77@gmail.com>, "Cake List" <cake@lists.bufferbloat.net>,
 "Giuseppe De Luca" <dropheaders@gmx.com>,
 "bloat" <bloat@lists.bufferbloat.net>
MIME-Version: 1.0
Content-Type: multipart/alternative;
 boundary="----=_20210708155625000000_68504"
Importance: Normal
X-Priority: 3 (Normal)
X-Type: html
In-Reply-To: <CAA93jw4B70qXxKyQ9QorPHsMFzoLtkrxJzyAWHHoicTEepJQOw@mail.gmail.com>
References: <20210621210048.628befdb@hermes.local> 
 <38CC4C4D-AE42-4629-8472-16BCC0DEAFEA@gmx.de> 
 <2dbdf457-5652-6b74-7014-3bf79dde6bc9@gmx.com> 
 <CALQXh-OwnqcFBhx+uy9_83eHF3Xh3iAsNkDyFN+TOH_KJBTVvg@mail.gmail.com> 
 <CAA93jw4B70qXxKyQ9QorPHsMFzoLtkrxJzyAWHHoicTEepJQOw@mail.gmail.com>
X-Client-IP: 209.6.168.128
Message-ID: <1625774185.6179784@apps.rackspace.com>
X-Mailer: webmail/19.0.7-RC
X-Classification-ID: 65e556de-f4ae-42e1-8eca-5b6645b3f01e-1-1
Subject: Re: [Bloat] =?utf-8?q?=5BCake=5D__Really_getting_1G_out_of_ISP=3F?=
X-BeenThere: bloat@lists.bufferbloat.net
X-Mailman-Version: 2.1.20
Precedence: list
List-Id: General list for discussing Bufferbloat <bloat.lists.bufferbloat.net>
List-Unsubscribe: <https://lists.bufferbloat.net/options/bloat>,
 <mailto:bloat-request@lists.bufferbloat.net?subject=unsubscribe>
List-Archive: <https://lists.bufferbloat.net/pipermail/bloat>
List-Post: <mailto:bloat@lists.bufferbloat.net>
List-Help: <mailto:bloat-request@lists.bufferbloat.net?subject=help>
List-Subscribe: <https://lists.bufferbloat.net/listinfo/bloat>,
 <mailto:bloat-request@lists.bufferbloat.net?subject=subscribe>
X-List-Received-Date: Thu, 08 Jul 2021 19:56:10 -0000

------=_20210708155625000000_68504
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

=0AAs a data point, I run Cake on an "Intel(R) Celeron(R) CPU  N2930  @ =
1.83GHz" with 2 cores, and a 1 Gb/s cable modem network. My "router board=
" has two GigE ports and no WiFi. It uses Fedora 34 Server as its basis, =
runs dnsmasq for the main LAN (serving DNS and DHCP), and runs a =
Hurricane Electric /56 tunnel for v6.=0A =0ATesting with RRUL or various =
high-end web speed tests, I get the full 1 Gb/s (usually >950 Mb/s =
throughput) download performance through it, and minimal bufferbloat (A+ =
on the speed tests that measure bufferbloat). I also get full upload =
speed with no bufferbloat. =
=0A =0AThis, I believe, is a much slower board, with fewer cores, than the =
Odyssey. It never comes close to saturating one of the cores.=0A =0AI long =
ago gave up on trying to reflash consumer WiFi routers to serve as a home =
gateway. (and now that cpus and memory are incredibly cheap, the proper =
archit=
ecture is not to bundle two unrelated functions into a single processor any=
way, just have two boxes for the two functions)=0A =0AI do use them inside =
my premises as APs. Life is too short. As APs, they are limited by the damn=
 WiFi chipsets and drivers, with their poor packet scheduling, which is not=
 solved by Cake. That's a WiFi layer problem of queuing and scheduling in t=
he MAC layer, and I think the WiFi chip vendors have been clueless for at l=
east a decade, and show no sign of getting a clue, sad to say. They live in=
 proprietary land, and really have no interest in fixing the MAC layer as l=
ong as they can claim extreme throughput in an artificial scenario between =
two points with no cross traffic.=0A =0A =0AOn Tuesday, July 6, 2021 10:26p=
m, "Dave Taht" <dave.taht@gmail.com> said:=0A=0A=0A=0A> On Tue, Jul 6, 2021=
 at 3:32 PM Aaron Wood <woody77@gmail.com> wrote:=0A> >=0A> > I'm running a=
n Odyssey from Seeed Studios (celeron J4125 with dual i211), and=0A> it can=
 handle Cake at 1Gbps on a single core (which it needs to, because OpenWRT'=
s=0A> i211 support still has multiple receive queues disabled).=0A> =0A> No=
t clear if that is shaped or not? Line rate is easy on processors of=0A> th=
at class or better, but shaped?=0A> =0A> some points:=0A> =0A> On inbound s=
haping especially it it still best to lock network traffic=0A> to a single =
core in low end platforms.=0A> =0A> Cake itself is not multicore, although =
the design essentially is. We=0A> did some work towards trying to make it s=
hape across multiple cores=0A> and multiple hardware queues. IF the locking=
 contention could be=0A> minimized (RCU) I felt it possible for a win here,=
 but a bigger win=0A> would be to eliminate "mirred" from the ingress path =
entirely.=0A> =0A> Even multiple transmit queues remains kind of dicy in li=
nux, and=0A> actually tend to slow network processing in most cases I've tr=
ied at=0A> gbit line rates. They also add latency, as (1) BQL is MIAD, not =
AIMD,=0A> so it stays "stuck" at a "good" level for a long time, AND 2) eac=
h hw=0A> queue gets an additive fifo at this layer, so where, you might nee=
d=0A> only 40k to keep a single hw queue busy, you end up with 160k with 4=
=0A> hw queues. This problem is getting worse and worse (64 queues are=0A> =
common in newer hardware, 1000s in really new hardware) and a revisit=0A> t=
o how BQL does things in this case would be useful. Ideally it would=0A> sh=
are state (with a cross core variable and atomic locks) as to how=0A> much =
total buffering was actually needed "down there" across all the=0A> queues,=
 but without trying it, I worry that that would end up costing=0A> a lot of=
 cpu cycles.=0A> =0A> Feel free to experiment with multiple transmit queues=
 locked to other=0A> cores with the set-affinity bits in /proc/interrupts. =
I'm sure these=0A> MUST be useful on some platform, but I think most of the=
 use for=0A> multiple hw queues is when a locally processing application is=
=0A> getting the data, not when it is being routed.=0A> =0A> Ironically, I =
guess, the shorter your queues the higher likelihood a=0A> given packet wil=
l remain in l2 or even l1 cache.=0A> =0A> I=0A> >=0A> > On Tue, Jun 22, 202=
1 at 12:44 AM Giuseppe De Luca <dropheaders@gmx.com>=0A> wrote:=0A> >>=0A> =
>> Also a PC Engines APU4 will do the job=0A> >> (https://inonius.net/resul=
ts/?userId=3D17996087f5e8 - this is a=0A> >> 1gbit/1gbit, with Openwrt/sqm-=
scripts set to 900/900. ISP is Sony NURO=0A> >> in Japan). Will follow this=
 thread to know if some interesting device=0A> >> popup :)=0A> >>=0A> >>=0A=
> >> https://inonius.net/results/?userId=3D17996087f5e8=0A> >>=0A> >> On 6/=
22/2021 6:12 AM, Sebastian Moeller wrote:=0A> >> >=0A> >> > On 22 June 2021=
 06:00:48 CEST, Stephen Hemminger=0A> <stephen@networkplumber.org> wrote:=
=0A> >> >> Is there any consumer hardware that can actually keep up and do=
=0A> AQM at=0A> >> >> 1Gbit.=0A> >> > Over in the OpenWrt forums the same q=
uestion pops up=0A> routinely once per week. The best answer ATM seems to b=
e a combination of a=0A> raspberry pi4B with a decent USB3 gigabit ethernet=
 dongle, a managed switch and=0A> any capable (OpenWrt) AP of the user's li=
king. With 4 arm A72 cores the will=0A> traffic shape up to a gigabit as re=
ported by multiple users.=0A> >> >=0A> >> >=0A> >> >> It seems everyone see=
ms obsessed with gamer Wifi 6. But can only=0A> do=0A> >> >> 300Mbit single=
=0A> >> >> stream with any kind of QoS.=0A> >> > IIUC most commercial home =
routers/APs bet on offload engines to do=0A> most of the heavy lifting, but=
 as far as I understand only the NSS cores have a=0A> shaper and fq_codel m=
odule....=0A> >> >=0A> >> >=0A> >> >> It doesn't help that all the local IS=
P's claim 10Mbit upload=0A> even with=0A> >> >> 1G download.=0A> >> >> Is t=
his a head end provisioning problem or related to Docsis 3.0=0A> (or=0A> >>=
 >> later) modems?=0A> >> > For DOCSIS the issue seems to be an unfortunate=
 frequency split=0A> between up and downstream and use of lower efficiency =
coding schemes .=0A> >> > Over here the incumbent cable isp provisions fift=
y Mbps for=0A> upstream and plans to increase that to hundred once the upst=
ream is switched to=0A> docsis 3.1.=0A> >> > I believe one issue is that si=
nce most of the upstream is required=0A> for the reverse ACK traffic for th=
e download and hence it can not be=0A> oversubscribed too much.... but I th=
ink we have real docsis experts on the list,=0A> so I will stop my speculat=
ion here...=0A> >> >=0A> >> > Regards=0A> >> > Sebastian=0A> >> >=0A> >> >=
=0A> >> >=0A> >> >=0A> >> >> ______________________________________________=
_=0A> >> >> Bloat mailing list=0A> >> >> Bloat@lists.bufferbloat.net=0A> >>=
 >> https://lists.bufferbloat.net/listinfo/bloat=0A> >> ___________________=
____________________________=0A> >> Bloat mailing list=0A> >> Bloat@lists.b=
ufferbloat.net=0A> >> https://lists.bufferbloat.net/listinfo/bloat=0A> >=0A=
> > _______________________________________________=0A> > Bloat mailing lis=
t=0A> > Bloat@lists.bufferbloat.net=0A> > https://lists.bufferbloat.net/lis=
tinfo/bloat=0A> =0A> =0A> =0A> --=0A> Latest Podcast:=0A> https://www.linke=
din.com/feed/update/urn:li:activity:6791014284936785920/=0A> =0A> Dave T=C3=
=A4ht CTO, TekLibre, LLC=0A> ______________________________________________=
_=0A> Cake mailing list=0A> Cake@lists.bufferbloat.net=0A> https://lists.bu=
fferbloat.net/listinfo/cake=0A> 
------=_20210708155625000000_68504
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

<font face=3D"arial" size=3D"2"><p style=3D"margin:0;padding:0;=
font-family: arial; font-size: 10pt; overflow-wrap: break-word;">As a data =
point, I run Cake on an "Intel(R) Celeron(R) CPU&nbsp; N2930&nbsp; @ =
1.83GHz" with 2 cores, and a 1 Gb/s cable modem network. My "router board" =
has two GigE ports and no WiFi. It uses Fedora 34 Server as its basis, =
runs dnsmasq for the main LAN (serving DNS and DHCP), and runs a =
Hurricane Electric /56 tunnel for v6.</p>=0A<p style=3D"margin:0;=
padding:0;font-family: arial; font-siz=
e: 10pt; overflow-wrap: break-word;">&nbsp;</p>=0A<p style=3D"margin:0;padd=
ing:0;font-family: arial; font-size: 10pt; overflow-wrap: break-word;">=
Testing with RRUL or various high-end web speed tests, I get the full =
1 Gb/s (usually &gt;950 Mb/s throughput) download performance through it, =
and minimal bufferbloat (A+ on the speed tests that measure bufferbloat). =
I also get full upload speed with no bufferbloat.&nbsp;</p>=0A<p style=3D=
"margin:0;p=
adding:0;font-family: arial; font-size: 10pt; overflow-wrap: break-word;">&=
nbsp;</p>=0A<p style=3D"margin:0;padding:0;font-family: arial; font-size: 1=
0pt; overflow-wrap: break-word;">This, I believe, is a much slower board, w=
ith fewer cores, than the Odyssey. It never comes close to saturating one o=
f the cores.</p>=0A<p style=3D"margin:0;padding:0;font-family: arial; font-=
size: 10pt; overflow-wrap: break-word;">&nbsp;</p>=0A<p style=3D"margin:0;p=
adding:0;font-family: arial; font-size: 10pt; overflow-wrap: break-word;">=
I long ago gave up on trying to reflash consumer WiFi routers to serve =
as a home gateway. (and now that cpus and memory are incredibly cheap, =
the proper =
architecture is not to bundle two unrelated functions into a single process=
or anyway, just have two boxes for the two functions)</p>=0A<p style=3D"mar=
gin:0;padding:0;font-family: arial; font-size: 10pt; overflow-wrap: break-w=
ord;">&nbsp;</p>=0A<p style=3D"margin:0;padding:0;font-family: arial; font-=
size: 10pt; overflow-wrap: break-word;">I do use them inside my premises as=
 APs. Life is too short. As APs, they are limited by the damn WiFi chipsets=
 and drivers, with their poor packet scheduling, which is not solved by Cak=
e. That's a WiFi layer problem of queuing and scheduling in the MAC layer, =
and I think the WiFi chip vendors have been clueless for at least a decade,=
 and show no sign of getting a clue, sad to say. They live in proprietary l=
and, and really have no interest in fixing the MAC layer as long as they ca=
n claim extreme throughput in an artificial scenario between two points wit=
h no cross traffic.</p>=0A<p style=3D"margin:0;padding:0;font-family: arial=
; font-size: 10pt; overflow-wrap: break-word;">&nbsp;</p>=0A<p style=3D"mar=
gin:0;padding:0;font-family: arial; font-size: 10pt; overflow-wrap: break-w=
ord;">&nbsp;</p>=0A<p style=3D"margin:0;padding:0;font-family: arial; font-=
size: 10pt; overflow-wrap: break-word;">On Tuesday, July 6, 2021 10:26pm, "=
Dave Taht" &lt;dave.taht@gmail.com&gt; said:<br /><br /></p>=0A<div id=3D"S=
afeStyles1625773122">=0A<p style=3D"margin:0;padding:0;font-family: arial; =
font-size: 10pt; overflow-wrap: break-word;">&gt; On Tue, Jul 6, 2021 at 3:=
32 PM Aaron Wood &lt;woody77@gmail.com&gt; wrote:<br />&gt; &gt;<br />&gt; =
&gt; I'm running an Odyssey from Seeed Studios (celeron J4125 with dual i21=
1), and<br />&gt; it can handle Cake at 1Gbps on a single core (which it ne=
eds to, because OpenWRT's<br />&gt; i211 support still has multiple receive=
 queues disabled).<br />&gt; <br />&gt; Not clear if that is shaped or not?=
 Line rate is easy on processors of<br />&gt; that class or better, but sha=
ped?<br />&gt; <br />&gt; some points:<br />&gt; <br />&gt; On inbound shap=
ing especially it it still best to lock network traffic<br />&gt; to a sing=
le core in low end platforms.<br />&gt; <br />&gt; Cake itself is not multi=
core, although the design essentially is. We<br />&gt; did some work toward=
s trying to make it shape across multiple cores<br />&gt; and multiple hard=
ware queues. IF the locking contention could be<br />&gt; minimized (RCU) I=
 felt it possible for a win here, but a bigger win<br />&gt; would be to el=
iminate "mirred" from the ingress path entirely.<br />&gt; <br />&gt; Even =
multiple transmit queues remains kind of dicy in linux, and<br />&gt; actua=
lly tend to slow network processing in most cases I've tried at<br />&gt; g=
bit line rates. They also add latency, as (1) BQL is MIAD, not AIMD,<br />&=
gt; so it stays "stuck" at a "good" level for a long time, AND 2) each hw<b=
r />&gt; queue gets an additive fifo at this layer, so where, you might nee=
d<br />&gt; only 40k to keep a single hw queue busy, you end up with 160k w=
ith 4<br />&gt; hw queues. This problem is getting worse and worse (64 queu=
es are<br />&gt; common in newer hardware, 1000s in really new hardware) an=
d a revisit<br />&gt; to how BQL does things in this case would be useful. =
Ideally it would<br />&gt; share state (with a cross core variable and atom=
ic locks) as to how<br />&gt; much total buffering was actually needed "dow=
n there" across all the<br />&gt; queues, but without trying it, I worry th=
at that would end up costing<br />&gt; a lot of cpu cycles.<br />&gt; <br /=
>&gt; Feel free to experiment with multiple transmit queues locked to other=
<br />&gt; cores with the set-affinity bits in /proc/interrupts. I'm sure t=
hese<br />&gt; MUST be useful on some platform, but I think most of the use=
 for<br />&gt; multiple hw queues is when a locally processing application =
is<br />&gt; getting the data, not when it is being routed.<br />&gt; <br /=
>&gt; Ironically, I guess, the shorter your queues the higher likelihood a<=
br />&gt; given packet will remain in l2 or even l1 cache.<br />&gt; <br />=
&gt; I<br />&gt; &gt;<br />&gt; &gt; On Tue, Jun 22, 2021 at 12:44 AM Giuse=
ppe De Luca &lt;dropheaders@gmx.com&gt;<br />&gt; wrote:<br />&gt; &gt;&gt;=
<br />&gt; &gt;&gt; Also a PC Engines APU4 will do the job<br />&gt; &gt;&g=
t; (https://inonius.net/results/?userId=3D17996087f5e8 - this is a<br />&gt=
; &gt;&gt; 1gbit/1gbit, with Openwrt/sqm-scripts set to 900/900. ISP is Son=
y NURO<br />&gt; &gt;&gt; in Japan). Will follow this thread to know if som=
e interesting device<br />&gt; &gt;&gt; popup :)<br />&gt; &gt;&gt;<br />&g=
t; &gt;&gt;<br />&gt; &gt;&gt; https://inonius.net/results/?userId=3D179960=
87f5e8<br />&gt; &gt;&gt;<br />&gt; &gt;&gt; On 6/22/2021 6:12 AM, Sebastia=
n Moeller wrote:<br />&gt; &gt;&gt; &gt;<br />&gt; &gt;&gt; &gt; On 22 June=
 2021 06:00:48 CEST, Stephen Hemminger<br />&gt; &lt;stephen@networkplumber=
.org&gt; wrote:<br />&gt; &gt;&gt; &gt;&gt; Is there any consumer hardware =
that can actually keep up and do<br />&gt; AQM at<br />&gt; &gt;&gt; &gt;&g=
t; 1Gbit.<br />&gt; &gt;&gt; &gt; Over in the OpenWrt forums the same quest=
ion pops up<br />&gt; routinely once per week. The best answer ATM seems to=
 be a combination of a<br />&gt; raspberry pi4B with a decent USB3 gigabit =
ethernet dongle, a managed switch and<br />&gt; any capable (OpenWrt) AP of=
 the user's liking. With 4 arm A72 cores the will<br />&gt; traffic shape u=
p to a gigabit as reported by multiple users.<br />&gt; &gt;&gt; &gt;<br />=
&gt; &gt;&gt; &gt;<br />&gt; &gt;&gt; &gt;&gt; It seems everyone seems obse=
ssed with gamer Wifi 6. But can only<br />&gt; do<br />&gt; &gt;&gt; &gt;&g=
t; 300Mbit single<br />&gt; &gt;&gt; &gt;&gt; stream with any kind of QoS.<=
br />&gt; &gt;&gt; &gt; IIUC most commercial home routers/APs bet on offloa=
d engines to do<br />&gt; most of the heavy lifting, but as far as I unders=
tand only the NSS cores have a<br />&gt; shaper and fq_codel module....<br =
/>&gt; &gt;&gt; &gt;<br />&gt; &gt;&gt; &gt;<br />&gt; &gt;&gt; &gt;&gt; It=
 doesn't help that all the local ISP's claim 10Mbit upload<br />&gt; even w=
ith<br />&gt; &gt;&gt; &gt;&gt; 1G download.<br />&gt; &gt;&gt; &gt;&gt; Is=
 this a head end provisioning problem or related to Docsis 3.0<br />&gt; (o=
r<br />&gt; &gt;&gt; &gt;&gt; later) modems?<br />&gt; &gt;&gt; &gt; For DO=
CSIS the issue seems to be an unfortunate frequency split<br />&gt; between=
 up and downstream and use of lower efficiency coding schemes .<br />&gt; &=
gt;&gt; &gt; Over here the incumbent cable isp provisions fifty Mbps for<br=
 />&gt; upstream and plans to increase that to hundred once the upstream is=
 switched to<br />&gt; docsis 3.1.<br />&gt; &gt;&gt; &gt; I believe one is=
sue is that since most of the upstream is required<br />&gt; for the revers=
e ACK traffic for the download and hence it can not be<br />&gt; oversubscr=
ibed too much.... but I think we have real docsis experts on the list,<br /=
>&gt; so I will stop my speculation here...<br />&gt; &gt;&gt; &gt;<br />&g=
t; &gt;&gt; &gt; Regards<br />&gt; &gt;&gt; &gt; Sebastian<br />&gt; &gt;&g=
t; &gt;<br />&gt; &gt;&gt; &gt;<br />&gt; &gt;&gt; &gt;<br />&gt; &gt;&gt; =
&gt;<br />&gt; &gt;&gt; &gt;&gt; __________________________________________=
_____<br />&gt; &gt;&gt; &gt;&gt; Bloat mailing list<br />&gt; &gt;&gt; &gt=
;&gt; Bloat@lists.bufferbloat.net<br />&gt; &gt;&gt; &gt;&gt; https://lists=
.bufferbloat.net/listinfo/bloat<br />&gt; &gt;&gt; ________________________=
_______________________<br />&gt; &gt;&gt; Bloat mailing list<br />&gt; &gt=
;&gt; Bloat@lists.bufferbloat.net<br />&gt; &gt;&gt; https://lists.bufferbl=
oat.net/listinfo/bloat<br />&gt; &gt;<br />&gt; &gt; ______________________=
_________________________<br />&gt; &gt; Bloat mailing list<br />&gt; &gt; =
Bloat@lists.bufferbloat.net<br />&gt; &gt; https://lists.bufferbloat.net/li=
stinfo/bloat<br />&gt; <br />&gt; <br />&gt; <br />&gt; --<br />&gt; Latest=
 Podcast:<br />&gt; https://www.linkedin.com/feed/update/urn:li:activity:67=
91014284936785920/<br />&gt; <br />&gt; Dave T=C3=A4ht CTO, TekLibre, LLC<b=
r />&gt; _______________________________________________<br />&gt; Cake mai=
ling list<br />&gt; Cake@lists.bufferbloat.net<br />&gt; https://lists.buff=
erbloat.net/listinfo/cake<br />&gt; </p>=0A</div></font>
------=_20210708155625000000_68504--