From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Sat, 18 May 2019 18:36:46 -0400 (EDT)
From: "David P. Reed"
To: "Jonathan Foulkes"
Cc: "Dave Taht", "Sebastian Moeller", "Rich Brown", "cerowrt-devel", "bloat"
Subject: Re: [Cerowrt-devel] [Bloat] (no subject)
Message-ID: <1558219006.29034759@apps.rackspace.com>
In-Reply-To: <2D23494F-7B4B-4426-AC57-8CFEA993D60B@jonathanfoulkes.com>
References: <2936.1557856670@turing-police>
 <1557859131.759530583@apps.rackspace.com>
 <1557871532.754117608@apps.rackspace.com>
 <87lfz81x7b.fsf@toke.dk>
 <1557876841.69888745@apps.rackspace.com>
 <25460D05-4F53-4317-9722-2878B160BD7B@gmx.de>
 <2D23494F-7B4B-4426-AC57-8CFEA993D60B@jonathanfoulkes.com>
List-Id: Development issues regarding the cerowrt test router project
Content-Type: text/plain; charset="UTF-8"

Pardon, but cwnd should NEVER be larger than the number of forwarding hops between source and destination.

Kleinrock and students recently proved that the optimum cwnd for both throughput and minimized latency is achieved when there is one packet or less in each outbound queue from source to destination (including cross traffic - meaning other flows sharing the same outbound queue).

Now the idea that cwnd should be in the 1000's of packets is totally absurd, unless the source or destination buffers (at the end hosts) are counted, and that would be needed if the TCP source and destination application might, for example, be "swapped out" and thus unable to actually send and acknowledge packets at the instant of receiving an ACK.

If cwnd is sort of compensating for "swapping out" the TCP endpoint processes so that they take milliseconds to provide or acknowledge receipt of a packet, then that's fine (if you want throughput and terrible latency), but that's not the congestion window. That's just cramming the operating system's scheduling delay into the TCP stack.

TCP is not supposed to be designed around slow OS process schedulers. Those buffers should never be allowed to build up in the transport network, where they kill latency for everyone. That's just terrible design, conflating OS scheduling with congestion management.


On Thursday, May 16, 2019 6:01pm, "Jonathan Foulkes" <jf@jonathanfoulkes.com> said:

> Thanks for sharing Dave.
>
> A good paper, but there are a few gaps worth mentioning on this list:
>
> Testing when there is an AQM present means the test must adapt to the challenge of
> smaller cwnd existing for any one stream, therefore it will take many more streams
> to saturate a line with cwnd = 30 than if the cwnd is allowed to grow to >1,000.
> In general, the impact of cwnd relative to saturation and impact on delay was not
> visited, and yet it’s critical. One of the reasons for spiky delays on high
> speed lines is the ginormous cwnds hogging the line with their 800ms+ RTT’s
>
> Asymmetry of provisioned upload relative to download, at some point, the
> ack-stream can be held up by either lack of capacity or bloat in the uplink. So
> even though a link can deliver 300Mbps down, a bloated uplink of 5Mbps might never
> allow that level to be reached.
> There are ISPs provisioning truly crazy asymmetric service.
>
> They do make a good point about the local network, WiFi specifically being the new
> bottleneck, which is why we included an iperf instance that can be started on the
> IQrouter to help run client to server tests that help spot local network capacity
> limits, typically on WiFi.
>
> Regarding their point about ‘Cross traffic’ impact on measurements,
> Cake’s per-host / per-target fairness also complicates AQM-enabled testing
> from client devices. Which is why we make the built-in speed test the arbiter of
> true line capacity, as it factors for ALL traffic flowing through the router. But,
> as you mention, that is also a challenge from a CPU resource standpoint on higher
> speeds.
>
> The biggest gap in this paper is not paying sufficient attention to latency as a
> critical metric, and one that is controllable by an AQM. Bufferbloat metrics have
> more impact on end-user experience than +/- 50Mbps on a 100Mbps baseline.
> I was rather miffed they do not even mention the DSLreports.com speedtest, or the
> fast.com test, as those are the two that provide a bufferbloat metric.
>
> The industry as a whole MUST pay attention and socialize the relevancy of managed
> latencies as being critical to customer satisfaction and good application
> performance. And that starts with tests that clearly grade that critical aspect.
>
> Cheers,
>
> Jonathan
>
> > On May 15, 2019, at 3:58 AM, Dave Taht <dave.taht@gmail.com> wrote:
> >
> > If it helps any: Nick Feamster and Jason Livingood just published
> > "Internet Speed Measurement: Current Challenges and Future
> > Recommendations" ( https://arxiv.org/pdf/1905.02334.pdf ) a few days
> > ago, and outlines quite a few problems going forward at higher speeds.
> > I do wish the document had pointed out more clearly that router based
> > measurements have problems also, with weaker cpus unable to source
> > enough traffic for an accurate measurement, but I do hope this
> > document has impact, and it's a good read, regardless.
> >
> > Still, somehow getting it right at lower speeds is always on my mind.
> > I'd long ago hoped that DSL devices would adopt BQL, and that
> > cablemodems would also, thus moving packet processing a little higher
> > on the stack so more advanced algorithms like cake could take hold.
> >
> > On Wed, May 15, 2019 at 9:32 AM Sebastian Moeller <moeller0@gmx.de> wrote:
> >>
> >> Hi All,
> >>
> >>
> >> I believe the following to be relevant to this discussion:
> >> https://apenwarr.ca/log/20180808
> >> Where he discusses a similar idea including implementation albeit aimed
> >> at lower bandwidth and sans the automatic bandwidth tracking.
> >>
> >>
> >>> On May 15, 2019, at 01:34, David P. Reed <dpreed@deepplum.com> wrote:
> >>>
> >>>
> >>> Ideally, it would need to be self-configuring, though... I.e., something
> >>> like the IQRouter auto-measuring of the upstream bandwidth to tune the
> >>> shaper.
> >>
> >> @Jonathan from your experience how tricky is it to get reliable speedtest
> >> endpoints and how reliable are they in practice. And do you do any
> >> sanitization, like take another measurement immediately if the measured
> >> rate differs from the last by more than XX% or something like that?
> >>
> >>
> >>>
> >>> Sure, seems like this is easy to code because there are exactly two
> >>> ports to measure, they can even be labeled physically "up" and "down"
> >>> to indicate their function.
> >>
> >> IMHO the real challenge is automated measurements over the internet at
> >> Gbps speeds. It is not hard to get some test going (by e.g. tapping into
> >> ookla's fast net of confederated measurement endpoints) but getting
> >> something where the servers can reliably saturate 1Gbps+ seems somewhat
> >> trickier (last time I looked one required a 1Gbps connection to the
> >> server to participate in speedtest.net, obviously not really suited for
> >> measuring Gbps speeds).
> >> In the EU there exists a mandate for national regulators to establish
> >> and/or endorse anointed "official" speedtests, intended to keep ISP
> >> marketing honest, that come with stricter guarantees (e.g. the official
> >> German speedtest, breitbandmessung.de will only admit tests if the
> >> servers are having sufficient bandwidth reserves to actually saturate
> >> the link; the enduser is required to select the speed-tier giving them a
> >> strong hint about the required rates I believe).
> >> For my back-burner toy project "per-packet-overhead estimation on
> >> arbitrary link technology" I am currently facing the same problem, I
> >> need a traffic sink and source that can reliably saturate my link so I
> >> can measure maximum achievable goodput, so if anybody in the list has
> >> ideas, I am all ears/eyes.
> >>
> >>>
> >>> For reference, the GL.iNet routers are tiny and nicely packaged, and run
> >>> OpenWrt; they do have one with Gbit ports[0], priced around $70. I very
> >>> much doubt it can actually push a gigabit, though, but I haven't had a
> >>> chance to test it. However, losing the WiFi, and getting a slightly
> >>> beefier SoC in there will probably be doable without the price going
> >>> over $100, no?
> >>>
> >>> I assume the WiFi silicon is probably the most costly piece of
> >>> intellectual property in the system. So yeah. Maybe with the right
> >>> parts being available, one could aim at $50 or less, without sales
> >>> channel markup. (Raspberry Pi ARM64 boards don't have GigE, and I think
> >>> that might be because the GigE interfaces are a bit pricey. However,
> >>> the ARM64 SoC's available are typically Celeron-class multicore
> >>> systems. I don't know why there aren't more ARM64 systems on a chip
> >>> with dual GigE, but I suspect searching for them would turn up some).
> >>
> >> The turris MOX (https://www.turris.cz/en/specification/) might be a
> >> decent starting point as it comes with one Gbethernet port and both
> >> SGMII and PCIe signals routed on a connector, they also have a 4 and an
> >> 8 port switch module, but for our purposes it might be possible to just
> >> create a small single Gb ethernet port board to get started.
> >>
> >> Best Regards
> >> Sebastian
> >>
> >>>
> >>> -Toke
> >>>
> >>> [0] https://www.gl-inet.com/products/gl-ar750s/
> >>> _______________________________________________
> >>> Cerowrt-devel mailing list
> >>> Cerowrt-devel@lists.bufferbloat.net
> >>> https://lists.bufferbloat.net/listinfo/cerowrt-devel
> >>
> >> _______________________________________________
> >> Bloat mailing list
> >> Bloat@lists.bufferbloat.net
> >> https://lists.bufferbloat.net/listinfo/bloat
> >
> >
> >
> > --
> >
> > Dave Täht
> > CTO, TekLibre, LLC
> > http://www.teklibre.com
> > Tel: 1-831-205-9740

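
The asymmetry and cwnd points in the quoted thread can be checked with back-of-the-envelope arithmetic. The sketch below is illustrative only and does not come from any of the posters. It assumes 1500-byte data packets, roughly 64 bytes on the wire per ACK, one ACK per two data packets (delayed ACKs), a single flow with a fixed cwnd, and a bottleneck buffer deep enough to hold whatever part of cwnd exceeds the bandwidth-delay product. All function names and default values are hypothetical.

```python
"""Rough arithmetic for ACK-limited downloads and cwnd-induced queueing delay.

Every default below (packet sizes, ACK ratio) is an assumption chosen for
illustration, not a measurement from the thread.
"""


def ack_rate_bps(download_bps, data_packet_bytes=1500, ack_bytes=64,
                 packets_per_ack=2):
    """Uplink bandwidth consumed by the ACK stream of a download."""
    data_packets_per_s = download_bps / (data_packet_bytes * 8)
    acks_per_s = data_packets_per_s / packets_per_ack
    return acks_per_s * ack_bytes * 8


def max_download_bps(uplink_bps, data_packet_bytes=1500, ack_bytes=64,
                     packets_per_ack=2):
    """Download rate at which the ACK stream alone fills the uplink."""
    acks_per_s = uplink_bps / (ack_bytes * 8)
    return acks_per_s * packets_per_ack * data_packet_bytes * 8


def standing_queue_delay_s(cwnd_packets, link_bps, base_rtt_s,
                           packet_bytes=1500):
    """Queueing delay added by the part of cwnd that exceeds the BDP.

    Assumes one flow, a fixed cwnd, and no drops or AQM at the bottleneck.
    """
    bdp_packets = link_bps * base_rtt_s / (packet_bytes * 8)
    excess_packets = max(0.0, cwnd_packets - bdp_packets)
    return excess_packets * packet_bytes * 8 / link_bps


if __name__ == "__main__":
    print(f"ACK stream for 300 Mbps down: {ack_rate_bps(300e6) / 1e6:.1f} Mbps")
    print(f"Download ceiling on a 5 Mbps uplink: "
          f"{max_download_bps(5e6) / 1e6:.1f} Mbps")
    print(f"Queue delay, cwnd=1000 on 15 Mbps / 20 ms: "
          f"{standing_queue_delay_s(1000, 15e6, 0.020):.2f} s")
```

Under those assumptions, the ACK stream for a 300Mbps download needs about 6.4Mbps of uplink, so a 5Mbps uplink caps the download near 234Mbps even before any uplink bloat is considered. Likewise, a fixed cwnd of 1000 packets on a 15Mbps link with a 20ms base RTT leaves about 975 packets, roughly 0.78s of delay, standing in the bottleneck queue.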