From: dpreed@reed.com
Date: Sun, 25 May 2014 16:00:53 -0400 (EDT)
To: "Mikael Abrahamsson" <swmike@swm.pp.se>
Cc: cerowrt-devel@lists.bufferbloat.net
Subject: Re: [Cerowrt-devel] Ubiquiti QOS

Not that it is directly relevant, but there is no essential reason to require 50 ms of buffering. That might be true of some particular QOS-related router algorithm. 50 ms is about all one can tolerate in any router between source and destination for today's networks - an upper bound rather than a minimum.
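
(For concreteness, the arithmetic behind that 50 ms figure - the same rule of thumb Mikael applies below - is buffer = rate x delay: at 1 gigabit/s, i.e. 125 megabytes/s, 50 ms of buffering is 125 MB/s x 0.050 s = 6.25 MB.)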

The optimum buffer state for throughput is 1-2 packets' worth - in other words, with an MTU of 1500, 1500-3000 bytes. Only the bottleneck buffer (the input queue to the lowest-speed link along the path) should have this much actually buffered. Buffering more than this increases end-to-end latency beyond its optimal state. Increased end-to-end latency reduces the effectiveness of control loops, creating more congestion.
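
To see why one or two packets at the bottleneck is harmless, look at the serialization delay that backlog adds. A quick sketch (the link rates are just example values):

    #include <stdio.h>

    /* Delay added by a queue of `bytes` draining at `bits_per_sec`. */
    static double queue_delay_ms(double bytes, double bits_per_sec) {
        return bytes * 8.0 / bits_per_sec * 1000.0;
    }

    int main(void) {
        double backlog = 3000.0;  /* two 1500-byte packets */
        printf("1 Mbit/s:   %6.3f ms\n", queue_delay_ms(backlog, 1e6)); /* 24.000 */
        printf("100 Mbit/s: %6.3f ms\n", queue_delay_ms(backlog, 1e8)); /*  0.240 */
        printf("1 Gbit/s:   %6.3f ms\n", queue_delay_ms(backlog, 1e9)); /*  0.024 */
        return 0;
    }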

The rationale for having 50 ms of buffering is probably to avoid disruption of bursty mixed flows, where the bursts might persist for 50 ms and then die. One reason for this is that source nodes run operating systems that tend to release packets in bursts. That's a whole other discussion - in an ideal world, source nodes would avoid bursty packet releases by keeping the control exerted by the receiver window "tight" timing-wise: that is, by transmitting a packet at the instant an ACK arrives that increases the window. This would pace the flow - current OSes tend (due to scheduling mismatches) to send bursts of packets, "catching up" on sending that could have been spaced out and done earlier if the feedback from the receiver's advancing window had been heeded.
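
A toy model of that mismatch - hypothetical numbers, not any particular OS's scheduler: an ACK-clocked sender transmits the instant the window opens, while a "dallying" sender waits for the next scheduler tick and then releases everything at once:

    #include <stdio.h>

    int main(void) {
        double ack_every_ms = 2.0;  /* example: one window-opening ACK per 2 ms */
        double tick_ms = 10.0;      /* example scheduler granularity */
        for (int ack = 1; ack <= 4; ack++) {
            double t = ack * ack_every_ms;
            double deferred = ((int)(t / tick_ms) + 1) * tick_ms;
            printf("ACK at %4.1f ms: paced send %4.1f ms, dallying send %4.1f ms\n",
                   t, t, deferred);
        }
        /* All four "dallying" segments leave together at t = 10 ms: a burst. */
        return 0;
    }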

That is, endpoint network stacks (TCP implementations) can worsen congestion by "dallying". The ideal end-to-end flows occupying a congested router would have their packets paced so that the packets end up being sent in the least bursty manner the application can support. The effect of this pacing is to move the "backlog" for each flow quickly into that flow's source node, which then provides back pressure on the application driving the flow - back pressure that is ultimately necessary to stanch congestion. The ideal congestion control mechanism slows the sending side of the application to a pace that can get through the network without contributing to buffering.
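
A minimal sketch of what such pacing means mechanically - the rate and packet size here are illustrative; a real stack would derive the rate from the ACK clock or its congestion controller:

    #include <stdio.h>

    int main(void) {
        double rate_bps = 10e6;       /* illustrative pacing rate: 10 Mbit/s */
        double pkt_bits = 1500 * 8.0; /* one MTU-sized packet */
        double gap_s = pkt_bits / rate_bps;  /* 1.2 ms between departures */
        double next_send = 0.0;
        for (int i = 0; i < 5; i++) {
            printf("packet %d departs at %.1f ms\n", i, next_send * 1000.0);
            next_send += gap_s;  /* never two packets back-to-back */
        }
        return 0;
    }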

Current network stacks (including Linux's) don't achieve that goal - their pushback on application sources is minimal - instead they accumulate buffering internal to the network implementation. This contributes to end-to-end latency as well. But if you think about it, this is almost as bad as switch-level bufferbloat in terms of degrading user experience. The reason I say "almost" is that there are tools, rarely used in practice, that allow an application to specify that buffering should not build up in the network stack (in the kernel or wherever it is). But the default is not to use those APIs, and to buffer way too much.
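
On Linux, two such tools are SO_SNDBUF (which caps how much the kernel will queue for one socket) and TCP_NOTSENT_LOWAT (which limits written-but-unsent data; added around Linux 3.12). A sketch - the byte values are arbitrary, for illustration only:

    #include <stdio.h>
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <netinet/tcp.h>

    int main(void) {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        if (fd < 0) { perror("socket"); return 1; }

        /* Cap the kernel send buffer so a fast writer blocks early
         * instead of queueing deeply inside the stack. */
        int sndbuf = 16 * 1024;  /* arbitrary small value */
        setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &sndbuf, sizeof(sndbuf));

        /* Keep not-yet-sent data below this threshold. */
        int lowat = 16 * 1024;
        setsockopt(fd, IPPROTO_TCP, TCP_NOTSENT_LOWAT, &lowat, sizeof(lowat));
        return 0;
    }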

Remember, the network send stack can act similarly to a congested switch (it is a switch among all the user applications running on that node). If there is a heavy file transfer, the file transfer's buffering acts to increase latency for all other networked communications on that machine.

Traditionally this problem has been thought of only as a within-node fairness issue, but in fact it has a big effect on the switches between source and destination, due to the lack of dispersed pacing of the packets at the source - in other words, the current design does nothing to stem the "burst groups" from a single source mentioned above.

So we do need the source nodes to implement less "bursty" sending stacks. This is especially true for multiplexed source nodes, such as web servers serving thousands of flows.

A combination of codel-style switch-level buffer management and sender stacks that spread the packets of a particular TCP flow out over time would improve things a lot. To achieve the best throughput, the optimal way to spread packets out on an end-to-end basis is to update the receive window (by sending an ACK) at the receiving end as quickly as possible, and to respond to the updated receive window as quickly as possible when it increases.
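
On the receiving side, Linux exposes TCP_QUICKACK for the "ACK as quickly as possible" half of this. A sketch - note the kernel clears this flag on its own, so real code re-applies it around each receive:

    #include <stdio.h>
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <netinet/tcp.h>

    int main(void) {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        if (fd < 0) { perror("socket"); return 1; }

        /* Request immediate ACKs instead of delayed ACKs, so the
         * sender's window feedback arrives as early as possible. */
        int one = 1;
        setsockopt(fd, IPPROTO_TCP, TCP_QUICKACK, &one, sizeof(one));
        return 0;
    }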

Just like the "bufferbloat" issue, the problem is caused by applications like streaming video, file transfers and big web pages that the application programmer sees as having no latency requirement within the flow, so the programmer has no incentive to control pacing. Thus the operating system has to push back on the applications' flows somehow, so that each flow ends up paced once it enters the Internet itself. Given that, there's no real problem caused by large buffering in the network stack at the endpoint, as long as the stack's delivery to the Internet is paced by some mechanism - e.g. tight management of receive-window control on an end-to-end basis.

I don't think this can be fixed by cerowrt, so this is out of place here. It's partially ameliorated by cerowrt if it aggressively drops packets from flows that burst without pacing. fq_codel does this if the buffer size it aims for is small - but the problem is that the OS stacks don't respond by pacing... they tend to respond by bursting, not because TCP doesn't provide the mechanisms for pacing, but because the OS stack doesn't transmit as soon as it is allowed to - thus building up a burst unnecessarily.
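
For reference, the heart of that codel-style decision is CoDel's published control law: drop at dequeue once a packet's queue sojourn time has stayed above a small target for a full interval, then drop more and more frequently while that persists. A simplified sketch (not the actual fq_codel source; 5 ms and 100 ms are the usual defaults), assuming a single flow's queue:

    #include <math.h>
    #include <stdbool.h>
    #include <stdio.h>

    #define TARGET_MS   5.0    /* acceptable standing-queue delay */
    #define INTERVAL_MS 100.0  /* how long it must persist before dropping */

    static double first_above; /* when sojourn first stayed above TARGET */
    static double drop_next;   /* next scheduled drop time */
    static int    drops;       /* drops in the current dropping episode */

    /* Called at dequeue with the packet's time spent in the queue. */
    static bool codel_should_drop(double now_ms, double sojourn_ms) {
        if (sojourn_ms < TARGET_MS) {
            first_above = 0;  /* queue drained: leave the dropping state */
            return false;
        }
        if (first_above == 0) {
            first_above = now_ms + INTERVAL_MS;  /* arm: wait one interval */
            return false;
        }
        if (now_ms >= first_above && now_ms >= drop_next) {
            drops++;
            /* control law: drop interval shrinks as 1/sqrt(count) */
            drop_next = now_ms + INTERVAL_MS / sqrt((double)drops);
            return true;
        }
        return false;
    }

    int main(void) {
        /* A persistent 20 ms standing queue: drops begin after one
         * interval, then accelerate. A paced flow that keeps its
         * sojourn under 5 ms never enters the dropping state. */
        for (double t = 0; t <= 400; t += 10)
            if (codel_should_drop(t, 20.0))
                printf("drop at %.0f ms\n", t);
        return 0;
    }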

Bursts on a flow are thus bad in general. They make congestion happen when it need not.

On Sunday, May 25, 2014 11:42am, "Mikael Abrahamsson" <swmike@swm.pp.se> said:

> On Sun, 25 May 2014, Dane Medic wrote:
>
> > Is it true that devices with less than 64 MB can't handle QOS? ->
> > https://lists.chambana.net/pipermail/commotion-dev/2014-May/001816.html
>
> At gig speeds you need around 50ms worth of buffering. 1 gigabit/s =
> 125 megabyte/s meaning for 50ms you need 6.25 megabyte of buffer.
>
> I also don't see why performance and memory size would be relevant, I'd
> say forwarding performance has more to do with CPU speed than anything
> else.
>
> --
> Mikael Abrahamsson    email: swmike@swm.pp.se
> _______________________________________________
> Cerowrt-devel mailing list
> Cerowrt-devel@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/cerowrt-devel