From: dpreed@reed.com
Date: Sun, 25 May 2014 16:00:53 -0400 (EDT)
To: "Mikael Abrahamsson" <swmike@swm.pp.se>
Cc: cerowrt-devel@lists.bufferbloat.net
Subject: Re: [Cerowrt-devel] Ubiquiti QOS

Not that it is directly relevant, but there is no essential reason to require 50 ms of buffering. That might be true of some particular QOS-related router algorithm. 50 ms is about all one can tolerate in any router between source and destination for today's networks - an upper bound rather than a minimum.
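
(For concreteness, the arithmetic behind that 50 ms figure - the same rule of thumb Mikael applies below - is buffer = rate x delay: at 1 gigabit/s, i.e. 125 megabytes/s, 50 ms of buffering is 125 MB/s x 0.050 s = 6.25 MB.)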

The optimum buffer state for throughput is 1-2 packets' worth - in other words, with an MTU of 1500, 1500-3000 bytes. Only the bottleneck buffer (the input queue to the lowest-speed link along the path) should have this much actually buffered. Buffering more than this increases end-to-end latency beyond its optimal state. Increased end-to-end latency reduces the effectiveness of control loops, creating more congestion.
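
To see why one or two packets at the bottleneck is harmless, look at the serialization delay that backlog adds. A quick sketch (the link rates are just example values):

    #include <stdio.h>

    /* Delay added by a queue of `bytes` draining at `bits_per_sec`. */
    static double queue_delay_ms(double bytes, double bits_per_sec) {
        return bytes * 8.0 / bits_per_sec * 1000.0;
    }

    int main(void) {
        double backlog = 3000.0;  /* two 1500-byte packets */
        printf("1 Mbit/s:   %6.3f ms\n", queue_delay_ms(backlog, 1e6)); /* 24.000 */
        printf("100 Mbit/s: %6.3f ms\n", queue_delay_ms(backlog, 1e8)); /*  0.240 */
        printf("1 Gbit/s:   %6.3f ms\n", queue_delay_ms(backlog, 1e9)); /*  0.024 */
        return 0;
    }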

The rationale for having 50 ms of buffering is probably to avoid disruption of bursty mixed flows, where the bursts might persist for 50 ms and then die. One reason for this is that source nodes run operating systems that tend to release packets in bursts. That's a whole other discussion - in an ideal world, source nodes would avoid bursty packet releases by keeping the control exerted by the receiver window "tight" timing-wise: that is, by transmitting a packet at the instant an ACK arrives that increases the window. This would pace the flow - current OSes tend (due to scheduling mismatches) to send bursts of packets, "catching up" on sending that could have been spaced out and done earlier if the feedback from the receiver's advancing window had been heeded.
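
A toy model of that mismatch - hypothetical numbers, not any particular OS's scheduler: an ACK-clocked sender transmits the instant the window opens, while a "dallying" sender waits for the next scheduler tick and then releases everything at once:

    #include <stdio.h>

    int main(void) {
        double ack_every_ms = 2.0;  /* example: one window-opening ACK per 2 ms */
        double tick_ms = 10.0;      /* example scheduler granularity */
        for (int ack = 1; ack <= 4; ack++) {
            double t = ack * ack_every_ms;
            double deferred = ((int)(t / tick_ms) + 1) * tick_ms;
            printf("ACK at %4.1f ms: paced send %4.1f ms, dallying send %4.1f ms\n",
                   t, t, deferred);
        }
        /* All four "dallying" segments leave together at t = 10 ms: a burst. */
        return 0;
    }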

That is, endpoint network stacks (TCP implementations) can worsen congestion by "dallying". The ideal end-to-end flows occupying a congested router would have their packets paced so that the packets end up being sent in the least bursty manner the application can support. The effect of this pacing is to move the "backlog" for each flow quickly into that flow's source node, which then provides back pressure on the application driving the flow - back pressure that is ultimately necessary to stanch congestion. The ideal congestion control mechanism slows the sending side of the application to a pace that can get through the network without contributing to buffering.
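
A minimal sketch of what such pacing means mechanically - the rate and packet size here are illustrative; a real stack would derive the rate from the ACK clock or its congestion controller:

    #include <stdio.h>

    int main(void) {
        double rate_bps = 10e6;       /* illustrative pacing rate: 10 Mbit/s */
        double pkt_bits = 1500 * 8.0; /* one MTU-sized packet */
        double gap_s = pkt_bits / rate_bps;  /* 1.2 ms between departures */
        double next_send = 0.0;
        for (int i = 0; i < 5; i++) {
            printf("packet %d departs at %.1f ms\n", i, next_send * 1000.0);
            next_send += gap_s;  /* never two packets back-to-back */
        }
        return 0;
    }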

Current network stacks (including Linux's) don't achieve that goal - their pushback on application sources is minimal - instead they accumulate buffering internal to the network implementation. This contributes to end-to-end latency as well. But if you think about it, this is almost as bad as switch-level bufferbloat in terms of degrading user experience. The reason I say "almost" is that there are tools, rarely used in practice, that allow an application to specify that buffering should not build up in the network stack (in the kernel or wherever it is). But the default is not to use those APIs, and to buffer way too much.
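
On Linux, two such tools are SO_SNDBUF (which caps how much the kernel will queue for one socket) and TCP_NOTSENT_LOWAT (which limits written-but-unsent data; added around Linux 3.12). A sketch - the byte values are arbitrary, for illustration only:

    #include <stdio.h>
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <netinet/tcp.h>

    int main(void) {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        if (fd < 0) { perror("socket"); return 1; }

        /* Cap the kernel send buffer so a fast writer blocks early
         * instead of queueing deeply inside the stack. */
        int sndbuf = 16 * 1024;  /* arbitrary small value */
        setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &sndbuf, sizeof(sndbuf));

        /* Keep not-yet-sent data below this threshold. */
        int lowat = 16 * 1024;
        setsockopt(fd, IPPROTO_TCP, TCP_NOTSENT_LOWAT, &lowat, sizeof(lowat));
        return 0;
    }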

Remember, the network send stack can act similarly to a congested switch (it is a switch among all the user applications running on that node). If there is a heavy file transfer, the file transfer's buffering acts to increase latency for all other networked communications on that machine.

Traditionally this problem has been thought of only as a within-node fairness issue, but in fact it has a big effect on the switches between source and destination, due to the lack of dispersed pacing of the packets at the source - in other words, the current design does nothing to stem the "burst groups" from a single source mentioned above.

So we do need the source nodes to implement less "bursty" sending stacks. This is especially true for multiplexed source nodes, such as web servers serving thousands of flows.

A combination of codel-style switch-level buffer management and sender stacks that spread the packets of a particular TCP flow out over time would improve things a lot. To achieve the best throughput, the optimal way to spread packets out on an end-to-end basis is to update the receive window (by sending an ACK) at the receiving end as quickly as possible, and to respond to the updated receive window as quickly as possible when it increases.
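
On the receiving side, Linux exposes TCP_QUICKACK for the "ACK as quickly as possible" half of this. A sketch - note the kernel clears this flag on its own, so real code re-applies it around each receive:

    #include <stdio.h>
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <netinet/tcp.h>

    int main(void) {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        if (fd < 0) { perror("socket"); return 1; }

        /* Request immediate ACKs instead of delayed ACKs, so the
         * sender's window feedback arrives as early as possible. */
        int one = 1;
        setsockopt(fd, IPPROTO_TCP, TCP_QUICKACK, &one, sizeof(one));
        return 0;
    }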

Just like the "bufferbloat" issue, the problem is caused by applications like streaming video, file transfers and big web pages that the application programmer sees as having no latency requirement within the flow, so the programmer has no incentive to control pacing. Thus the operating system has to push back on the applications' flows somehow, so that each flow ends up paced once it enters the Internet itself. Given that, there's no real problem caused by large buffering in the network stack at the endpoint, as long as the stack's delivery to the Internet is paced by some mechanism - e.g. tight management of receive-window control on an end-to-end basis.

I don't think this can be fixed by cerowrt, so this is out of place here. It's partially ameliorated by cerowrt if it aggressively drops packets from flows that burst without pacing. fq_codel does this if the buffer size it aims for is small - but the problem is that the OS stacks don't respond by pacing... they tend to respond by bursting, not because TCP doesn't provide the mechanisms for pacing, but because the OS stack doesn't transmit as soon as it is allowed to - thus building up a burst unnecessarily.
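
For reference, the heart of that codel-style decision is CoDel's published control law: drop at dequeue once a packet's queue sojourn time has stayed above a small target for a full interval, then drop more and more frequently while that persists. A simplified sketch (not the actual fq_codel source; 5 ms and 100 ms are the usual defaults), assuming a single flow's queue:

    #include <math.h>
    #include <stdbool.h>
    #include <stdio.h>

    #define TARGET_MS   5.0    /* acceptable standing-queue delay */
    #define INTERVAL_MS 100.0  /* how long it must persist before dropping */

    static double first_above; /* when sojourn first stayed above TARGET */
    static double drop_next;   /* next scheduled drop time */
    static int    drops;       /* drops in the current dropping episode */

    /* Called at dequeue with the packet's time spent in the queue. */
    static bool codel_should_drop(double now_ms, double sojourn_ms) {
        if (sojourn_ms < TARGET_MS) {
            first_above = 0;  /* queue drained: leave the dropping state */
            return false;
        }
        if (first_above == 0) {
            first_above = now_ms + INTERVAL_MS;  /* arm: wait one interval */
            return false;
        }
        if (now_ms >= first_above && now_ms >= drop_next) {
            drops++;
            /* control law: drop interval shrinks as 1/sqrt(count) */
            drop_next = now_ms + INTERVAL_MS / sqrt((double)drops);
            return true;
        }
        return false;
    }

    int main(void) {
        /* A persistent 20 ms standing queue: drops begin after one
         * interval, then accelerate. A paced flow that keeps its
         * sojourn under 5 ms never enters the dropping state. */
        for (double t = 0; t <= 400; t += 10)
            if (codel_should_drop(t, 20.0))
                printf("drop at %.0f ms\n", t);
        return 0;
    }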

Bursts on a flow are thus bad in general. They make congestion happen when it need not.

On Sunday, May 25, 2014 11:42am, "Mikael Abrahamsson" <swmike@swm.pp.se> said:

> On Sun, 25 May 2014, Dane Medic wrote:
>
> > Is it true that devices with less than 64 MB can't handle QOS? ->
> > https://lists.chambana.net/pipermail/commotion-dev/2014-May/001816.html
>
> At gig speeds you need around 50ms worth of buffering. 1 gigabit/s =
> 125 megabyte/s meaning for 50ms you need 6.25 megabyte of buffer.
>
> I also don't see why performance and memory size would be relevant, I'd
> say forwarding performance has more to do with CPU speed than anything
> else.
>
> --
> Mikael Abrahamsson    email: swmike@swm.pp.se
> _______________________________________________
> Cerowrt-devel mailing list
> Cerowrt-devel@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/cerowrt-devel