Date: Sun, 25 May 2014 16:00:53 -0400 (EDT)
From: dpreed@reed.com
To: "Mikael Abrahamsson"
Message-ID: <1401048053.664331760@apps.rackspace.com>
Cc: cerowrt-devel@lists.bufferbloat.net
Subject: Re: [Cerowrt-devel] Ubiquiti QOS

Not that it is directly relevant, but there is no essential reason to require 50 ms of buffering. That might be true of some particular QOS-related router algorithm. 50 ms is about all one can tolerate in any router between source and destination for today's networks - an upper bound rather than a minimum.

The optimum buffer state for throughput is 1-2 packets' worth - in other words, with an MTU of 1500, that is 1500-3000 bytes. Only the bottleneck buffer (the input queue to the lowest-speed link along the path) should have this much actually buffered. Buffering more than this increases end-to-end latency beyond its optimal state. Increased end-to-end latency reduces the effectiveness of control loops, creating more congestion.
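
(For concreteness: at 1 gigabit/s a 1500-byte packet serializes in 12 microseconds, so holding two packets at the bottleneck adds roughly 24 microseconds of queueing delay; the 6.25 megabyte buffer discussed below holds 50 milliseconds, about two thousand times as much.)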

The rationale for having 50 ms of buffering is probably to avoid disruption of bursty mixed flows, where the bursts might persist for 50 ms and then die. One reason for this is that source nodes run operating systems that tend to release packets in bursts. That's a whole other discussion - in an ideal world, source nodes would avoid bursty packet releases by letting the control exerted by the receiver window be "tight" timing-wise: that is, by transmitting a packet immediately, at the instant an ACK arrives that increases the window. This would pace the flow. Current OSes tend (due to scheduling mismatches) to send bursts of packets, "catching up" on sending that could have been spaced out and done earlier if the feedback from the receiver's advancing window were heeded.

That is, endpoint network stacks (TCP implementations) can worsen congestion by "dallying". The ideal end-to-end flows occupying a congested router would have their packets paced so that the packets end up being sent in the least bursty manner that an application can support. The effect of this pacing is to move the "backlog" for each flow quickly into the source node for that flow, which then provides back pressure on the application driving the flow - back pressure that is ultimately necessary to stanch congestion. The ideal congestion control mechanism slows the sender part of the application to a pace that can go through the network without contributing to buffering.

Current network stacks (including Linux's) don't achieve that goal - their pushback on application sources is minimal; instead they accumulate buffering internal to the network implementation. This contributes to end-to-end latency as well, and if you think about it, it is almost as bad as switch-level bufferbloat in terms of degrading user experience. The reason I say "almost" is that there are tools, rarely used in practice, that allow an application to specify that buffering should not build up in the network stack (in the kernel or wherever it is). But the default is not to use those APIs, and to buffer way too much.
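
(As a concrete illustration - a sketch only, with an arbitrary threshold rather than a tuned value, and assuming fd is an already-connected TCP socket - one such knob on Linux since kernel 3.12 is the TCP_NOTSENT_LOWAT socket option, which caps how much unsent data the kernel will queue for a connection:

    /* Sketch: ask the kernel not to accumulate a deep send queue.
     * TCP_NOTSENT_LOWAT (Linux >= 3.12) caps the unsent data buffered
     * per socket; on older C libraries the constant may live in
     * <linux/tcp.h> instead. The 16 KB threshold is illustrative. */
    #include <netinet/in.h>
    #include <netinet/tcp.h>
    #include <sys/socket.h>

    int limit_stack_buffering(int fd)
    {
        int lowat = 16 * 1024;  /* illustrative, not a tuned value */
        return setsockopt(fd, IPPROTO_TCP, TCP_NOTSENT_LOWAT,
                          &lowat, sizeof lowat);
    }

With this set, poll()/select() report the socket writable only once the unsent backlog has drained below the threshold, which is exactly the "don't let buffering build up in the stack" behavior described above.)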

Remember, the network send stack can act similarly to a congested switch (it is a switch among all the user applications running on that node). If there is a heavy file transfer, the file transfer's buffering acts to increase latency for all other networked communications on that machine.

Traditionally this problem has been thought of only as a within-node fairness issue, but in fact it has a big effect on the switches between source and destination, due to the lack of dispersed pacing of the packets at the source - in other words, the current design does nothing to stem the "burst groups" from a single source mentioned above.

So we do need the source nodes to implement less "bursty" sending stacks. This is especially true for multiplexed source nodes, such as web servers handling thousands of flows.

A combination of codel-style switch-level buffer management and a sender stack implemented to spread the packets of a particular TCP flow out over time would improve things a lot. To achieve the best throughput, the optimal way to spread packets out on an end-to-end basis is to update the receive window (by sending an ACK) at the receiving end as quickly as possible, and to respond to the updated receive window as quickly as possible when it increases.
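
(Roughly what that looks like from the application side - a sketch only, assuming a deliberately small SO_SNDBUF so that socket writability tracks the advancing window fairly tightly, and with fd, buf and len as stand-in names:

    /* Sketch: an "ACK-clocked" sender loop. With a small SO_SNDBUF,
     * POLLOUT fires roughly when arriving ACKs free buffer/window
     * space, so sending one small chunk per wakeup spaces packets
     * out instead of dumping a large burst into the kernel at once. */
    #include <poll.h>
    #include <sys/socket.h>

    ssize_t paced_send(int fd, const char *buf, size_t len)
    {
        size_t off = 0;
        while (off < len) {
            struct pollfd p = { .fd = fd, .events = POLLOUT };
            if (poll(&p, 1, -1) < 0)   /* wake the moment space opens */
                return -1;
            size_t chunk = len - off < 1448 ? len - off : 1448; /* ~1 MSS */
            ssize_t n = send(fd, buf + off, chunk, 0);
            if (n < 0)
                return -1;
            off += (size_t)n;
        }
        return (ssize_t)off;
    }

A real fix belongs below the socket API, in the stack itself, but the principle is the same: transmit the instant the window opens, one packet at a time.)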

Just like the "bufferbloat" issue, the problem is caused by applications like streaming video, file transfers and big web pages, which the application programmer sees as having no latency requirement within the flow, so the application programmer has no incentive to control pacing. Thus the operating system has to push back on the applications' flows somehow, so that each flow ends up paced once it enters the Internet itself. So there is no real problem caused by large buffering in the network stack at the endpoint, as long as the stack's delivery to the Internet is paced by some mechanism, e.g. tight management of receive window control on an end-to-end basis.

I don't think this can be fixed by cerowrt, so this is out of place here. It's partially ameliorated by cerowrt if it aggressively drops packets from flows that burst without pacing. fq_codel does this, if the buffer size it aims for is small - but the problem is that the OS stacks don't respond by pacing... they tend to respond by bursting, not because TCP doesn't provide the mechanisms for pacing, but because the OS stack doesn't transmit as soon as it is allowed to - thus building up a burst unnecessarily.
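
(For reference: on a stock Linux box the queue discipline cerowrt ships by default can be enabled per interface with something like "tc qdisc replace dev eth0 root fq_codel" - eth0 being a placeholder - and the newer fq qdisc goes further, adding true per-flow pacing at the sender.)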

Bursts on a flow are thus bad in general. They make congestion happen when it need not.


On Sunday, May 25, 2014 11:42am, "Mikael Abrahamsson" <swmike@swm.pp.se> said:

> On Sun, 25 May 2014, Dane Medic wrote:
>
> > Is it true that devices with less than 64 MB can't handle QOS? ->
> > https://lists.chambana.net/pipermail/commotion-dev/2014-May/001816.html
>
> At gig speeds you need around 50ms worth of buffering. 1 gigabit/s =
> 125 megabyte/s meaning for 50ms you need 6.25 megabyte of buffer.
>
> I also don't see why performance and memory size would be relevant, I'd
> say forwarding performance has more to do with CPU speed than anything
> else.
>
> --
> Mikael Abrahamsson    email: swmike@swm.pp.se
> _______________________________________________
> Cerowrt-devel mailing list
> Cerowrt-devel@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/cerowrt-devel