From: Jim Gettys
To: David P Reed
Cc: cerowrt-devel@lists.bufferbloat.net
Date: Tue, 27 May 2014 11:23:33 -0400
Subject: Re: [Cerowrt-devel] Ubiquiti QOS

On Sun, May 25, 2014 at 4:00 PM, David P Reed <dpreed@reed.com> wrote:

> Not that it is directly relevant, but there is no essential reason to
> require 50 ms of buffering. That might be true of some particular
> QOS-related router algorithm. 50 ms is about all one can tolerate in
> any router between source and destination for today's networks - an
> upper bound rather than a minimum.
>
> The optimum buffer state for throughput is 1-2 packets' worth - in
> other words, if we have an MTU of 1500, 1500-3000 bytes. Only the
> bottleneck buffer (the input queue to the lowest-speed link along the
> path) should have this much actually buffered. Buffering more than this
> increases end-to-end latency beyond its optimal state. Increased
> end-to-end latency reduces the effectiveness of control loops, creating
> more congestion.

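For concreteness, a quick back-of-the-envelope comparison of those two
figures (an illustrative sketch, not something from the thread itself):

#include <stdio.h>

/* Compare the 1-2 packet optimum against the "50 ms of buffering"
 * rule of thumb at a few common link rates. */
int main(void)
{
    const double mtu_bytes = 1500.0;
    const double delay_s = 0.050;                     /* 50 ms */
    const double rates_bps[] = { 10e6, 100e6, 1e9 };  /* 10M, 100M, 1G */

    for (unsigned i = 0; i < sizeof rates_bps / sizeof rates_bps[0]; i++) {
        double bytes_50ms = rates_bps[i] / 8.0 * delay_s;
        printf("%5.0f Mbit/s: 1-2 packets = %.0f-%.0f bytes, 50 ms = %.2f MB\n",
               rates_bps[i] / 1e6, mtu_bytes, 2 * mtu_bytes, bytes_50ms / 1e6);
    }
    return 0;
}

At 1 Gbit/s the 50 ms rule works out to 6.25 MB - the figure quoted from
Mikael's mail at the bottom of this message - versus roughly 3 KB for
the 1-2 packet optimum at the bottleneck.
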

=C2=A0

> The rationale for having 50 ms of buffering is probably to avoid
> disruption of bursty mixed flows where the bursts might persist for
> 50 ms and then die. One reason for this is that source nodes run
> operating systems that tend to release packets in bursts. That's a
> whole other discussion - in an ideal world, source nodes would avoid
> bursty packet releases by letting the control by the receiver window be
> "tight" timing-wise, that is, by transmitting a packet immediately at
> the instant an ACK arrives increasing the window. This would pace the
> flow - current OSes tend (due to scheduling mismatches) to send bursts
> of packets, "catching up" on sending that could have been spaced out
> and done earlier if the feedback from the receiver's window advancing
> were heeded.
>
> That is, endpoint network stacks (TCP implementations) can worsen
> congestion by "dallying". The ideal end-to-end flows occupying a
> congested router would have their packets paced so that the packets end
> up being sent in the least bursty manner that an application can
> support. The effect of this pacing is to move the "backlog" for each
> flow quickly into the source node for that flow, which then provides
> back pressure on the application driving the flow, which ultimately is
> necessary to stanch congestion. The ideal congestion control mechanism
> slows the sender part of the application to a pace that can go through
> the network without contributing to buffering.

Pacing is in Linux 3.12(?). How long it will take to see widespread
deployment is another question, and as for other operating systems, who
knows. See: https://lwn.net/Articles/564978/

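The pacing referred to here landed as the fq queueing discipline
described in that LWN article. As a minimal illustration of how an
application can request a per-socket rate cap once that machinery is
present - this sketch uses SO_MAX_PACING_RATE, which arrived around
Linux 3.13 and is enforced by the fq qdisc, so treat it as illustrative
rather than something proposed in the thread:

#include <stdio.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>

#ifndef SO_MAX_PACING_RATE
#define SO_MAX_PACING_RATE 47   /* value from asm-generic/socket.h (Linux >= 3.13) */
#endif

/* Ask the kernel to pace this socket's transmissions to at most
 * rate_bytes_per_sec.  On kernels of that era the cap is enforced by the
 * fq qdisc on the egress interface (the tc setup is a separate step,
 * not shown here). */
static int set_pacing_cap(int fd, unsigned int rate_bytes_per_sec)
{
    if (setsockopt(fd, SOL_SOCKET, SO_MAX_PACING_RATE,
                   &rate_bytes_per_sec, sizeof(rate_bytes_per_sec)) < 0) {
        perror("setsockopt(SO_MAX_PACING_RATE)");
        return -1;
    }
    return 0;
}

int main(void)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) { perror("socket"); return 1; }

    /* Example: cap the flow at about 12.5 MB/s (~100 Mbit/s). */
    if (set_pacing_cap(fd, 12500000) == 0)
        printf("pacing cap requested\n");

    close(fd);
    return 0;
}
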

=E2= =80=8B=E2=80=8B

=C2=A0

> Current network stacks (including Linux's) don't achieve that goal -
> their pushback on application sources is minimal - instead they
> accumulate buffering internal to the network implementation.

This is much, much less true than it once was. There have been
substantial changes in the Linux TCP stack in the last year or two to
avoid generating packets before necessary. Again, how long it will take
for people to deploy this on Linux (and implement it on other OSes) is a
question.

> This contributes to end-to-end latency as well. But if you think about
> it, this is almost as bad as switch-level bufferbloat in terms of
> degrading user experience. The reason I say "almost" is that there are
> tools, rarely used in practice, that allow an application to specify
> that buffering should not build up in the network stack (in the kernel
> or wherever it is). But the default is not to use those APIs, and to
> buffer way too much.

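David doesn't name the APIs he has in mind, but one candidate on Linux
(an assumption on my part, not something specified above) is
TCP_NOTSENT_LOWAT, which appeared in the same Linux 3.12 timeframe; the
older SO_SNDBUF limit serves a similar, cruder purpose. A minimal
sketch:

#include <stdio.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

#ifndef TCP_NOTSENT_LOWAT
#define TCP_NOTSENT_LOWAT 25    /* value from linux/tcp.h (added in Linux 3.12) */
#endif

/* Keep at most `bytes` of not-yet-sent data queued inside the kernel for
 * this socket.  poll()/epoll() stop reporting the socket writable until
 * the unsent backlog drops below the threshold, which pushes the backlog
 * back into the application instead of the network stack. */
static int limit_unsent(int fd, unsigned int bytes)
{
    if (setsockopt(fd, IPPROTO_TCP, TCP_NOTSENT_LOWAT,
                   &bytes, sizeof(bytes)) < 0) {
        perror("setsockopt(TCP_NOTSENT_LOWAT)");
        return -1;
    }
    return 0;
}

int main(void)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) { perror("socket"); return 1; }

    limit_unsent(fd, 16 * 1024);   /* ~16 KB of unsent data, at most */

    close(fd);
    return 0;
}
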

=C2=A0

> Remember, the network send stack can act similarly to a congested
> switch (it is a switch among all the user applications running on that
> node). If there is a heavy file transfer, the file transfer's buffering
> acts to increase latency for all other networked communications on that
> machine.
>
> Traditionally this problem has been thought of only as a within-node
> fairness issue, but in fact it has a big effect on the switches between
> source and destination due to the lack of dispersed pacing of the
> packets at the source - in other words, the current design does nothing
> to stem the "burst groups" from a single source mentioned above.
>
> So we do need the source nodes to implement less "bursty" sending
> stacks. This is especially true for multiplexed source nodes, such as
> web servers implementing thousands of flows.
>
> A combination of codel-style switch-level buffer management and a
> sender stack implemented to spread packets in a particular TCP flow out
> over time would improve things a lot. To achieve the best throughput,
> the optimal way to spread packets out on an end-to-end basis is to
> update the receive window (sending an ACK) at the receiving end as
> quickly as possible, and to respond to the updated receive window as
> quickly as possible when it increases.
>
> Just like the "bufferbloat" issue, the problem is caused by
> applications like streaming video, file transfers, and big web pages
> that the application programmer sees as not having a latency
> requirement within the flow, so the application programmer has no
> incentive to control pacing. Thus the operating system has to push back
> on the applications' flows somehow, so that each flow ends up paced
> once it enters the Internet itself. So there is no real problem caused
> by large buffering in the network stack at the endpoint, as long as the
> stack's delivery to the Internet is paced by some mechanism, e.g. tight
> management of receive window control on an end-to-end basis.
>
> I don't think this can be fixed by cerowrt, so this is out of place
> here. It's partially ameliorated by cerowrt if it aggressively drops
> packets from flows that burst without pacing. fq_codel does this if the
> buffer size it aims for is small - but the problem is that the OS
> stacks don't respond by pacing... they tend to respond by bursting, not
> because TCP doesn't provide the mechanisms for pacing, but because the
> OS stack doesn't transmit as soon as it is allowed to - thus building
> up a burst unnecessarily.
>
> Bursts on a flow are thus bad in general. They make congestion happen
> when it need not.

By far the biggest headache is what the Web does to the network. It has
turned the web into a burst generator.

A typical web page may have 10 (or even more) images. See the
"connections per page" plot in the link below.

A browser downloads the base page and then, over N connections,
essentially simultaneously downloads those embedded objects. Many/most
of them are small in size (4-10 packets). You never even get near slow
start.

So you get an IW amount of data per TCP connection, with no pacing and
no congestion avoidance. It is easy to observe 50-100 packets (or more)
back to back at the bottleneck.

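As a rough illustration of where such numbers come from (my arithmetic,
using the common IW10 initial window and a typical per-host connection
limit, not anything measured in the thread):

#include <stdio.h>

/* Rough size of the unpaced burst a web page can trigger: each new TCP
 * connection may emit a full initial window (IW) back to back, and a
 * browser opens several connections essentially at once. */
int main(void)
{
    const int iw_segments = 10;    /* initial congestion window (IW10) */
    const int mss_bytes   = 1460;  /* typical Ethernet MSS             */
    const int connections = 6;     /* common per-host browser limit    */

    int burst_packets = iw_segments * connections;
    int burst_bytes   = burst_packets * mss_bytes;

    printf("burst: %d packets, about %.1f KB, before any ACK clocking\n",
           burst_packets, burst_bytes / 1000.0);
    return 0;
}

With domain sharding multiplying the connection count, bursts of 50-100
or more packets at the bottleneck follow directly.
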

This is (in practice) the amount you have to buffer today: that burst of
packets from a web page. Without flow queuing, you are screwed. With it,
it's annoying but can be tolerated.

I go over this in detail in:

http://gettys.wordpress.com/2013/07/10/low-latency-requires-smart-queuing-traditional-aqm-is-not-enough/

So far, I don't believe anyone has tried pacing the IW burst of packets.
I'd certainly like to see that, but pacing needs to be across TCP
connections (host pairs) to have any hope of outwitting the gaming the
web has done to the network.

                                                        - Jim

> On Sunday, May 25, 2014 11:42am, "Mikael Abrahamsson" <swmike@swm.pp.se> said:
>
> > On Sun, 25 May 2014, Dane Medic wrote:
> >
> > > Is it true that devices with less than 64 MB can't handle QOS? ->
> > > https://lists.chambana.net/pipermail/commotion-dev/2014-May/001816.html
> >
> > At gig speeds you need around 50ms worth of buffering. 1 gigabit/s =
> > 125 megabyte/s meaning for 50ms you need 6.25 megabyte of buffer.
> >
> > I also don't see why performance and memory size would be relevant, I'd
> > say forwarding performance has more to do with CPU speed than anything
> > else.
> >
> > --
> > Mikael Abrahamsson    email: swmike@swm.pp.se

_______________________________________________
Cerowrt-devel mailing list
Cerowrt-devel@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cerowrt-devel