From: dpreed@reed.com
Date: Wed, 28 May 2014 11:33:42 -0400 (EDT)
To: Dave Taht <dave.taht@gmail.com>
Cc: cerowrt-devel@lists.bufferbloat.net, bloat
Subject: Re: [Cerowrt-devel] Ubiquiti QOS

Same concern I mentioned with Jim's message.  I was not clear what I meant by "pacing" in the context of optimizing latency while preserving throughput.  It is NOT just a matter of spreading packets out in time; it is a matter of doing so without reducing throughput.  That means transmitting as *early* as possible while avoiding congestion.  Building a "backlog" and then artificially spreading it out by "add-on pacing" will definitely reduce throughput below the flow's fair share of the bottleneck resource.

It is pretty clear to me that you can't get to a minimal-latency, optimal-throughput control algorithm by a series of "add ons" in LART.  It requires rethinking the control discipline, and changes to get more information about congestion earlier, without ever allowing a buffer queue to build up in intermediate nodes - since that destroys latency by definition.

As long as you require buffers to grow at bottleneck links in order to get measurements of congestion, you are probably stuck with long-time-constant control loops, and as long as you encourage buffering in OS send stacks you are even worse off at the application layer.

The problem is in the assumption that buffer queueing is the only possible answer.  The "pacing" being included in Linux is just another way to build bigger buffers (on the sending host), by taking control away from the TCP control loop.
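For concreteness: the fq/pacing work referred to here keys off a per-socket pacing rate that TCP now derives from cwnd and srtt; the one application-visible handle on it that I know of is SO_MAX_PACING_RATE (roughly Linux 3.13 and later), which caps that rate. A rough, untested sketch - the fallback define and the rate value are illustrative only:

    /* Rough, untested sketch: cap one flow's pacing rate so sch_fq
     * spreads its packets on the wire.  SO_MAX_PACING_RATE appeared
     * around Linux 3.13; the fallback define covers older headers.
     * 12,500,000 bytes/s (about 100 Mbit/s) is an arbitrary example. */
    #include <stdio.h>
    #include <sys/socket.h>

    #ifndef SO_MAX_PACING_RATE
    #define SO_MAX_PACING_RATE 47
    #endif

    static int cap_pacing_rate(int fd)
    {
        unsigned int rate = 12500000;   /* bytes per second */

        if (setsockopt(fd, SOL_SOCKET, SO_MAX_PACING_RATE,
                       &rate, sizeof(rate)) < 0) {
            perror("setsockopt(SO_MAX_PACING_RATE)");
            return -1;
        }
        return 0;
    }

Note that this only shapes packets already handed to the kernel; it does not change when the application or the stack releases them, which is exactly the distinction I am drawing above.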
On Tuesday, May 27, 2014 1:31pm, "Dave Taht" <dave.taht@gmail.com> said:

> This has been a good thread, and I'm sorry it was mostly on
> cerowrt-devel rather than the main list...
>
> It is not clear from observing google's deployment that pacing of the
> IW is not in use. I see clear 1ms boundaries for individual flows on
> much lower than iw10 boundaries. (e.g. I see 1-4 packets at a time
> arrive at 1ms intervals - but this could be an artifact of the
> capture, intermediate devices, etc.)
>
> sch_fq comes with explicit support for spreading out the initial
> window (by default it allows a full iw10 burst, however), and TCP
> small queues, pacing-aware TCPs, the TSO fixes, and stuff we don't
> know about are all collaborating to reduce the web burst size...
>
> sch_fq_codel used as the host/router qdisc basically does spread out
> any flow if there is a bottleneck on the link. The pacing stuff
> spreads flow delivery out across an estimate of srtt by clock tick...
>
> It makes tremendous sense to pace out a flow if you are hitting the
> wire at 10gbit and know you are stepping down to 100mbit or less on
> the end device - that 100x difference in rate is meaningful... and at
> the same time, to get full throughput out of 10gbit, some level of TSO
> offload is needed... and the initial guess at the right pace is hard
> to get right before a couple of RTTs go by.
>
> I look forward to learning what's up.
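Back-of-the-envelope for that step-down, assuming 1500-byte packets: an iw10 burst is about 15 kB, which leaves a 10 Gbit/s interface in roughly 12 microseconds but takes about 1.2 ms to drain through a 100 Mbit/s link. The same bytes occupy the slow hop 100x longer, which is why pacing the sender to something near the downstream rate matters.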
> On Tue, May 27, 2014 at 8:23 AM, Jim Gettys <jg@freedesktop.org> wrote:
> >
> > On Sun, May 25, 2014 at 4:00 PM, <dpreed@reed.com> wrote:
> >>
> >> Not that it is directly relevant, but there is no essential reason
> >> to require 50 ms. of buffering. That might be true of some
> >> particular QOS-related router algorithm. 50 ms. is about all one can
> >> tolerate in any router between source and destination for today's
> >> networks - an upper bound rather than a minimum.
> >>
> >> The optimum buffer state for throughput is 1-2 packets' worth - in
> >> other words, with an MTU of 1500, 1500-3000 bytes. Only the
> >> bottleneck buffer (the input queue to the lowest-speed link along
> >> the path) should have this much actually buffered. Buffering more
> >> than this increases end-to-end latency beyond its optimal state.
> >> Increased end-to-end latency reduces the effectiveness of control
> >> loops, creating more congestion.
>
> This misses an important facet of modern MACs (wifi, wireless, cable,
> and gpon), which can aggregate 32k or more in packets.
>
> So the ideal size in those cases is much larger than an MTU, and has
> additional factors governing the ideal - such as the probability of a
> packet loss inducing a retransmit....
>
> Ethernet, sure.
>
> >> The rationale for having 50 ms. of buffering is probably to avoid
> >> disruption of bursty mixed flows where the bursts might persist for
> >> 50 ms. and then die. One reason for this is that source nodes run
> >> operating systems that tend to release packets in bursts. That's a
> >> whole other discussion - in an ideal world, source nodes would avoid
> >> bursty packet releases by keeping the control exerted by the
> >> receiver window "tight" timing-wise: that is, by transmitting a
> >> packet immediately at the instant an ACK arrives increasing the
> >> window. This would pace the flow - current OS's tend (due to
> >> scheduling mismatches) to send bursts of packets, "catching up" on
> >> sending that could have been spaced out and done earlier if the
> >> feedback from the receiver's window advancing were heeded.
>
> This loop has got ever tighter since Linux 3.3, to where it's really
> as tight as a modern cpu scheduler can get it. (Or so I keep thinking -
> but successive improvements in Linux TCP keep proving me wrong. :)
>
> I am really in awe of Linux TCP these days. Recently I was
> benchmarking Windows and MacOS: Windows only got 60% of the throughput
> Linux TCP did at gigE speeds, and OS X had a lot of issues at 10mbit
> and below (stretch ACKs and holding the window too high for the path).
>
> I keep hoping better ethernet hardware will arrive that can mix flows
> even more.
>
> >> That is, endpoint network stacks (TCP implementations) can worsen
> >> congestion by "dallying". The ideal end-to-end flows occupying a
> >> congested router would have their packets paced so that the packets
> >> end up being sent in the least bursty manner that an application can
> >> support. The effect of this pacing is to move the "backlog" for each
> >> flow quickly into the source node for that flow, which then provides
> >> back pressure on the application driving the flow, which ultimately
> >> is necessary to stanch congestion. The ideal congestion control
> >> mechanism slows the sender part of the application to a pace that
> >> can go through the network without contributing to buffering.
> >
> > Pacing is in Linux 3.12(?). How long it will take to see widespread
> > deployment is another question, and as for other operating systems,
> > who knows.
> >
> > See: https://lwn.net/Articles/564978/
>
> Steinar drove some of this with persistence and results...
>
> http://www.linux-support.com/cms/steinar-h-gunderson-paced-tcp-and-the-fq-scheduler/
>
> >> Current network stacks (including Linux's) don't achieve that goal -
> >> their pushback on application sources is minimal - instead they
> >> accumulate buffering internal to the network implementation.
> >
> > This is much, much less true than it once was. There have been
> > substantial changes in the Linux TCP stack in the last year or two,
> > to avoid generating packets before necessary. Again, how long it will
> > take for people to deploy this on Linux (and implement it on other
> > OS's) is a question.
>
> The data centers I'm in (Linode, ISC, Google Cloud) seem to be
> tracking modern kernels pretty well...
>
> >> This contributes to end-to-end latency as well. But if you think
> >> about it, this is almost as bad as switch-level bufferbloat in terms
> >> of degrading user experience. The reason I say "almost" is that
> >> there are tools, rarely used in practice, that allow an application
> >> to specify that buffering should not build up in the network stack
> >> (in the kernel or wherever it is). But the default is not to use
> >> those APIs, and to buffer way too much.
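One such knob - presumably the kind of API being alluded to - is TCP_NOTSENT_LOWAT, which showed up around Linux 3.12 and bounds how much not-yet-sent data the kernel will accept from the application on a given socket. A rough, untested sketch; the 16 kB threshold is arbitrary and purely for illustration:

    /* Rough, untested sketch: keep the kernel's per-socket backlog of
     * unsent data small so backpressure reaches the application sooner.
     * TCP_NOTSENT_LOWAT appeared around Linux 3.12; the fallback define
     * covers older headers, and 16 kB is an arbitrary example value. */
    #include <netinet/in.h>
    #include <netinet/tcp.h>
    #include <stdio.h>
    #include <sys/socket.h>

    #ifndef TCP_NOTSENT_LOWAT
    #define TCP_NOTSENT_LOWAT 25
    #endif

    static int limit_unsent(int fd)
    {
        int lowat = 16 * 1024;   /* max bytes of unsent data parked in the kernel */

        if (setsockopt(fd, IPPROTO_TCP, TCP_NOTSENT_LOWAT,
                       &lowat, sizeof(lowat)) < 0) {
            perror("setsockopt(TCP_NOTSENT_LOWAT)");
            return -1;
        }
        return 0;
    }

Roughly speaking, poll/epoll then report the socket writable only once the unsent backlog falls below that threshold, which is the kind of pushback on the application described above as missing by default.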
> >> Remember, the network send stack can act similarly to a congested
> >> switch (it is a switch among all the user applications running on
> >> that node). IF there is a heavy file transfer, the file transfer's
> >> buffering acts to increase latency for all other networked
> >> communications on that machine.
> >>
> >> Traditionally this problem has been thought of only as a within-node
> >> fairness issue, but in fact it has a big effect on the switches
> >> between source and destination, due to the lack of dispersed pacing
> >> of the packets at the source - in other words, the current design
> >> does nothing to stem the "burst groups" from a single source
> >> mentioned above.
> >>
> >> So we do need the source nodes to implement less "bursty" sending
> >> stacks. This is especially true for multiplexed source nodes, such
> >> as web servers implementing thousands of flows.
> >>
> >> A combination of codel-style switch-level buffer management and a
> >> sender stack that spreads packets in a particular TCP flow out over
> >> time would improve things a lot. To achieve best throughput, the
> >> optimal way to spread packets out on an end-to-end basis is to
> >> update the receive window (sending an ACK) at the receive end as
> >> quickly as possible, and to respond to the updated receive window as
> >> quickly as possible when it increases.
> >>
> >> Just like the "bufferbloat" issue, the problem is caused by
> >> applications like streaming video, file transfers and big web pages
> >> that the application programmer sees as not having a latency
> >> requirement within the flow, so the application programmer does not
> >> have an incentive to control pacing. Thus the operating system has
> >> got to push back on the applications' flow somehow, so that the flow
> >> ends up paced once it enters the Internet itself. So there's no real
> >> problem caused by large buffering in the network stack at the
> >> endpoint, as long as the stack's delivery to the Internet is paced
> >> by some mechanism, e.g. tight management of receive window control
> >> on an end-to-end basis.
> >>
> >> I don't think this can be fixed by cerowrt, so this is out of place
> >> here. It's partially ameliorated by cerowrt, if it aggressively
> >> drops packets from flows that burst without pacing. fq_codel does
> >> this, if the buffer size it aims for is small - but the problem is
> >> that the OS stacks don't respond by pacing... they tend to respond
> >> by bursting, not because TCP doesn't provide the mechanisms for
> >> pacing, but because the OS stack doesn't transmit as soon as it is
> >> allowed to - thus building up a burst unnecessarily.
> >>
> >> Bursts on a flow are thus bad in general. They make congestion
> >> happen when it need not.
> >
> > By far the biggest headache is what the Web does to the network. It
> > has turned the web into a burst generator.
> >
> > A typical web page may have 10 (or even more) images. See the
> > "connections per page" plot in the link below.
> >
> > A browser downloads the base page and then, over N connections,
> > essentially simultaneously downloads those embedded objects.
> > Many/most of them are small in size (4-10 packets). You never even
> > get near slow start.
> >
> > So you get an IW amount of data per TCP connection, with no pacing
> > and no congestion avoidance. It is easy to observe 50-100 packets
> > (or more) back to back at the bottleneck.
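Rough numbers, assuming IW10 and 1500-byte packets: each connection's initial window is about 15 kB, so a page fetched over 8 parallel connections can drop roughly 120 kB - about 80 back-to-back packets - into the path within one RTT, with no congestion feedback at all. That lines up with the 50-100 packet bursts described above.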
> > This is (in practice) the amount you have to buffer today: that
> > burst of packets from a web page. Without flow queuing, you are
> > screwed. With it, it's annoying, but can be tolerated.
> >
> > I go over this in detail in:
> >
> > http://gettys.wordpress.com/2013/07/10/low-latency-requires-smart-queuing-traditional-aqm-is-not-enough/
> >
> > So far, I don't believe anyone has tried pacing the IW burst of
> > packets. I'd certainly like to see that, but pacing needs to be
> > across TCP connections (host pairs) to be effective in outwitting
> > the gaming the web has done to the network.
> >
> >     - Jim
> >
> >> On Sunday, May 25, 2014 11:42am, "Mikael Abrahamsson"
> >> <swmike@swm.pp.se> said:
> >>
> >> > On Sun, 25 May 2014, Dane Medic wrote:
> >> >
> >> > > Is it true that devices with less than 64 MB can't handle QOS? ->
> >> > >
> >> > > https://lists.chambana.net/pipermail/commotion-dev/2014-May/001816.html
> >> >
> >> > At gig speeds you need around 50 ms worth of buffering. 1 gigabit/s =
> >> > 125 megabyte/s, meaning for 50 ms you need 6.25 megabytes of buffer.
> >> >
> >> > I also don't see why performance and memory size would be
> >> > relevant; I'd say forwarding performance has more to do with CPU
> >> > speed than anything else.
> >> >
> >> > --
> >> > Mikael Abrahamsson    email: swmike@swm.pp.se
>
> --
> Dave Täht
>
> NSFW:
> https://w2.eff.org/Censorship/Internet_censorship_bills/russell_0296_indecent.article