From: dpreed@reed.com
Date: Wed, 20 Jun 2012 15:03:40 -0400 (EDT)
To: "Jim Gettys"
Cc: "codel@lists.bufferbloat.net", "cerowrt-devel@lists.bufferbloat.net"
Subject: Re: [Codel] [Cerowrt-devel] codel "oversteer"
Message-ID: <1340219020.138379@apps.rackspace.com>
In-Reply-To: <4FE1F1C0.6030808@freedesktop.org>
References: <3455F07C-D677-44CC-B8DD-54070D8CB2E6@gmail.com> <4FE1F1C0.6030808@freedesktop.org>

Simulate what you think is going on, or create a closed-form model. If the phenomenon appears in the simulation, it will help you experiment with how to eliminate it. If it does not, you need to understand why what you *think* is going on is not what is actually going on.

As I noted, 70-packet queues should not appear due to a simple overload. What TCP does, from the 75,000 foot perspective, is try to aggressively move any queues that would build up inside the network back to the source buffer, by managing the window down whenever it sees a queue building.

That's why bufferbloat is so evil - it masks any signal about the buildup of queues until all the queues are full, and large queues take a *long* time to drain down to "empty".

The steady state of a low-latency network under *any* load (even overload) should be one where there is at most one packet queued on each outgoing link internal to the network.

[If you need to know why, imagine the opposite were true - then the internal queues make all the control loops very, very long, which makes the network oscillate unstably, with very large variance of latency.]

The purpose of queues is *only* to smooth short random bursts, such as might happen on a shared internal link due to occasional "collisions" of traffic from uncorrelated sources.

Unfortunately, a vast percentage of designers don't understand that. Hence, we get bufferbloat - making the queues bigger and bigger, and eliminating any queue buildup signalling back to the source that is overloading the network.

I assume codel is supposed to fix that. If it is letting queues internal to the net fill up, it is doing the wrong thing.

-----Original Message-----
From: "Jim Gettys" <jg@freedesktop.org>
Sent: Wednesday, June 20, 2012 11:52am
To: "Jonathan Morton" <chromatix99@gmail.com>
Cc: "codel@lists.bufferbloat.net" <codel@lists.bufferbloat.net>, "cerowrt-devel@lists.bufferbloat.net" <cerowrt-devel@lists.bufferbloat.net>
Subject: Re: [Cerowrt-devel] [Codel] codel "oversteer"

On 06/20/2012 06:08 AM, Jonathan Morton wrote:
> Is the cwnd also oscillating wildly or is it just an artefact of the visible part of the queue only being a fraction of the real queue?
>
> Are ACK packets being aggregated by wireless? That would be a good explanation for large bursts that flood the buffer, if the rwnd opens a lot suddenly. This would also be an argument that 2*n is too small for the ECN drop threshold.

Yeah, I've been worrying about ack compression... Not sure exactly what
we should be doing about it, as I don't fully understand it.
 - Jim

>
> The key to knowledge is not to rely on others to teach you it.
>
> On 20 Jun 2012, at 04:32, Dave Taht <dave.taht@gmail.com> wrote:
>
>> I've been forming a theory regarding codel behavior in some
>> pathological conditions. For the sake of developing the theory I'm
>> going to return to the original car analogy published here, and add a
>> new one - "oversteer".
>>
>> Briefly:
>>
>> If the underlying interface device driver is overbuffered, when the
>> packet backlog finally makes it into the qdisc layer, that bursts up
>> rapidly and codel rapidly ramps up its drop strategy, which corrects
>> the problem, but we are back in a state where we are, as in the case
>> of an auto on ice, or a very loose connection to the steering wheel,
>> "oversteering" because codel is actually not measuring the entire
>> time-width of the queue and unable to control it well, even if it
>> could.
>>
>> What I observe on wireless now with fq_codel under heavy load is
>> oscillation in the qdisc layer between 0 length queue and 70 or more
>> packets backlogged, a burst of drops when that happens, and far more
>> drops than ecn marks that I expected (with the new (arbitrary) drop
>> ecn packets if > 2 * target idea I was fiddling with illustrating the
>> point better, now). It's difficult to gain further direct insight
>> without time and packet traces, and maybe exporting more data to
>> userspace, but this kind of explains a report I got privately on x86
>> (no ecn drop enabled), and the behavior of fq_codel on wireless on the
>> present version of cerowrt.
>>
>> (I could always have inserted a bug, too; if it weren't for the private
>> report and having to get on a plane shortly, I wouldn't be posting this
>> now.)
>>
>> Further testing ideas that (others!) could try would be:
>>
>> Increase BQL's setting to over-large values on a BQL enabled interface
>> and see what happens
>> Test with an overbuffered ethernet interface in the first place
>> Improve the ns3 model to have an emulated network interface with
>> user-settable buffering
>>
>> Assuming I'm right and others can reproduce this, this implies that
>> focusing much harder on BQL and overbuffering related issues on the
>> dozens? hundreds? of non-BQL enabled ethernet drivers is needed at
>> this point. And we already know that much more hard work on fixing
>> wifi is needed.
>>
>> Despite this I'm generally pleased with the fq_codel results over
>> wireless I'm currently getting from today's build of cerowrt, and
>> certainly the BQL-enabled ethernet drivers I've worked with (ar71xx,
>> e1000) don't display this behavior, neither does soft rate limiting
>> using htb - instead achieving a steady state for the packet backlog,
>> accepting bursts, and otherwise being "nice".
>>
>> --
>> Dave Täht
>> SKYPE: davetaht
>> http://ronsravings.blogspot.com/
>> _______________________________________________
>> Codel mailing list
>> Codel@lists.bufferbloat.net
>> https://lists.bufferbloat.net/listinfo/codel

_______________________________________________
Cerowrt-devel mailing list
Cerowrt-devel@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cerowrt-devel
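
The advice above to "create a closed-form model" can be sketched with simple queue arithmetic. What follows is a minimal Python sketch and not anything posted in the thread: the function names, the 10 Mbit/s link rate, the 1500-byte packet size, and the 100-packet driver buffer are all illustrative assumptions. It assumes a FIFO that drains at a fixed rate and a driver buffer that stays full, and it shows how a controller at the qdisc layer, such as codel, could measure only part of the delay a packet actually experiences.

```python
# Closed-form sketch: how long a standing queue takes to drain, and how much
# of a packet's delay is visible at the qdisc layer when the driver below it
# is overbuffered. All numeric values used here are assumptions.


def drain_time_ms(packets: int, packet_bytes: int, rate_bps: float) -> float:
    """Milliseconds needed to drain `packets` packets at a fixed link rate."""
    return packets * packet_bytes * 8 * 1000 / rate_bps


def visible_vs_total_delay_ms(
    qdisc_packets: int, driver_packets: int, packet_bytes: int, rate_bps: float
) -> tuple[float, float]:
    """Return (delay measurable in the qdisc, delay the packet really sees).

    Assumes the driver buffer is kept full, so a packet first waits behind
    the qdisc backlog and then behind the whole driver buffer.
    """
    visible = drain_time_ms(qdisc_packets, packet_bytes, rate_bps)
    hidden = drain_time_ms(driver_packets, packet_bytes, rate_bps)
    return visible, visible + hidden


if __name__ == "__main__":
    RATE_BPS = 10_000_000   # assumed link rate, not taken from the thread
    PACKET_BYTES = 1500     # assumed full-size packets

    # The 70-packet backlog reported in the thread, at the assumed rate.
    print(f"70-packet backlog: {drain_time_ms(70, PACKET_BYTES, RATE_BPS):.1f} ms")

    # A nearly empty qdisc sitting on top of a hypothetical 100-packet driver buffer.
    visible, total = visible_vs_total_delay_ms(4, 100, PACKET_BYTES, RATE_BPS)
    print(f"visible to qdisc: {visible:.1f} ms, actual: {total:.1f} ms")
```

Under these assumed numbers a 70-packet backlog represents 84 ms of queueing delay, and a qdisc holding only 4 packets would measure under 5 ms while each packet still waits about 125 ms in total. Real wireless rates vary widely, so the figures are only indicative of why a controller that sees a fraction of the queue may react late.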
